linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] Support memory hot-delete to boot memory
@ 2013-04-08 17:09 Toshi Kani
  2013-04-08 17:09 ` [PATCH v2 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-08 17:09 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani

Memory hot-delete to a memory range present at boot causes an
error message in __release_region(), such as:

 Trying to free nonexistent resource <0000000070000000-0000000077ffffff>

The hot-delete operation still continues since __release_region() is
a void function, but the target memory range is not freed from
iomem_resource as a result.  This also leads to a failure in a
subsequent hot-add operation to the same memory range since the
address range is still in use in iomem_resource.

This problem happens because the granularity of memory resource ranges
may differ between boot and hot-delete.  During bootup,
iomem_resource is set up from the boot descriptor table, such as the EFI
Memory Table or e820.  Each resource entry usually covers a whole
contiguous memory range.  A hot-delete request, on the other hand, may
target a particular range within a memory resource, and its size can be
much smaller than the whole contiguous memory.  Since the existing
release interfaces like __release_region() require a requested region
to match a resource entry exactly, they do not allow part of a
resource to be released.

This patchset introduces release_mem_region_adjustable() for memory
hot-delete operations, which allows releasing a partial memory range
and adjusts the remaining resource accordingly.  This patchset makes no
changes to the existing interfaces since their restriction is still
valid for I/O resources.
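
For illustration only, the ways a release request can relate to a single
resource entry can be sketched in userspace C.  This is not code from the
patchset: struct range, enum release_kind and classify_release() are
hypothetical names, and the sketch only classifies a request without
touching any resource tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in the kernel. */
enum release_kind {
	REL_NO_FIT,	/* request is not fully inside the entry */
	REL_WHOLE,	/* request matches the entry exactly */
	REL_TRIM_START,	/* request covers the low end of the entry */
	REL_TRIM_END,	/* request covers the high end of the entry */
	REL_SPLIT	/* request is strictly inside the entry */
};

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct resource */
};

/* Classify how releasing [start, start + size - 1] would affect res. */
static enum release_kind classify_release(struct range res,
					  uint64_t start, uint64_t size)
{
	uint64_t end;

	assert(size > 0);
	end = start + size - 1;

	if (start < res.start || end > res.end)
		return REL_NO_FIT;
	if (start == res.start && end == res.end)
		return REL_WHOLE;
	if (start == res.start)
		return REL_TRIM_START;
	if (end == res.end)
		return REL_TRIM_END;
	return REL_SPLIT;
}
```

An exact-match interface only handles the REL_WHOLE case; the other three
in-range cases are the ones the new interface is meant to cover.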

---
v2: Updated release_mem_region_adjustable() per code reviews from
Yasuaki Ishimatsu, Ram Pai and Gu Zheng. 

---
Toshi Kani (3):
 resource: Add __adjust_resource() for internal use
 resource: Add release_mem_region_adjustable()
 mm: Change __remove_pages() to call release_mem_region_adjustable()

---
 include/linux/ioport.h |   2 +
 kernel/resource.c      | 128 ++++++++++++++++++++++++++++++++++++++++++++-----
 mm/memory_hotplug.c    |  11 ++++-
 3 files changed, 126 insertions(+), 15 deletions(-)


* [PATCH v2 1/3] resource: Add __adjust_resource() for internal use
  2013-04-08 17:09 [PATCH v2 0/3] Support memory hot-delete to boot memory Toshi Kani
@ 2013-04-08 17:09 ` Toshi Kani
  2013-04-10  6:10   ` David Rientjes
  2013-04-08 17:09 ` [PATCH v2 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-08 17:09 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani

Added __adjust_resource(), which is called by adjust_resource()
internally after the resource_lock is held.  There is no interface
change to adjust_resource().  This change allows other functions
to call __adjust_resource() internally while the resource_lock is
held.
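
As a rough userspace sketch of this locked-wrapper arrangement (not the
kernel code itself), the public function takes the lock and delegates to
an internal helper that expects the lock to be held already.  All names
here (struct span, span_adjust(), span_adjust_locked()) are hypothetical,
and the "lock" is only a flag standing in for resource_lock so that the
contract can be checked with assert().

```c
#include <assert.h>
#include <stdint.h>

struct span {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Stand-in for resource_lock: a plain flag, not a real lock. */
static int span_lock_held;

static void span_lock(void)
{
	assert(!span_lock_held);
	span_lock_held = 1;
}

static void span_unlock(void)
{
	assert(span_lock_held);
	span_lock_held = 0;
}

/* Internal helper: the caller must already hold the lock. */
static int span_adjust_locked(struct span *s, uint64_t start, uint64_t size)
{
	assert(span_lock_held);
	if (size == 0)
		return -1;
	s->start = start;
	s->end = start + size - 1;
	return 0;
}

/* Public entry point: takes the lock, then delegates to the helper. */
static int span_adjust(struct span *s, uint64_t start, uint64_t size)
{
	int ret;

	span_lock();
	ret = span_adjust_locked(s, start, size);
	span_unlock();
	return ret;
}
```

A caller that already holds the lock for a larger operation can use the
_locked helper directly instead of re-taking the lock.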

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 kernel/resource.c |   35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 73f35d4..ae246f9 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -706,24 +706,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
 	write_unlock(&resource_lock);
 }
 
-/**
- * adjust_resource - modify a resource's start and size
- * @res: resource to modify
- * @start: new start value
- * @size: new size
- *
- * Given an existing resource, change its start and size to match the
- * arguments.  Returns 0 on success, -EBUSY if it can't fit.
- * Existing children of the resource are assumed to be immutable.
- */
-int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size)
+static int __adjust_resource(struct resource *res, resource_size_t start,
+				resource_size_t size)
 {
 	struct resource *tmp, *parent = res->parent;
 	resource_size_t end = start + size - 1;
 	int result = -EBUSY;
 
-	write_lock(&resource_lock);
-
 	if (!parent)
 		goto skip;
 
@@ -751,6 +740,26 @@ skip:
 	result = 0;
 
  out:
+	return result;
+}
+
+/**
+ * adjust_resource - modify a resource's start and size
+ * @res: resource to modify
+ * @start: new start value
+ * @size: new size
+ *
+ * Given an existing resource, change its start and size to match the
+ * arguments.  Returns 0 on success, -EBUSY if it can't fit.
+ * Existing children of the resource are assumed to be immutable.
+ */
+int adjust_resource(struct resource *res, resource_size_t start,
+			resource_size_t size)
+{
+	int result;
+
+	write_lock(&resource_lock);
+	result = __adjust_resource(res, start, size);
 	write_unlock(&resource_lock);
 	return result;
 }


* [PATCH v2 2/3] resource: Add release_mem_region_adjustable()
  2013-04-08 17:09 [PATCH v2 0/3] Support memory hot-delete to boot memory Toshi Kani
  2013-04-08 17:09 ` [PATCH v2 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
@ 2013-04-08 17:09 ` Toshi Kani
  2013-04-10  6:16   ` David Rientjes
  2013-04-08 17:09 ` [PATCH v2 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
  2013-04-08 20:44 ` [PATCH v2 0/3] Support memory hot-delete to boot memory Andrew Morton
  3 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-08 17:09 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani

Added release_mem_region_adjustable(), which releases a requested
region from a currently busy memory resource.  This interface
adjusts the matched memory resource accordingly even if the
requested region does not match exactly but still fits within it.

This new interface is intended for memory hot-delete.  During
bootup, memory resources are inserted from the boot descriptor
table, such as the EFI Memory Table or e820.  Each memory resource
entry usually covers a whole contiguous memory range.  A memory
hot-delete request, on the other hand, may target a particular
range within a memory resource, and its size can be much smaller than
the whole contiguous memory.  Since the existing release interfaces
like __release_region() require a requested region to match a
resource entry exactly, they do not allow part of a resource
to be released.

There is no change to the existing interfaces since their restriction
is valid for I/O resources.
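
The split case is the least obvious one, so here is a minimal userspace
sketch of it under simplifying assumptions: a singly linked sibling list,
no children, no locking, and calloc() in place of the kernel allocator.
struct node and split_node() are hypothetical names, not part of this
patch.

```c
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint64_t start;
	uint64_t end;		/* inclusive */
	struct node *sibling;	/* next entry at the same level */
};

/*
 * Remove [start, end] from the middle of res.  The lower part stays in
 * res; the upper part goes into a newly allocated entry linked in right
 * after res.  Returns 0 on success, -1 if the request is not strictly
 * inside res or if allocation fails.
 */
static int split_node(struct node *res, uint64_t start, uint64_t end)
{
	struct node *upper;

	if (start > end || start <= res->start || end >= res->end)
		return -1;

	upper = calloc(1, sizeof(*upper));
	if (!upper)
		return -1;

	upper->start = end + 1;
	upper->end = res->end;
	upper->sibling = res->sibling;

	res->end = start - 1;
	res->sibling = upper;
	return 0;
}
```

For example, removing 0x70000000-0x77ffffff from an entry covering
0x0-0x7fffffff leaves 0x0-0x6fffffff in the original entry and creates a
new sibling covering 0x78000000-0x7fffffff.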

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 include/linux/ioport.h |    2 +
 kernel/resource.c      |   93 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 85ac9b9b..0fe1a82 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
 extern int __check_region(struct resource *, resource_size_t, resource_size_t);
 extern void __release_region(struct resource *, resource_size_t,
 				resource_size_t);
+extern int release_mem_region_adjustable(struct resource *, resource_size_t,
+				resource_size_t);
 
 static inline int __deprecated check_region(resource_size_t s,
 						resource_size_t n)
diff --git a/kernel/resource.c b/kernel/resource.c
index ae246f9..870fb26 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1021,6 +1021,99 @@ void __release_region(struct resource *parent, resource_size_t start,
 }
 EXPORT_SYMBOL(__release_region);
 
+/**
+ * release_mem_region_adjustable - release a previously reserved memory region
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @size: resource region size
+ *
+ * The requested region is released from a currently busy memory resource.
+ * It adjusts the matched busy memory resource accordingly if the requested
+ * region does not match exactly but still fits into.  Existing children of
+ * the busy memory resource must be immutable in this request.
+ *
+ * Note, when the busy memory resource gets split into two entries, the code
+ * assumes that all children remain in the lower address entry for simplicity.
+ * Enhance this logic when necessary.
+ */
+int release_mem_region_adjustable(struct resource *parent,
+			resource_size_t start, resource_size_t size)
+{
+	struct resource **p;
+	struct resource *res, *new;
+	resource_size_t end;
+	int ret = -EINVAL;
+
+	end = start + size - 1;
+	if ((start < parent->start) || (end > parent->end))
+		return ret;
+
+	p = &parent->child;
+	write_lock(&resource_lock);
+
+	while ((res = *p)) {
+		if (res->start >= end)
+			break;
+
+		/* look for the next resource if it does not fit into */
+		if (res->start > start || res->end < end) {
+			p = &res->sibling;
+			continue;
+		}
+
+		if (!(res->flags & IORESOURCE_MEM))
+			break;
+
+		if (!(res->flags & IORESOURCE_BUSY)) {
+			p = &res->child;
+			continue;
+		}
+
+		/* found the target resource; let's adjust accordingly */
+		if (res->start == start && res->end == end) {
+			/* free the whole entry */
+			*p = res->sibling;
+			kfree(res);
+			ret = 0;
+		} else if (res->start == start && res->end != end) {
+			/* adjust the start */
+			ret = __adjust_resource(res, end + 1,
+						res->end - end);
+		} else if (res->start != start && res->end == end) {
+			/* adjust the end */
+			ret = __adjust_resource(res, res->start,
+						start - res->start);
+		} else {
+			/* split into two entries */
+			new = kzalloc(sizeof(struct resource), GFP_KERNEL);
+			if (!new) {
+				ret = -ENOMEM;
+				break;
+			}
+			new->name = res->name;
+			new->start = end + 1;
+			new->end = res->end;
+			new->flags = res->flags;
+			new->parent = res->parent;
+			new->sibling = res->sibling;
+			new->child = NULL;
+
+			ret = __adjust_resource(res, res->start,
+						start - res->start);
+			if (ret) {
+				kfree(new);
+				break;
+			}
+			res->sibling = new;
+		}
+
+		break;
+	}
+
+	write_unlock(&resource_lock);
+	return ret;
+}
+
 /*
  * Managed region resource
  */


* [PATCH v2 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()
  2013-04-08 17:09 [PATCH v2 0/3] Support memory hot-delete to boot memory Toshi Kani
  2013-04-08 17:09 ` [PATCH v2 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
  2013-04-08 17:09 ` [PATCH v2 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
@ 2013-04-08 17:09 ` Toshi Kani
  2013-04-08 20:44 ` [PATCH v2 0/3] Support memory hot-delete to boot memory Andrew Morton
  3 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-08 17:09 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani

Changed __remove_pages() to call release_mem_region_adjustable().
This allows a requested memory range to be released from
the iomem_resource table even if it does not exactly match
a resource entry but still fits within one.  The resource entries
initialized at bootup usually cover whole contiguous
memory ranges and may not necessarily match the size of
memory hot-delete requests.

If release_mem_region_adjustable() fails, __remove_pages() logs
an error message and continues, as was the case
with release_mem_region().  release_mem_region(), which is defined
as __release_region(), logs an error message and returns no error
since it is a void function.
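
As a small worked example of the range arithmetic involved, the sketch
below converts a starting page frame number and a page count into the
inclusive byte range that would be released.  It is a userspace
illustration only: pfn_to_byte_range(), struct byte_range and
SKETCH_PAGE_SHIFT are hypothetical names, 4 KiB pages are assumed, and
the operands are widened to 64 bits before shifting so the result does
not depend on the width of unsigned long.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

struct byte_range {
	uint64_t start;
	uint64_t size;
	uint64_t end;	/* inclusive */
};

/* Convert a PFN range into a byte range, widening before the shift. */
static struct byte_range pfn_to_byte_range(unsigned long start_pfn,
					   unsigned long nr_pages)
{
	struct byte_range r;

	r.start = (uint64_t)start_pfn << SKETCH_PAGE_SHIFT;
	r.size = (uint64_t)nr_pages << SKETCH_PAGE_SHIFT;
	r.end = r.start + r.size - 1;
	return r;
}
```

With these assumptions, a start PFN of 0x70000 and 0x8000 pages
correspond to the byte range 0x70000000-0x77ffffff.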

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 mm/memory_hotplug.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 57decb2..c916582 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -705,8 +705,10 @@ EXPORT_SYMBOL_GPL(__add_pages);
 int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 		 unsigned long nr_pages)
 {
-	unsigned long i, ret = 0;
+	unsigned long i;
 	int sections_to_remove;
+	resource_size_t start, size;
+	int ret = 0;
 
 	/*
 	 * We can only remove entire sections
@@ -714,7 +716,12 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);
 
-	release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+	start = phys_start_pfn << PAGE_SHIFT;
+	size = nr_pages * PAGE_SIZE;
+	ret = release_mem_region_adjustable(&iomem_resource, start, size);
+	if (ret)
+		pr_warn("Unable to release resource <%016llx-%016llx> (%d)\n",
+				start, start + size - 1, ret);
 
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {


* Re: [PATCH v2 0/3] Support memory hot-delete to boot memory
  2013-04-08 17:09 [PATCH v2 0/3] Support memory hot-delete to boot memory Toshi Kani
                   ` (2 preceding siblings ...)
  2013-04-08 17:09 ` [PATCH v2 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
@ 2013-04-08 20:44 ` Andrew Morton
  2013-04-08 20:58   ` Toshi Kani
  3 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2013-04-08 20:44 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Mon,  8 Apr 2013 11:09:53 -0600 Toshi Kani <toshi.kani@hp.com> wrote:

> Memory hot-delete to a memory range present at boot causes an
> error message in __release_region(), such as:
> 
>  Trying to free nonexistent resource <0000000070000000-0000000077ffffff>
> 
> Hot-delete operation still continues since __release_region() is 
> a void function, but the target memory range is not freed from
> iomem_resource as the result.  This also leads a failure in a 
> subsequent hot-add operation to the same memory range since the
> address range is still in-use in iomem_resource.
> 
> This problem happens because the granularity of memory resource ranges
> may be different between boot and hot-delete.

So we don't need this new code if CONFIG_MEMORY_HOTPLUG=n?  If so, can
we please arrange for it to not be present if the user doesn't need it?

>  During bootup,
> iomem_resource is set up from the boot descriptor table, such as EFI
> Memory Table and e820.  Each resource entry usually covers the whole
> contiguous memory range.  Hot-delete request, on the other hand, may
> target to a particular range of memory resource, and its size can be
> much smaller than the whole contiguous memory.  Since the existing
> release interfaces like __release_region() require a requested region
> to be exactly matched to a resource entry, they do not allow a partial
> resource to be released.
> 
> This patchset introduces release_mem_region_adjustable() for memory
> hot-delete operations, which allows releasing a partial memory range
> and adjusts remaining resource accordingly.  This patchset makes no
> changes to the existing interfaces since their restriction is still
> valid for I/O resources.


* Re: [PATCH v2 0/3] Support memory hot-delete to boot memory
  2013-04-08 20:44 ` [PATCH v2 0/3] Support memory hot-delete to boot memory Andrew Morton
@ 2013-04-08 20:58   ` Toshi Kani
  2013-04-10  5:52     ` David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-08 20:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Mon, 2013-04-08 at 13:44 -0700, Andrew Morton wrote:
> On Mon,  8 Apr 2013 11:09:53 -0600 Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > Memory hot-delete to a memory range present at boot causes an
> > error message in __release_region(), such as:
> > 
> >  Trying to free nonexistent resource <0000000070000000-0000000077ffffff>
> > 
> > Hot-delete operation still continues since __release_region() is 
> > a void function, but the target memory range is not freed from
> > iomem_resource as the result.  This also leads a failure in a 
> > subsequent hot-add operation to the same memory range since the
> > address range is still in-use in iomem_resource.
> > 
> > This problem happens because the granularity of memory resource ranges
> > may be different between boot and hot-delete.
> 
> So we don't need this new code if CONFIG_MEMORY_HOTPLUG=n?  If so, can
> we please arrange for it to not be present if the user doesn't need it?

Good point!  Yes, since the new function is intended for memory
hot-delete and is only called from __remove_pages() in
mm/memory_hotplug.c, it should be placed under #ifdef CONFIG_MEMORY_HOTPLUG
in PATCH 2/3.

I will make the change and send an updated version of PATCH 2/3.

Thanks,
-Toshi



* Re: [PATCH v2 0/3] Support memory hot-delete to boot memory
  2013-04-08 20:58   ` Toshi Kani
@ 2013-04-10  5:52     ` David Rientjes
  2013-04-10  6:07       ` [patch] mm, hotplug: avoid compiling memory hotremove functions when disabled David Rientjes
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2013-04-10  5:52 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Andrew Morton, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Mon, 8 Apr 2013, Toshi Kani wrote:

> > So we don't need this new code if CONFIG_MEMORY_HOTPLUG=n?  If so, can
> > we please arrange for it to not be present if the user doesn't need it?
> 
> Good point!  Yes, since the new function is intended for memory
> hot-delete and is only called from __remove_pages() in
> mm/memory_hotplug.c, it should be added as #ifdef CONFIG_MEMORY_HOTPLUG
> in PATCH 2/3.
> 
> I will make the change, and send an updated patch to PATCH 2/3.
> 

It should actually depend on CONFIG_MEMORY_HOTREMOVE, but the pseries 
OF_RECONFIG_DETACH_NODE code seems to be the only code that doesn't 
make that distinction.  CONFIG_MEMORY_HOTREMOVE acts as a wrapper to 
protect configs that don't have ARCH_ENABLE_MEMORY_HOTREMOVE, so we'll 
want to keep it around and presumably that powerpc code depends on it as 
well.


* [patch] mm, hotplug: avoid compiling memory hotremove functions when disabled
  2013-04-10  5:52     ` David Rientjes
@ 2013-04-10  6:07       ` David Rientjes
  2013-04-10 17:29         ` Toshi Kani
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2013-04-10  6:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Toshi Kani, Benjamin Herrenschmidt, Paul Mackerras,
	Greg Kroah-Hartman, Wen Congyang, Tang Chen, Yasuaki Ishimatsu,
	linux-kernel, linuxppc-dev, linux-mm

__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
pseries will return -EOPNOTSUPP if unsupported.

Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves .text space when disabled (it's disabled in
most defconfigs besides powerpc, including x86).  remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.
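
The general shape of this kind of config gating, sketched in userspace C
rather than taken from the patch: the real implementation is compiled
only when the option is defined, and otherwise a static inline stub
returns -EOPNOTSUPP so that callers need no #ifdef of their own.
SKETCH_HOTREMOVE and sketch_remove_memory() are hypothetical stand-ins
for the config symbol and the gated function.

```c
#include <assert.h>
#include <errno.h>

#ifdef SKETCH_HOTREMOVE
/* Real implementation, built only when the feature is enabled. */
static int sketch_remove_memory(unsigned long base, unsigned long size)
{
	assert(size != 0);
	(void)base;	/* a real implementation would use this */
	return 0;
}
#else
/* Stub: keeps callers compiling and reports "not supported". */
static inline int sketch_remove_memory(unsigned long base,
				       unsigned long size)
{
	(void)base;
	(void)size;
	return -EOPNOTSUPP;
}
#endif /* SKETCH_HOTREMOVE */
```

When SKETCH_HOTREMOVE is not defined, only the stub is compiled, which is
where the code-size saving comes from.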

Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 12 +++++
 drivers/base/memory.c                           | 44 +++++++--------
 include/linux/memory.h                          |  3 +-
 include/linux/memory_hotplug.h                  |  4 +-
 mm/memory_hotplug.c                             | 68 +++++++++++------------
 mm/sparse.c                                     | 72 +++++++++++++------------
 6 files changed, 113 insertions(+), 90 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -72,6 +72,7 @@ unsigned long memory_block_size_bytes(void)
 	return get_memblock_size();
 }
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
@@ -153,6 +154,17 @@ static int pseries_remove_memory(struct device_node *np)
 	ret = pseries_remove_memblock(base, lmb_size);
 	return ret;
 }
+#else
+static inline int pseries_remove_memblock(unsigned long base,
+					  unsigned int memblock_size)
+{
+	return -EOPNOTSUPP;
+}
+static inline int pseries_remove_memory(struct device_node *np)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 static int pseries_add_memory(struct device_node *np)
 {
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -93,16 +93,6 @@ int register_memory(struct memory_block *memory)
 	return error;
 }
 
-static void
-unregister_memory(struct memory_block *memory)
-{
-	BUG_ON(memory->dev.bus != &memory_subsys);
-
-	/* drop the ref. we got in remove_memory_block() */
-	kobject_put(&memory->dev.kobj);
-	device_unregister(&memory->dev);
-}
-
 unsigned long __weak memory_block_size_bytes(void)
 {
 	return MIN_MEMORY_BLOCK_SIZE;
@@ -637,8 +627,28 @@ static int add_memory_section(int nid, struct mem_section *section,
 	return ret;
 }
 
-int remove_memory_block(unsigned long node_id, struct mem_section *section,
-		int phys_device)
+/*
+ * need an interface for the VM to add new memory regions,
+ * but without onlining it.
+ */
+int register_new_memory(int nid, struct mem_section *section)
+{
+	return add_memory_section(nid, section, NULL, MEM_OFFLINE, HOTPLUG);
+}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+static void
+unregister_memory(struct memory_block *memory)
+{
+	BUG_ON(memory->dev.bus != &memory_subsys);
+
+	/* drop the ref. we got in remove_memory_block() */
+	kobject_put(&memory->dev.kobj);
+	device_unregister(&memory->dev);
+}
+
+static int remove_memory_block(unsigned long node_id,
+			       struct mem_section *section, int phys_device)
 {
 	struct memory_block *mem;
 
@@ -661,15 +671,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
 	return 0;
 }
 
-/*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
- */
-int register_new_memory(int nid, struct mem_section *section)
-{
-	return add_memory_section(nid, section, NULL, MEM_OFFLINE, HOTPLUG);
-}
-
 int unregister_memory_section(struct mem_section *section)
 {
 	if (!present_section(section))
@@ -677,6 +678,7 @@ int unregister_memory_section(struct mem_section *section)
 
 	return remove_memory_block(0, section, 0);
 }
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /*
  * offline one memory block. If the memory block has been offlined, do nothing.
diff --git a/include/linux/memory.h b/include/linux/memory.h
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -114,9 +114,10 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 extern int register_new_memory(int, struct mem_section *);
+#ifdef CONFIG_MEMORY_HOTREMOVE
 extern int unregister_memory_section(struct mem_section *);
+#endif
 extern int memory_dev_init(void);
-extern int remove_memory_block(unsigned long, struct mem_section *, int);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block_hinted(struct mem_section *,
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -97,13 +97,13 @@ extern void __online_page_free(struct page *page);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern bool is_pageblock_removable_nolock(struct page *page);
 extern int arch_remove_memory(u64 start, u64 size);
+extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
+	unsigned long nr_pages);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-	unsigned long nr_pages);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -436,6 +436,40 @@ static int __meminit __add_section(int nid, struct zone *zone,
 	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
+/*
+ * Reasonably generic function for adding memory.  It is
+ * expected that archs that support memory hotplug will
+ * call this function after deciding the zone to which to
+ * add the new pages.
+ */
+int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
+			unsigned long nr_pages)
+{
+	unsigned long i;
+	int err = 0;
+	int start_sec, end_sec;
+	/* during initialize mem_map, align hot-added range to section */
+	start_sec = pfn_to_section_nr(phys_start_pfn);
+	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
+
+	for (i = start_sec; i <= end_sec; i++) {
+		err = __add_section(nid, zone, i << PFN_SECTION_SHIFT);
+
+		/*
+		 * EEXIST is finally dealt with by ioresource collision
+		 * check. see add_memory() => register_memory_resource()
+		 * Warning will be printed if there is collision.
+		 */
+		if (err && (err != -EEXIST))
+			break;
+		err = 0;
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(__add_pages);
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
 /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
 static int find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long start_pfn,
@@ -658,39 +692,6 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
 	return 0;
 }
 
-/*
- * Reasonably generic function for adding memory.  It is
- * expected that archs that support memory hotplug will
- * call this function after deciding the zone to which to
- * add the new pages.
- */
-int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
-			unsigned long nr_pages)
-{
-	unsigned long i;
-	int err = 0;
-	int start_sec, end_sec;
-	/* during initialize mem_map, align hot-added range to section */
-	start_sec = pfn_to_section_nr(phys_start_pfn);
-	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
-
-	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, zone, i << PFN_SECTION_SHIFT);
-
-		/*
-		 * EEXIST is finally dealt with by ioresource collision
-		 * check. see add_memory() => register_memory_resource()
-		 * Warning will be printed if there is collision.
-		 */
-		if (err && (err != -EEXIST))
-			break;
-		err = 0;
-	}
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(__add_pages);
-
 /**
  * __remove_pages() - remove sections of pages from a zone
  * @zone: zone from which pages need to be removed
@@ -726,6 +727,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__remove_pages);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 int set_online_page_callback(online_page_callback_t callback)
 {
diff --git a/mm/sparse.c b/mm/sparse.c
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -620,6 +620,7 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
 
 	vmemmap_free(start, end);
 }
+#ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
 {
 	unsigned long start = (unsigned long)memmap;
@@ -627,6 +628,7 @@ static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
 
 	vmemmap_free(start, end);
 }
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 {
@@ -664,6 +666,7 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
 			   get_order(sizeof(struct page) * nr_pages));
 }
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
 {
 	unsigned long maps_section_nr, removing_section_nr, i;
@@ -690,40 +693,9 @@ static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
 			put_page_bootmem(page);
 	}
 }
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
-static void free_section_usemap(struct page *memmap, unsigned long *usemap)
-{
-	struct page *usemap_page;
-	unsigned long nr_pages;
-
-	if (!usemap)
-		return;
-
-	usemap_page = virt_to_page(usemap);
-	/*
-	 * Check to see if allocation came from hot-plug-add
-	 */
-	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
-		kfree(usemap);
-		if (memmap)
-			__kfree_section_memmap(memmap, PAGES_PER_SECTION);
-		return;
-	}
-
-	/*
-	 * The usemap came from bootmem. This is packed with other usemaps
-	 * on the section which has pgdat at boot time. Just keep it as is now.
-	 */
-
-	if (memmap) {
-		nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
-			>> PAGE_SHIFT;
-
-		free_map_bootmem(memmap, nr_pages);
-	}
-}
-
 /*
  * returns the number of sections whose mem_maps were properly
  * set.  If this is <=0, then that means that the passed-in
@@ -800,6 +772,39 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 }
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+static void free_section_usemap(struct page *memmap, unsigned long *usemap)
+{
+	struct page *usemap_page;
+	unsigned long nr_pages;
+
+	if (!usemap)
+		return;
+
+	usemap_page = virt_to_page(usemap);
+	/*
+	 * Check to see if allocation came from hot-plug-add
+	 */
+	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
+		kfree(usemap);
+		if (memmap)
+			__kfree_section_memmap(memmap, PAGES_PER_SECTION);
+		return;
+	}
+
+	/*
+	 * The usemap came from bootmem. This is packed with other usemaps
+	 * on the section which has pgdat at boot time. Just keep it as is now.
+	 */
+
+	if (memmap) {
+		nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
+			>> PAGE_SHIFT;
+
+		free_map_bootmem(memmap, nr_pages);
+	}
+}
+
 void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
 {
 	struct page *memmap = NULL;
@@ -819,4 +824,5 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
 	clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION);
 	free_section_usemap(memmap, usemap);
 }
-#endif
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+#endif /* CONFIG_MEMORY_HOTPLUG */


* Re: [PATCH v2 1/3] resource: Add __adjust_resource() for internal use
  2013-04-08 17:09 ` [PATCH v2 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
@ 2013-04-10  6:10   ` David Rientjes
  2013-04-10 15:39     ` Toshi Kani
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2013-04-10  6:10 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Mon, 8 Apr 2013, Toshi Kani wrote:

> Added __adjust_resource(), which is called by adjust_resource()
> internally after the resource_lock is held.  There is no interface
> change to adjust_resource().  This change allows other functions
> to call __adjust_resource() internally while the resource_lock is
> held.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH v2 2/3] resource: Add release_mem_region_adjustable()
  2013-04-08 17:09 ` [PATCH v2 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
@ 2013-04-10  6:16   ` David Rientjes
  2013-04-10 16:36     ` Toshi Kani
  0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2013-04-10  6:16 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Mon, 8 Apr 2013, Toshi Kani wrote:

> Added release_mem_region_adjustable(), which releases a requested
> region from a currently busy memory resource.  This interface
> adjusts the matched memory resource accordingly even if the
> requested region does not match exactly but still fits into.
> 
> This new interface is intended for memory hot-delete.  During
> bootup, memory resources are inserted from the boot descriptor
> table, such as EFI Memory Table and e820.  Each memory resource
> entry usually covers the whole contiguous memory range.  Memory
> hot-delete request, on the other hand, may target to a particular
> range of memory resource, and its size can be much smaller than
> the whole contiguous memory.  Since the existing release interfaces
> like __release_region() require a requested region to be exactly
> matched to a resource entry, they do not allow a partial resource
> to be released.
> 
> There is no change to the existing interfaces since their restriction
> is valid for I/O resources.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Should this emit a warning for attempting to free a nonexistent region
like __release_region() does?

I think it would be better to base this off my patch and surround it with 
#ifdef CONFIG_MEMORY_HOTREMOVE as suggested by Andrew.  There shouldn't be 
any conflicts.


* Re: [PATCH v2 1/3] resource: Add __adjust_resource() for internal use
  2013-04-10  6:10   ` David Rientjes
@ 2013-04-10 15:39     ` Toshi Kani
  0 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 15:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: akpm, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Tue, 2013-04-09 at 23:10 -0700, David Rientjes wrote:
> On Mon, 8 Apr 2013, Toshi Kani wrote:
> 
> > Added __adjust_resource(), which is called by adjust_resource()
> > internally after the resource_lock is held.  There is no interface
> > change to adjust_resource().  This change allows other functions
> > to call __adjust_resource() internally while the resource_lock is
> > held.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> Acked-by: David Rientjes <rientjes@google.com>

Great!  Thanks David!
-Toshi



* Re: [PATCH v2 2/3] resource: Add release_mem_region_adjustable()
  2013-04-10  6:16   ` David Rientjes
@ 2013-04-10 16:36     ` Toshi Kani
  0 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 16:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: akpm, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
	isimatu.yasuaki, wency, tangchen, jiang.liu

On Tue, 2013-04-09 at 23:16 -0700, David Rientjes wrote:
> On Mon, 8 Apr 2013, Toshi Kani wrote:
> 
> > Added release_mem_region_adjustable(), which releases a requested
> > region from a currently busy memory resource.  This interface
> > adjusts the matched memory resource accordingly even if the
> > requested region does not match exactly but still fits into.
> > 
> > This new interface is intended for memory hot-delete.  During
> > bootup, memory resources are inserted from the boot descriptor
> > table, such as EFI Memory Table and e820.  Each memory resource
> > entry usually covers the whole contiguous memory range.  Memory
> > hot-delete request, on the other hand, may target to a particular
> > range of memory resource, and its size can be much smaller than
> > the whole contiguous memory.  Since the existing release interfaces
> > like __release_region() require a requested region to be exactly
> > matched to a resource entry, they do not allow a partial resource
> > to be released.
> > 
> > There is no change to the existing interfaces since their restriction
> > is valid for I/O resources.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> Should this emit a warning for attempting to free a nonexistent region
> like __release_region() does?

Since __release_region() is a void function, it has to emit the warning
itself.  I made release_mem_region_adjustable() an int function so that
the caller receives the error and can decide what to do based on its
operation.  In this case, PATCH 3/3 changes the caller, __remove_pages(),
to emit a warning message.
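
As a rough illustration only (this is not the code from PATCH 2/3 or
3/3), the division of labor can be sketched in user space.  The names
struct range, release_range_adjustable() and remove_range() are
hypothetical stand-ins, and the sketch assumes a single busy entry with
no locking or resource tree:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for one resource entry; [start, end] is inclusive. */
struct range {
	unsigned long long start;
	unsigned long long end;
	int busy;		/* non-zero while the entry is in use */
};

/*
 * Release [start, start + size - 1] from *res, adjusting the entry when
 * the request covers only part of it.  If the request falls in the
 * middle, the upper remainder is stored in *tail.  Returns 0 on success
 * or -EINVAL when the request does not fit, leaving any reporting to
 * the caller.
 */
static int release_range_adjustable(struct range *res, struct range *tail,
				    unsigned long long start,
				    unsigned long long size)
{
	unsigned long long end;

	assert(res != NULL && tail != NULL);
	tail->busy = 0;

	/* Reject empty requests and requests that wrap around. */
	if (size == 0 || start + size - 1 < start)
		return -EINVAL;
	end = start + size - 1;

	if (!res->busy || start < res->start || end > res->end)
		return -EINVAL;

	if (start == res->start && end == res->end) {
		res->busy = 0;			/* exact match: free the entry */
	} else if (start == res->start) {
		res->start = end + 1;		/* trim the head */
	} else if (end == res->end) {
		res->end = start - 1;		/* trim the tail */
	} else {
		tail->start = end + 1;		/* split: upper remainder */
		tail->end = res->end;
		tail->busy = 1;
		res->end = start - 1;		/* lower remainder */
	}
	return 0;
}

/* The caller, not the release function, decides whether to warn. */
static int remove_range(struct range *res, struct range *tail,
			unsigned long long start, unsigned long long size)
{
	int ret = release_range_adjustable(res, tail, start, size);

	if (ret)
		fprintf(stderr, "Unable to release range %#llx, size %#llx: %d\n",
			start, size, ret);
	return ret;
}
```

With a busy entry covering 0x70000000-0x7fffffff, releasing 0x4000000
bytes at 0x74000000 would leave 0x70000000-0x73ffffff in the original
entry and 0x78000000-0x7fffffff in the tail.  A request outside the
entry returns -EINVAL, and only remove_range() prints a message.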

> I think it would be better to base this off my patch and surround it with 
> #ifdef CONFIG_MEMORY_HOTREMOVE as suggested by Andrew.  There shouldn't be 
> any conflicts.

Yes, I realized that CONFIG_MEMORY_HOTREMOVE was a better choice, but I
had to use CONFIG_MEMORY_HOTPLUG at this time.  So, thanks for doing the
cleanup!

Since it's already rc6, I will keep my patchset independent for now.  I
will make a minor change to update CONFIG_MEMORY_HOTPLUG to
CONFIG_MEMORY_HOTREMOVE after your patch gets accepted -- either by
sending a separate patch (if my patchset is already accepted) or
updating my current patchset (if my patchset is not accepted yet).

Thanks!
-Toshi



* Re: [patch] mm, hotplug: avoid compiling memory hotremove functions when disabled
  2013-04-10  6:07       ` [patch] mm, hotplug: avoid compiling memory hotremove functions when disabled David Rientjes
@ 2013-04-10 17:29         ` Toshi Kani
  0 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 17:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras,
	Greg Kroah-Hartman, Wen Congyang, Tang Chen, Yasuaki Ishimatsu,
	linux-kernel, linuxppc-dev, linux-mm

On Tue, 2013-04-09 at 23:07 -0700, David Rientjes wrote:
> __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
> pseries will return -EOPNOTSUPP if unsupported.
> 
> Adding an #ifdef causes several other functions it depends on to also
> become unnecessary, which saves in .text when disabled (it's disabled in
> most defconfigs besides powerpc, including x86).  remove_memory_block()
> becomes static since it is not referenced outside of
> drivers/base/memory.c.
> 
> Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
> and disabled.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi


> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 12 +++++
>  drivers/base/memory.c                           | 44 +++++++--------
>  include/linux/memory.h                          |  3 +-
>  include/linux/memory_hotplug.h                  |  4 +-
>  mm/memory_hotplug.c                             | 68 +++++++++++------------
>  mm/sparse.c                                     | 72 +++++++++++++------------
>  6 files changed, 113 insertions(+), 90 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -72,6 +72,7 @@ unsigned long memory_block_size_bytes(void)
>  	return get_memblock_size();
>  }
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
>  static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
>  {
>  	unsigned long start, start_pfn;
> @@ -153,6 +154,17 @@ static int pseries_remove_memory(struct device_node *np)
>  	ret = pseries_remove_memblock(base, lmb_size);
>  	return ret;
>  }
> +#else
> +static inline int pseries_remove_memblock(unsigned long base,
> +					  unsigned int memblock_size)
> +{
> +	return -EOPNOTSUPP;
> +}
> +static inline int pseries_remove_memory(struct device_node *np)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  static int pseries_add_memory(struct device_node *np)
>  {
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -93,16 +93,6 @@ int register_memory(struct memory_block *memory)
>  	return error;
>  }
>  
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -	BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -	/* drop the ref. we got in remove_memory_block() */
> -	kobject_put(&memory->dev.kobj);
> -	device_unregister(&memory->dev);
> -}
> -
>  unsigned long __weak memory_block_size_bytes(void)
>  {
>  	return MIN_MEMORY_BLOCK_SIZE;
> @@ -637,8 +627,28 @@ static int add_memory_section(int nid, struct mem_section *section,
>  	return ret;
>  }
>  
> -int remove_memory_block(unsigned long node_id, struct mem_section *section,
> -		int phys_device)
> +/*
> + * need an interface for the VM to add new memory regions,
> + * but without onlining it.
> + */
> +int register_new_memory(int nid, struct mem_section *section)
> +{
> +	return add_memory_section(nid, section, NULL, MEM_OFFLINE, HOTPLUG);
> +}
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +static void
> +unregister_memory(struct memory_block *memory)
> +{
> +	BUG_ON(memory->dev.bus != &memory_subsys);
> +
> +	/* drop the ref. we got in remove_memory_block() */
> +	kobject_put(&memory->dev.kobj);
> +	device_unregister(&memory->dev);
> +}
> +
> +static int remove_memory_block(unsigned long node_id,
> +			       struct mem_section *section, int phys_device)
>  {
>  	struct memory_block *mem;
>  
> @@ -661,15 +671,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
>  	return 0;
>  }
>  
> -/*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> - */
> -int register_new_memory(int nid, struct mem_section *section)
> -{
> -	return add_memory_section(nid, section, NULL, MEM_OFFLINE, HOTPLUG);
> -}
> -
>  int unregister_memory_section(struct mem_section *section)
>  {
>  	if (!present_section(section))
> @@ -677,6 +678,7 @@ int unregister_memory_section(struct mem_section *section)
>  
>  	return remove_memory_block(0, section, 0);
>  }
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /*
>   * offline one memory block. If the memory block has been offlined, do nothing.
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -114,9 +114,10 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  extern int register_new_memory(int, struct mem_section *);
> +#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern int unregister_memory_section(struct mem_section *);
> +#endif
>  extern int memory_dev_init(void);
> -extern int remove_memory_block(unsigned long, struct mem_section *, int);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
>  extern struct memory_block *find_memory_block_hinted(struct mem_section *,
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -97,13 +97,13 @@ extern void __online_page_free(struct page *page);
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  extern bool is_pageblock_removable_nolock(struct page *page);
>  extern int arch_remove_memory(u64 start, u64 size);
> +extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
> +	unsigned long nr_pages);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /* reasonably generic interface to expand the physical pages in a zone  */
>  extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
>  	unsigned long nr_pages);
> -extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
> -	unsigned long nr_pages);
>  
>  #ifdef CONFIG_NUMA
>  extern int memory_add_physaddr_to_nid(u64 start);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -436,6 +436,40 @@ static int __meminit __add_section(int nid, struct zone *zone,
>  	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>  }
>  
> +/*
> + * Reasonably generic function for adding memory.  It is
> + * expected that archs that support memory hotplug will
> + * call this function after deciding the zone to which to
> + * add the new pages.
> + */
> +int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
> +			unsigned long nr_pages)
> +{
> +	unsigned long i;
> +	int err = 0;
> +	int start_sec, end_sec;
> +	/* during initialize mem_map, align hot-added range to section */
> +	start_sec = pfn_to_section_nr(phys_start_pfn);
> +	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> +
> +	for (i = start_sec; i <= end_sec; i++) {
> +		err = __add_section(nid, zone, i << PFN_SECTION_SHIFT);
> +
> +		/*
> +		 * EEXIST is finally dealt with by ioresource collision
> +		 * check. see add_memory() => register_memory_resource()
> +		 * Warning will be printed if there is collision.
> +		 */
> +		if (err && (err != -EEXIST))
> +			break;
> +		err = 0;
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(__add_pages);
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
>  /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
>  static int find_smallest_section_pfn(int nid, struct zone *zone,
>  				     unsigned long start_pfn,
> @@ -658,39 +692,6 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
>  	return 0;
>  }
>  
> -/*
> - * Reasonably generic function for adding memory.  It is
> - * expected that archs that support memory hotplug will
> - * call this function after deciding the zone to which to
> - * add the new pages.
> - */
> -int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
> -			unsigned long nr_pages)
> -{
> -	unsigned long i;
> -	int err = 0;
> -	int start_sec, end_sec;
> -	/* during initialize mem_map, align hot-added range to section */
> -	start_sec = pfn_to_section_nr(phys_start_pfn);
> -	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> -
> -	for (i = start_sec; i <= end_sec; i++) {
> -		err = __add_section(nid, zone, i << PFN_SECTION_SHIFT);
> -
> -		/*
> -		 * EEXIST is finally dealt with by ioresource collision
> -		 * check. see add_memory() => register_memory_resource()
> -		 * Warning will be printed if there is collision.
> -		 */
> -		if (err && (err != -EEXIST))
> -			break;
> -		err = 0;
> -	}
> -
> -	return err;
> -}
> -EXPORT_SYMBOL_GPL(__add_pages);
> -
>  /**
>   * __remove_pages() - remove sections of pages from a zone
>   * @zone: zone from which pages need to be removed
> @@ -726,6 +727,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__remove_pages);
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  int set_online_page_callback(online_page_callback_t callback)
>  {
> diff --git a/mm/sparse.c b/mm/sparse.c
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -620,6 +620,7 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>  
>  	vmemmap_free(start, end);
>  }
> +#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
>  {
>  	unsigned long start = (unsigned long)memmap;
> @@ -627,6 +628,7 @@ static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
>  
>  	vmemmap_free(start, end);
>  }
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #else
>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>  {
> @@ -664,6 +666,7 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>  			   get_order(sizeof(struct page) * nr_pages));
>  }
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
>  {
>  	unsigned long maps_section_nr, removing_section_nr, i;
> @@ -690,40 +693,9 @@ static void free_map_bootmem(struct page *memmap, unsigned long nr_pages)
>  			put_page_bootmem(page);
>  	}
>  }
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
> -static void free_section_usemap(struct page *memmap, unsigned long *usemap)
> -{
> -	struct page *usemap_page;
> -	unsigned long nr_pages;
> -
> -	if (!usemap)
> -		return;
> -
> -	usemap_page = virt_to_page(usemap);
> -	/*
> -	 * Check to see if allocation came from hot-plug-add
> -	 */
> -	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
> -		kfree(usemap);
> -		if (memmap)
> -			__kfree_section_memmap(memmap, PAGES_PER_SECTION);
> -		return;
> -	}
> -
> -	/*
> -	 * The usemap came from bootmem. This is packed with other usemaps
> -	 * on the section which has pgdat at boot time. Just keep it as is now.
> -	 */
> -
> -	if (memmap) {
> -		nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
> -			>> PAGE_SHIFT;
> -
> -		free_map_bootmem(memmap, nr_pages);
> -	}
> -}
> -
>  /*
>   * returns the number of sections whose mem_maps were properly
>   * set.  If this is <=0, then that means that the passed-in
> @@ -800,6 +772,39 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
>  }
>  #endif
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +static void free_section_usemap(struct page *memmap, unsigned long *usemap)
> +{
> +	struct page *usemap_page;
> +	unsigned long nr_pages;
> +
> +	if (!usemap)
> +		return;
> +
> +	usemap_page = virt_to_page(usemap);
> +	/*
> +	 * Check to see if allocation came from hot-plug-add
> +	 */
> +	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
> +		kfree(usemap);
> +		if (memmap)
> +			__kfree_section_memmap(memmap, PAGES_PER_SECTION);
> +		return;
> +	}
> +
> +	/*
> +	 * The usemap came from bootmem. This is packed with other usemaps
> +	 * on the section which has pgdat at boot time. Just keep it as is now.
> +	 */
> +
> +	if (memmap) {
> +		nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
> +			>> PAGE_SHIFT;
> +
> +		free_map_bootmem(memmap, nr_pages);
> +	}
> +}
> +
>  void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
>  {
>  	struct page *memmap = NULL;
> @@ -819,4 +824,5 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
>  	clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION);
>  	free_section_usemap(memmap, usemap);
>  }
> -#endif
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
> +#endif /* CONFIG_MEMORY_HOTPLUG */




end of thread, other threads:[~2013-04-10 17:42 UTC | newest]

Thread overview: 13+ messages
2013-04-08 17:09 [PATCH v2 0/3] Support memory hot-delete to boot memory Toshi Kani
2013-04-08 17:09 ` [PATCH v2 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
2013-04-10  6:10   ` David Rientjes
2013-04-10 15:39     ` Toshi Kani
2013-04-08 17:09 ` [PATCH v2 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
2013-04-10  6:16   ` David Rientjes
2013-04-10 16:36     ` Toshi Kani
2013-04-08 17:09 ` [PATCH v2 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
2013-04-08 20:44 ` [PATCH v2 0/3] Support memory hot-delete to boot memory Andrew Morton
2013-04-08 20:58   ` Toshi Kani
2013-04-10  5:52     ` David Rientjes
2013-04-10  6:07       ` [patch] mm, hotplug: avoid compiling memory hotremove functions when disabled David Rientjes
2013-04-10 17:29         ` Toshi Kani
