linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
@ 2012-08-28 10:00 wency
  2012-08-28 10:00 ` [RFC v8 PATCH 01/20] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages() wency
                   ` (20 more replies)
  0 siblings, 21 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

  - acpi_memory_info                          : [RFC PATCH 4/19]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
  - iomem_resource                            : [RFC PATCH 9/19]
  - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
  - page table of removed memory              : [RFC PATCH 12/19]
  - node and related sysfs files              : [RFC PATCH 18-19/19]

If you find lack of function for physical memory hot-remove, please let me
know.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.

change log of v8:
 [RFC PATCH v8 17/20]
   * Fix problems when one node's range include the other nodes
 [RFC PATCH v8 18/20]
   * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
     is not defined.
 [RFC PATCH v8 19/20]
   * don't offline node when some memory sections are not removed
 [RFC PATCH v8 20/20]
   * create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
 [RFC PATCH v7 4/19]
   * do not continue if acpi_memory_device_remove_memory() fails.
 [RFC PATCH v7 15/19]
   * handle usemap in register_page_bootmem_info_section() too.

change log of v6:
 [RFC PATCH v6 12/19]
   * fix building error on other archtitectures than x86

 [RFC PATCH v6 15-16/19]
   * fix building error on other archtitectures than x86

change log of v5:
 * merge the patchset to clear page table and the patchset to hot remove
   memory(from ishimatsu) to one big patchset.

 [RFC PATCH v5 1/19]
   * rename remove_memory() to offline_memory()/offline_pages()

 [RFC PATCH v5 2/19]
   * new patch: implement offline_memory(). This function offlines pages,
     update memory block's state, and notify the userspace that the memory
     block's state is changed.

 [RFC PATCH v5 4/19]
   * offline and remove memory in acpi_memory_disable_device() too.

 [RFC PATCH v5 17/19]
   * new patch: add a new function __remove_zone() to revert the things done
     in the function __add_zone().

 [RFC PATCH v5 18/19]
   * flush work befor reseting node device.

change log of v4:
 * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
   from the patch series, since the patch is a bugfix. It is being disccussed
   on other thread. But for testing the patch series, the patch is needed.
   So I added the patch as [PATCH 0/13].

 [RFC PATCH v4 2/13]
   * check memory is online or not at remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
     getting node id
 
 [RFC PATCH v4 3/13]
   * create new patch : check memory is online or not at online_pages()

 [RFC PATCH v4 4/13]
   * add __ref section to remove_memory()
   * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()

 [RFC PATCH v4 11/13]
   * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD

change log of v3:
 * rebase to 3.5.0-rc6

 [RFC PATCH v2 2/13]
   * remove extra kobject_put()

   * The patch was commented by Wen. Wen's comment is
     "acpi_memory_device_remove() should ignore a return value of
     remove_memory() since caller does not care the return value".
     But I did not change it since I think caller should care the
     return value. And I am trying to fix it as follow:

     https://lkml.org/lkml/2012/7/5/624

 [RFC PATCH v2 4/13]
   * remove a firmware_memmap_entry allocated by kzmalloc()

change log of v2:
 [RFC PATCH v2 2/13]
   * check whether memory block is offline or not before calling offline_memory()
   * check whether section is valid or not in is_memblk_offline()
   * call kobject_put() for each memory_block in is_memblk_offline()

 [RFC PATCH v2 3/13]
   * unify the end argument of firmware_map_add_early/hotplug

 [RFC PATCH v2 4/13]
   * add release_firmware_map_entry() for freeing firmware_map_entry

 [RFC PATCH v2 6/13]
  * add release_memory_block() for freeing memory_block

 [RFC PATCH v2 11/13]
  * fix wrong arguments of free_pages()

Wen Congyang (7):
  memory-hotplug: implement offline_memory()
  memory-hotplug: store the node id in acpi_memory_device
  memory-hotplug: export the function acpi_bus_remove()
  memory-hotplug: call acpi_bus_remove() to remove memory device
  memory-hotplug: introduce new function arch_remove_memory()
  memory-hotplug: remove sysfs file of node
  memory-hotplug: clear hwpoisoned flag when onlining pages

Yasuaki Ishimatsu (13):
  memory-hotplug: rename remove_memory() to
    offline_memory()/offline_pages()
  memory-hotplug: offline and remove memory when removing the memory
    device
  memory-hotplug: check whether memory is present or not
  memory-hotplug: remove /sys/firmware/memmap/X sysfs
  memory-hotplug: does not release memory region in PAGES_PER_SECTION
    chunks
  memory-hotplug: add memory_block_release
  memory-hotplug: remove_memory calls __remove_pages
  memory-hotplug: check page type in get_page_bootmem
  memory-hotplug: move register_page_bootmem_info_node and
    put_page_bootmem for sparse-vmemmap
  memory-hotplug: implement register_page_bootmem_info_section of
    sparse-vmemmap
  memory-hotplug: free memmap of sparse-vmemmap
  memory_hotplug: clear zone when the memory is removed
  memory-hotplug: add node_device_release

 arch/ia64/mm/discontig.c                        |   14 +
 arch/ia64/mm/init.c                             |   16 +
 arch/powerpc/mm/init_64.c                       |   14 +
 arch/powerpc/mm/mem.c                           |   14 +
 arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
 arch/s390/mm/init.c                             |   12 +
 arch/s390/mm/vmem.c                             |   14 +
 arch/sh/mm/init.c                               |   15 +
 arch/sparc/mm/init_64.c                         |   14 +
 arch/tile/mm/init.c                             |    8 +
 arch/x86/include/asm/pgtable_types.h            |    1 +
 arch/x86/mm/init_32.c                           |   10 +
 arch/x86/mm/init_64.c                           |  331 +++++++++++++++++++
 arch/x86/mm/pageattr.c                          |   47 ++--
 drivers/acpi/acpi_memhotplug.c                  |   54 +++-
 drivers/acpi/scan.c                             |    3 +-
 drivers/base/memory.c                           |   90 +++++-
 drivers/base/node.c                             |   11 +
 drivers/firmware/memmap.c                       |   78 +++++-
 include/acpi/acpi_bus.h                         |    1 +
 include/linux/firmware-map.h                    |    6 +
 include/linux/memory.h                          |    5 +
 include/linux/memory_hotplug.h                  |   25 +-
 include/linux/mm.h                              |    5 +-
 include/linux/mmzone.h                          |   19 +
 mm/memory_hotplug.c                             |  404 +++++++++++++++++++++--
 mm/sparse.c                                     |    5 +-
 27 files changed, 1141 insertions(+), 91 deletions(-)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 01/20] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 02/20] memory-hotplug: implement offline_memory() wency
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

remove_memory() only try to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline >/sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change memory block's state, and notify
the userspace that the memory block's state is changed after offlining
pages.

So rename remove_memory() to offline_memory()/offline_pages(). And in
the 1st case, offline_memory() will be used. The function offline_memory()
is not implemented. In the 2nd case, offline_pages() will be used.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 drivers/acpi/acpi_memhotplug.c |    2 +-
 drivers/base/memory.c          |    9 +++------
 include/linux/memory_hotplug.h |    3 ++-
 mm/memory_hotplug.c            |   22 ++++++++++++++--------
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24c807f..2a7beac 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
 	 */
 	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
 		if (info->enabled) {
-			result = remove_memory(info->start_addr, info->length);
+			result = offline_memory(info->start_addr, info->length);
 			if (result)
 				return result;
 		}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
 static int
 memory_block_action(unsigned long phys_index, unsigned long action)
 {
-	unsigned long start_pfn, start_paddr;
+	unsigned long start_pfn;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	struct page *first_page;
 	int ret;
 
 	first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
+	start_pfn = page_to_pfn(first_page);
 
 	switch (action) {
 		case MEM_ONLINE:
-			start_pfn = page_to_pfn(first_page);
-
 			if (!pages_correctly_reserved(start_pfn, nr_pages))
 				return -EBUSY;
 
 			ret = online_pages(start_pfn, nr_pages);
 			break;
 		case MEM_OFFLINE:
-			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-			ret = remove_memory(start_paddr,
-					    nr_pages << PAGE_SHIFT);
+			ret = offline_pages(start_pfn, nr_pages);
 			break;
 		default:
 			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long pfn,
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 								int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..c182c76 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	return offlined;
 }
 
-static int __ref offline_pages(unsigned long start_pfn,
+static int __ref __offline_pages(unsigned long start_pfn,
 		  unsigned long end_pfn, unsigned long timeout)
 {
 	unsigned long pfn, nr_pages, expire;
@@ -994,18 +994,24 @@ out:
 	return ret;
 }
 
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-	unsigned long start_pfn, end_pfn;
+	return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
 
-	start_pfn = PFN_DOWN(start);
-	end_pfn = start_pfn + PFN_DOWN(size);
-	return offline_pages(start_pfn, end_pfn, 120 * HZ);
+int offline_memory(u64 start, u64 size)
+{
+	return -EINVAL;
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_pages(u64 start, u64 size)
+{
+	return -EINVAL;
+}
+
+int offline_memory(u64 start, u64 size)
 {
 	return -EINVAL;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-EXPORT_SYMBOL_GPL(remove_memory);
+EXPORT_SYMBOL_GPL(offline_memory);
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 02/20] memory-hotplug: implement offline_memory()
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
  2012-08-28 10:00 ` [RFC v8 PATCH 01/20] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages() wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 03/20] memory-hotplug: store the node id in acpi_memory_device wency
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang,
	Vasilis Liaskovitis

From: Wen Congyang <wency@cn.fujitsu.com>

The function offline_memory() will be called when hot removing a
memory device. The memory device may contain more than one memory
block. If the memory block has been offlined, __offline_pages()
will fail. So we should try to offline one memory block at a
time.

If the memory block is offlined in offline_memory(), we also
update it's state, and notify the userspace that its state is
changed.

The function offline_memory() also check each memory block's
state. So there is no need to check the memory block's state
before calling offline_memory().

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
CC: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 drivers/base/memory.c          |   31 +++++++++++++++++++++++++++----
 include/linux/memory_hotplug.h |    2 ++
 mm/memory_hotplug.c            |   37 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 44e7de6..86c8821 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -275,13 +275,11 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 	return ret;
 }
 
-static int memory_block_change_state(struct memory_block *mem,
+static int __memory_block_change_state(struct memory_block *mem,
 		unsigned long to_state, unsigned long from_state_req)
 {
 	int ret = 0;
 
-	mutex_lock(&mem->state_mutex);
-
 	if (mem->state != from_state_req) {
 		ret = -EINVAL;
 		goto out;
@@ -309,10 +307,20 @@ static int memory_block_change_state(struct memory_block *mem,
 		break;
 	}
 out:
-	mutex_unlock(&mem->state_mutex);
 	return ret;
 }
 
+static int memory_block_change_state(struct memory_block *mem,
+		unsigned long to_state, unsigned long from_state_req)
+{
+	int ret;
+
+	mutex_lock(&mem->state_mutex);
+	ret = __memory_block_change_state(mem, to_state, from_state_req);
+	mutex_unlock(&mem->state_mutex);
+
+	return ret;
+}
 static ssize_t
 store_mem_state(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t count)
@@ -653,6 +661,21 @@ int unregister_memory_section(struct mem_section *section)
 }
 
 /*
+ * offline one memory block. If the memory block has been offlined, do nothing.
+ */
+int offline_memory_block(struct memory_block *mem)
+{
+	int ret = 0;
+
+	mutex_lock(&mem->state_mutex);
+	if (mem->state != MEM_OFFLINE)
+		ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+	mutex_unlock(&mem->state_mutex);
+
+	return ret;
+}
+
+/*
  * Initialize the sysfs support for memory devices...
  */
 int __init memory_dev_init(void)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c183f39..0b040bb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -10,6 +10,7 @@ struct page;
 struct zone;
 struct pglist_data;
 struct mem_section;
+struct memory_block;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 
@@ -234,6 +235,7 @@ extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory_block(struct memory_block *mem);
 extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 								int nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c182c76..3113cd4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1001,7 +1001,42 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 
 int offline_memory(u64 start, u64 size)
 {
-	return -EINVAL;
+	struct memory_block *mem = NULL;
+	struct mem_section *section;
+	unsigned long start_pfn, end_pfn;
+	unsigned long pfn, section_nr;
+	int ret;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = start_pfn + PFN_DOWN(size);
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		section_nr = pfn_to_section_nr(pfn);
+		if (!present_section_nr(section_nr))
+			continue;
+
+		section = __nr_to_section(section_nr);
+		/* same memblock? */
+		if (mem)
+			if ((section_nr >= mem->start_section_nr) &&
+			    (section_nr <= mem->end_section_nr))
+				continue;
+
+		mem = find_memory_block_hinted(section, mem);
+		if (!mem)
+			continue;
+
+		ret = offline_memory_block(mem);
+		if (ret) {
+			kobject_put(&mem->dev.kobj);
+			return ret;
+		}
+	}
+
+	if (mem)
+		kobject_put(&mem->dev.kobj);
+
+	return 0;
 }
 #else
 int offline_pages(u64 start, u64 size)
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 03/20] memory-hotplug: store the node id in acpi_memory_device
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
  2012-08-28 10:00 ` [RFC v8 PATCH 01/20] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages() wency
  2012-08-28 10:00 ` [RFC v8 PATCH 02/20] memory-hotplug: implement offline_memory() wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device wency
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

The memory device has only one node id. Store the node id when
enable the memory device, and we can reuse it when removing the
memory device.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 drivers/acpi/acpi_memhotplug.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 2a7beac..7873832 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -83,6 +83,7 @@ struct acpi_memory_info {
 struct acpi_memory_device {
 	struct acpi_device * device;
 	unsigned int state;	/* State of the memory device */
+	int nid;
 	struct list_head res_list;
 };
 
@@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		info->enabled = 1;
 		num_enabled++;
 	}
+
+	mem_device->nid = node;
+
 	if (!num_enabled) {
 		printk(KERN_ERR PREFIX "add_memory failed\n");
 		mem_device->state = MEMORY_INVALID_STATE;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (2 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 03/20] memory-hotplug: store the node id in acpi_memory_device wency
@ 2012-08-28 10:00 ` wency
  2012-08-31 20:55   ` Andrew Morton
  2012-08-28 10:00 ` [RFC v8 PATCH 05/20] memory-hotplug: check whether memory is present or not wency
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

We should offline and remove memory when removing the memory device.
The memory device can be removed by 2 ways:
1. send eject request by SCI
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called. In the 2nd
case, acpi_memory_device_remove() will be called. acpi_memory_device_remove()
will also be called when we unbind the memory device from the driver
acpi_memhotplug. If the type is ACPI_BUS_REMOVAL_EJECT, it means
that the user wants to eject the memory device, and we should offline
and remove memory in acpi_memory_device_remove().

The function remove_memory() is not implemeted now. It only check whether
all memory has been offllined now.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 drivers/acpi/acpi_memhotplug.c |   45 +++++++++++++++++++++++++++++++++------
 drivers/base/memory.c          |   39 ++++++++++++++++++++++++++++++++++
 include/linux/memory.h         |    5 ++++
 include/linux/memory_hotplug.h |    5 ++++
 mm/memory_hotplug.c            |   22 +++++++++++++++++++
 5 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 7873832..9d47458 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/types.h>
+#include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
 #include <acpi/acpi_drivers.h>
@@ -310,25 +311,44 @@ static int acpi_memory_powerdown_device(struct acpi_memory_device *mem_device)
 	return 0;
 }
 
-static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+static int
+acpi_memory_device_remove_memory(struct acpi_memory_device *mem_device)
 {
 	int result;
 	struct acpi_memory_info *info, *n;
+	int node = mem_device->nid;
 
-
-	/*
-	 * Ask the VM to offline this memory range.
-	 * Note: Assume that this function returns zero on success
-	 */
 	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
 		if (info->enabled) {
 			result = offline_memory(info->start_addr, info->length);
 			if (result)
 				return result;
+
+			result = remove_memory(node, info->start_addr,
+					       info->length);
+			if (result)
+				return result;
 		}
+
+		list_del(&info->list);
 		kfree(info);
 	}
 
+	return 0;
+}
+
+static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+{
+	int result;
+
+	/*
+	 * Ask the VM to offline this memory range.
+	 * Note: Assume that this function returns zero on success
+	 */
+	result = acpi_memory_device_remove_memory(mem_device);
+	if (result)
+		return result;
+
 	/* Power-off and eject the device */
 	result = acpi_memory_powerdown_device(mem_device);
 	if (result) {
@@ -477,12 +497,23 @@ static int acpi_memory_device_add(struct acpi_device *device)
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
 	struct acpi_memory_device *mem_device = NULL;
-
+	int result;
 
 	if (!device || !acpi_driver_data(device))
 		return -EINVAL;
 
 	mem_device = acpi_driver_data(device);
+
+	if (type == ACPI_BUS_REMOVAL_EJECT) {
+		/*
+		 * offline and remove memory only when the memory device is
+		 * ejected.
+		 */
+		result = acpi_memory_device_remove_memory(mem_device);
+		if (result)
+			return result;
+	}
+
 	kfree(mem_device);
 
 	return 0;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..038be73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+	struct memory_block *mem = NULL;
+	struct mem_section *section;
+	unsigned long start_pfn, end_pfn;
+	unsigned long pfn, section_nr;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = PFN_UP(start + size);
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		section_nr = pfn_to_section_nr(pfn);
+		if (!present_section_nr(section_nr))
+			continue;
+
+		section = __nr_to_section(section_nr);
+		/* same memblock? */
+		if (mem)
+			if ((section_nr >= mem->start_section_nr) &&
+			    (section_nr <= mem->end_section_nr))
+				continue;
+
+		mem = find_memory_block_hinted(section, mem);
+		if (!mem)
+			continue;
+		if (mem->state == MEM_OFFLINE)
+			continue;
+
+		kobject_put(&mem->dev.kobj);
+		return false;
+	}
+
+	if (mem)
+		kobject_put(&mem->dev.kobj);
+
+	return true;
+}
+EXPORT_SYMBOL(is_memblk_offline);
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1ac7f6e..7c66126 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -106,6 +106,10 @@ static inline int memory_isolate_notify(unsigned long val, void *v)
 {
 	return 0;
 }
+static inline bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+	return false;
+}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
@@ -120,6 +124,7 @@ extern int memory_isolate_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block_hinted(struct mem_section *,
 							struct memory_block *);
 extern struct memory_block *find_memory_block(struct mem_section *);
+extern bool is_memblk_offline(unsigned long start, unsigned long size);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 0b040bb..fd84ea9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -222,6 +222,7 @@ static inline void unlock_memory_hotplug(void) {}
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
 extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
+extern int remove_memory(int nid, u64 start, u64 size);
 
 #else
 static inline int is_mem_section_removable(unsigned long pfn,
@@ -229,6 +230,10 @@ static inline int is_mem_section_removable(unsigned long pfn,
 {
 	return 0;
 }
+static inline int remove_memory(int nid, u64 start, u64 size)
+{
+	return -EBUSY;
+}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern int mem_online_node(int nid);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3113cd4..80cded7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1038,6 +1038,28 @@ int offline_memory(u64 start, u64 size)
 
 	return 0;
 }
+
+int remove_memory(int nid, u64 start, u64 size)
+{
+	int ret = -EBUSY;
+	lock_memory_hotplug();
+	/*
+	 * The memory might become online by other task, even if you offine it.
+	 * So we check whether the cpu has been onlined or not.
+	 */
+	if (!is_memblk_offline(start, size)) {
+		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
+			"because the memmory range is online\n",
+			start, start + size);
+		ret = -EAGAIN;
+	}
+
+	unlock_memory_hotplug();
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(remove_memory);
+
 #else
 int offline_pages(u64 start, u64 size)
 {
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 05/20] memory-hotplug: check whether memory is present or not
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (3 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 06/20] memory-hotplug: export the function acpi_bus_remove() wency
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If system supports memory hot-remove, online_pages() may online removed pages.
So online_pages() need to check whether onlining pages are present or not.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 include/linux/mmzone.h |   19 +++++++++++++++++++
 mm/memory_hotplug.c    |   13 +++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..ac3ae30 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1180,6 +1180,25 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+	int i;
+	for (i = 0; i < nr_pages; i++) {
+		if (pfn_present(pfn + i))
+			continue;
+		else
+			return -EINVAL;
+	}
+	return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+	return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
 #else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 80cded7..3f1d7c5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	struct memory_notify arg;
 
 	lock_memory_hotplug();
+	/*
+	 * If system supports memory hot-remove, the memory may have been
+	 * removed. So we check whether the memory has been removed or not.
+	 *
+	 * Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
+	 *       effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+	 *       always returns 0.
+	 */
+	ret = pfns_present(pfn, nr_pages);
+	if (ret) {
+		unlock_memory_hotplug();
+		return ret;
+	}
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
 	arg.status_change_nid = -1;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 06/20] memory-hotplug: export the function acpi_bus_remove()
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (4 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 05/20] memory-hotplug: check whether memory is present or not wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 07/20] memory-hotplug: call acpi_bus_remove() to remove memory device wency
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

The function acpi_bus_remove() can remove a acpi device from acpi device.
When a acpi device is removed, we need to call this function to remove
the acpi device from acpi bus. So export this function.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 drivers/acpi/scan.c     |    3 ++-
 include/acpi/acpi_bus.h |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index d1ecca2..1cefc34 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1224,7 +1224,7 @@ static int acpi_device_set_context(struct acpi_device *device)
 	return -ENODEV;
 }
 
-static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
 	if (!dev)
 		return -EINVAL;
@@ -1246,6 +1246,7 @@ static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 
 	return 0;
 }
+EXPORT_SYMBOL(acpi_bus_remove);
 
 static int acpi_add_single_object(struct acpi_device **child,
 				  acpi_handle handle, int type,
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index bde976e..2ccf109 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -360,6 +360,7 @@ bool acpi_bus_power_manageable(acpi_handle handle);
 bool acpi_bus_can_wakeup(acpi_handle handle);
 int acpi_power_resource_register_device(struct device *dev, acpi_handle handle);
 void acpi_power_resource_unregister_device(struct device *dev, acpi_handle handle);
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice);
 #ifdef CONFIG_ACPI_PROC_EVENT
 int acpi_bus_generate_proc_event(struct acpi_device *device, u8 type, int data);
 int acpi_bus_generate_proc_event4(const char *class, const char *bid, u8 type, int data);
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 07/20] memory-hotplug: call acpi_bus_remove() to remove memory device
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (5 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 06/20] memory-hotplug: export the function acpi_bus_remove() wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs wency
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

The memory device has been ejected and powoffed, so we can call
acpi_bus_remove() to remove the memory device from acpi bus.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 drivers/acpi/acpi_memhotplug.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 9d47458..b152767 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -425,8 +425,9 @@ static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
 		}
 
 		/*
-		 * TBD: Invoke acpi_bus_remove to cleanup data structures
+		 * Invoke acpi_bus_remove() to remove memory device
 		 */
+		acpi_bus_remove(device, 1);
 
 		/* _EJ0 succeeded; _OST is not necessary */
 		return;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (6 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 07/20] memory-hotplug: call acpi_bus_remove() to remove memory device wency
@ 2012-08-28 10:00 ` wency
  2012-08-31 21:06   ` Andrew Morton
  2012-08-28 10:00 ` [RFC v8 PATCH 09/20] memory-hotplug: does not release memory region in PAGES_PER_SECTION chunks wency
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.

Note : The code does not free firmware_map_entry since there is no way to free
       memory which is allocated by bootmem.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 drivers/firmware/memmap.c    |   78 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/firmware-map.h |    6 +++
 mm/memory_hotplug.c          |    9 ++++-
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index c1cdc92..b2e7e5e 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 /*
  * Data types ------------------------------------------------------------------
@@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_attr_ops = {
 	.show = memmap_attr_show,
 };
 
+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+	struct firmware_map_entry *entry = to_memmap_entry(kobj);
+	struct page *page;
+
+	page = virt_to_page(entry);
+	if (PageSlab(page) || PageCompound(page))
+		kfree(entry);
+
+	/* There is no way to free memory allocated from bootmem*/
+}
+
 static struct kobj_type memmap_ktype = {
+	.release	= release_firmware_map_entry,
 	.sysfs_ops	= &memmap_attr_ops,
 	.default_attrs	= def_attrs,
 };
@@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 start, u64 end,
 	return 0;
 }
 
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+	list_del(&entry->list);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
 	return 0;
 }
 
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+	kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+struct firmware_map_entry * __meminit
+find_firmware_map_entry(u64 start, u64 end, const char *type)
+{
+	struct firmware_map_entry *entry;
+
+	list_for_each_entry(entry, &map_entries, list)
+		if ((entry->start == start) && (entry->end == end) &&
+		    (!strcmp(entry->type, type)))
+			return entry;
+
+	return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
 	return firmware_map_add_entry(start, end, type, entry);
 }
 
+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start of the memory range.
+ * @end:   End of the memory range.
+ * @type:  Type of the memory range.
+ *
+ * removes a firmware mapping entry.
+ *
+ * Returns 0 on success, or -EINVAL if no entry.
+ **/
+int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
+{
+	struct firmware_map_entry *entry;
+
+	entry = find_firmware_map_entry(start, end - 1, type);
+	if (!entry)
+		return -EINVAL;
+
+	firmware_map_remove_entry(entry);
+
+	/* remove the memmap entry */
+	remove_sysfs_fw_map_entry(entry);
+
+	return 0;
+}
+
 /*
  * Sysfs functions -------------------------------------------------------------
  */
@@ -218,7 +295,6 @@ static ssize_t type_show(struct firmware_map_entry *entry, char *buf)
 }
 
 #define to_memmap_attr(_attr) container_of(_attr, struct memmap_attribute, attr)
-#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
 
 static ssize_t memmap_attr_show(struct kobject *kobj,
 				struct attribute *attr, char *buf)
diff --git a/include/linux/firmware-map.h b/include/linux/firmware-map.h
index 43fe52f..71d4fa7 100644
--- a/include/linux/firmware-map.h
+++ b/include/linux/firmware-map.h
@@ -25,6 +25,7 @@
 
 int firmware_map_add_early(u64 start, u64 end, const char *type);
 int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);
 
 #else /* CONFIG_FIRMWARE_MEMMAP */
 
@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotplug(u64 start, u64 end, const char *type)
 	return 0;
 }
 
+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+	return 0;
+}
+
 #endif /* CONFIG_FIRMWARE_MEMMAP */
 
 #endif /* _LINUX_FIRMWARE_MAP_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f1d7c5..45b03b3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1052,9 +1052,9 @@ int offline_memory(u64 start, u64 size)
 	return 0;
 }
 
-int remove_memory(int nid, u64 start, u64 size)
+int __ref remove_memory(int nid, u64 start, u64 size)
 {
-	int ret = -EBUSY;
+	int ret = 0;
 	lock_memory_hotplug();
 	/*
 	 * The memory might become online by other task, even if you offine it.
@@ -1065,8 +1065,13 @@ int remove_memory(int nid, u64 start, u64 size)
 			"because the memmory range is online\n",
 			start, start + size);
 		ret = -EAGAIN;
+		goto out;
 	}
 
+	/* remove memmap entry */
+	firmware_map_remove(start, start + size, "System RAM");
+
+out:
 	unlock_memory_hotplug();
 	return ret;
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 09/20] memory-hotplug: does not release memory region in PAGES_PER_SECTION chunks
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (7 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 10/20] memory-hotplug: add memory_block_release wency
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Since applying a patch(de7f0cba96786c), release_mem_region() has been changed
as called in PAGES_PER_SECTION chunks because register_memory_resource() is
called in PAGES_PER_SECTION chunks by add_memory(). But it seems firmware
dependency. If CRS are written in the PAGES_PER_SECTION chunks in ACPI DSDT
Table, register_memory_resource() is called in PAGES_PER_SECTION chunks.
But if CRS are written in the DIMM unit in ACPI DSDT Table,
register_memory_resource() is called in DIMM unit. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. The patch fixes it.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +++++++++----
 mm/memory_hotplug.c                             |    4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 11d8e05..dc0a035 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 {
 	unsigned long start, start_pfn;
 	struct zone *zone;
-	int ret;
+	int i, ret;
+	int sections_to_remove;
 
 	start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	 * to sysfs "state" file and we can't remove sysfs entries
 	 * while writing to it. So we have to defer it to here.
 	 */
-	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-	if (ret)
-		return ret;
+	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+	for (i = 0; i < sections_to_remove; i++) {
+		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+		ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Update memory regions for memory remove
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 45b03b3..29aff4d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);
 
+	release_mem_region(phys_start_pfn << PAGE_SHIFT,  nr_pages * PAGE_SIZE);
+
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-		release_mem_region(pfn << PAGE_SHIFT,
-				   PAGES_PER_SECTION << PAGE_SHIFT);
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 10/20] memory-hotplug: add memory_block_release
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (8 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 09/20] memory-hotplug: does not release memory region in PAGES_PER_SECTION chunks wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 11/20] memory-hotplug: remove_memory calls __remove_pages wency
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

When calling remove_memory_block(), the function shows following message at
device_release().

Device 'memory528' does not have a release() function, it is broken and must
be fixed.

remove_memory_block() calls kfree(mem). I think it shouled be called from
device_release(). So the patch implements memory_block_release()

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 drivers/base/memory.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 038be73..1cd3ef3 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -109,6 +109,15 @@ bool is_memblk_offline(unsigned long start, unsigned long size)
 }
 EXPORT_SYMBOL(is_memblk_offline);
 
+#define to_memory_block(device) container_of(device, struct memory_block, dev)
+
+static void release_memory_block(struct device *dev)
+{
+	struct memory_block *mem = to_memory_block(dev);
+
+	kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +128,7 @@ int register_memory(struct memory_block *memory)
 
 	memory->dev.bus = &memory_subsys;
 	memory->dev.id = memory->start_section_nr / sections_per_block;
+	memory->dev.release = release_memory_block;
 
 	error = device_register(&memory->dev);
 	return error;
@@ -674,7 +684,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
 		unregister_memory(mem);
-		kfree(mem);
 	} else
 		kobject_put(&mem->dev.kobj);
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 11/20] memory-hotplug: remove_memory calls __remove_pages
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (9 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 10/20] memory-hotplug: add memory_block_release wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 12/20] memory-hotplug: introduce new function arch_remove_memory() wency
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

The patch adds __remove_pages() to remove_memory(). Then the range of
phys_start_pfn argument and nr_pages argument in __remove_pagse() may
have different zone. So zone argument is removed from __remove_pages()
and __remove_pages() caluculates zone in each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap.
So __remove_section only calls unregister_memory_section().

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |    5 +----
 include/linux/memory_hotplug.h                  |    3 +--
 mm/memory_hotplug.c                             |   18 +++++++++++-------
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index dc0a035..cc14da4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(void)
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
-	struct zone *zone;
 	int i, ret;
 	int sections_to_remove;
 
@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 		return 0;
 	}
 
-	zone = page_zone(pfn_to_page(start_pfn));
-
 	/*
 	 * Remove section mappings and sysfs entries for the
 	 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
-		ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+		ret = __remove_pages(start_pfn,  PAGES_PER_SECTION);
 		if (ret)
 			return ret;
 	}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index fd84ea9..8bf820d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -90,8 +90,7 @@ extern bool is_pageblock_removable_nolock(struct page *page);
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-	unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 29aff4d..713f1b9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -275,11 +275,14 @@ static int __meminit __add_section(int nid, struct zone *zone,
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-	/*
-	 * XXX: Freeing memmap with vmemmap is not implement yet.
-	 *      This should be removed later.
-	 */
-	return -EBUSY;
+	int ret = -EINVAL;
+
+	if (!valid_section(ms))
+		return ret;
+
+	ret = unregister_memory_section(ms);
+
+	return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		 unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
 	unsigned long i, ret = 0;
 	int sections_to_remove;
+	struct zone *zone;
 
 	/*
 	 * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+		zone = page_zone(pfn_to_page(pfn));
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 12/20] memory-hotplug: introduce new function arch_remove_memory()
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (10 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 11/20] memory-hotplug: remove_memory calls __remove_pages wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem wency
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

We don't call __add_pages() directly in the function add_memory()
because some other architecture related things need to be done
before or after calling __add_pages(). So we should introduce
a new function arch_remove_memory() to revert the things
done in arch_add_memory().

Note: the function for s390 is not implemented(I don't know how to
implement it for s390).

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 arch/ia64/mm/init.c                  |   16 ++++
 arch/powerpc/mm/mem.c                |   14 +++
 arch/s390/mm/init.c                  |   12 +++
 arch/sh/mm/init.c                    |   15 +++
 arch/tile/mm/init.c                  |    8 ++
 arch/x86/include/asm/pgtable_types.h |    1 +
 arch/x86/mm/init_32.c                |   10 ++
 arch/x86/mm/init_64.c                |  160 ++++++++++++++++++++++++++++++++++
 arch/x86/mm/pageattr.c               |   47 +++++-----
 include/linux/memory_hotplug.h       |    1 +
 mm/memory_hotplug.c                  |    1 +
 11 files changed, 263 insertions(+), 22 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0eab454..1e345ed 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -688,6 +688,22 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
 	return ret;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	int ret;
+
+	ret = __remove_pages(start_pfn, nr_pages);
+	if (ret)
+		pr_warn("%s: Problem encountered in __remove_pages() as"
+			" ret=%d\n", __func__,  ret);
+
+	return ret;
+}
+#endif
 #endif
 
 /*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index fbdad0e..011170b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -133,6 +133,20 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
 	return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+
+	start = (unsigned long)__va(start);
+	if (remove_section_mapping(start, start + size))
+		return -EINVAL;
+
+	return __remove_pages(start_pfn, nr_pages);
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 6adbc08..501b20e 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -257,4 +257,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
 		vmem_remove_mapping(start, size);
 	return rc;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	/*
+	 * There is no hardware or firmware interface which could trigger a
+	 * hot memory remove on s390. So there is nothing that needs to be
+	 * implemented.
+	 */
+	return -EBUSY;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 82cc576..fc84491 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -558,4 +558,19 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	int ret;
+
+	ret = __remove_pages(start_pfn, nr_pages);
+	if (unlikely(ret))
+		pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+			ret);
+
+	return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index ef29d6c..2749515 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
 {
 	return -EINVAL;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	/* TODO */
+	return -EBUSY;
+}
+#endif
 #endif
 
 struct kmem_cache *pgd_cache;
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 013286a..b725af2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -334,6 +334,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
  * as a pte too.
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
+extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
 
 #endif	/* !__ASSEMBLY__ */
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 575d86f..41eefe8 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -842,6 +842,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
 	return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+
+	return __remove_pages(start_pfn, nr_pages);
+}
+#endif
 #endif
 
 /*
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2b6b4a3..e0d88ba 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -675,6 +675,166 @@ int arch_add_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
+static void __meminit
+phys_pte_remove(pte_t *pte_page, unsigned long addr, unsigned long end)
+{
+	unsigned pages = 0;
+	int i = pte_index(addr);
+
+	pte_t *pte = pte_page + pte_index(addr);
+
+	for (; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pte++) {
+
+		if (addr >= end)
+			break;
+
+		if (!pte_present(*pte))
+			continue;
+
+		pages++;
+		set_pte(pte, __pte(0));
+	}
+
+	update_page_count(PG_LEVEL_4K, -pages);
+}
+
+static void __meminit
+phys_pmd_remove(pmd_t *pmd_page, unsigned long addr, unsigned long end)
+{
+	unsigned long pages = 0, next;
+	int i = pmd_index(addr);
+
+	for (; i < PTRS_PER_PMD; i++, addr = next) {
+		unsigned long pte_phys;
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+		pte_t *pte;
+
+		if (addr >= end)
+			break;
+
+		next = (addr & PMD_MASK) + PMD_SIZE;
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		if (pmd_large(*pmd)) {
+			if ((addr & ~PMD_MASK) == 0 && next <= end) {
+				set_pmd(pmd, __pmd(0));
+				pages++;
+				continue;
+			}
+
+			/*
+			 * We use 2M page, but we need to remove part of them,
+			 * so split 2M page to 4K page.
+			 */
+			pte = alloc_low_page(&pte_phys);
+			__split_large_page((pte_t *)pmd, addr, pte);
+
+			spin_lock(&init_mm.page_table_lock);
+			pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
+			spin_unlock(&init_mm.page_table_lock);
+		}
+
+		spin_lock(&init_mm.page_table_lock);
+		pte = map_low_page((pte_t *)pmd_page_vaddr(*pmd));
+		phys_pte_remove(pte, addr, end);
+		unmap_low_page(pte);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+	update_page_count(PG_LEVEL_2M, -pages);
+}
+
+static void __meminit
+phys_pud_remove(pud_t *pud_page, unsigned long addr, unsigned long end)
+{
+	unsigned long pages = 0, next;
+	int i = pud_index(addr);
+
+	for (; i < PTRS_PER_PUD; i++, addr = next) {
+		unsigned long pmd_phys;
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		if (addr >= end)
+			break;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+
+		if (!pud_present(*pud))
+			continue;
+
+		if (pud_large(*pud)) {
+			if ((addr & ~PUD_MASK) == 0 && next <= end) {
+				set_pud(pud, __pud(0));
+				pages++;
+				continue;
+			}
+
+			/*
+			 * We use 1G page, but we need to remove part of them,
+			 * so split 1G page to 2M page.
+			 */
+			pmd = alloc_low_page(&pmd_phys);
+			__split_large_page((pte_t *)pud, addr, (pte_t *)pmd);
+
+			spin_lock(&init_mm.page_table_lock);
+			pud_populate(&init_mm, pud, __va(pmd_phys));
+			spin_unlock(&init_mm.page_table_lock);
+		}
+
+		pmd = map_low_page(pmd_offset(pud, 0));
+		phys_pmd_remove(pmd, addr, end);
+		unmap_low_page(pmd);
+		__flush_tlb_all();
+	}
+	__flush_tlb_all();
+
+	update_page_count(PG_LEVEL_1G, -pages);
+}
+
+void __meminit
+kernel_physical_mapping_remove(unsigned long start, unsigned long end)
+{
+	unsigned long next;
+
+	start = (unsigned long)__va(start);
+	end = (unsigned long)__va(end);
+
+	for (; start < end; start = next) {
+		pgd_t *pgd = pgd_offset_k(start);
+		pud_t *pud;
+
+		next = (start + PGDIR_SIZE) & PGDIR_MASK;
+		if (next > end)
+			next = end;
+
+		if (!pgd_present(*pgd))
+			continue;
+
+		pud = map_low_page((pud_t *)pgd_page_vaddr(*pgd));
+		phys_pud_remove(pud, __pa(start), __pa(end));
+		unmap_low_page(pud);
+	}
+
+	__flush_tlb_all();
+}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int __ref arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	int ret;
+
+	ret = __remove_pages(start_pfn, nr_pages);
+	WARN_ON_ONCE(ret);
+
+	kernel_physical_mapping_remove(start, start + size);
+
+	return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static struct kcore_list kcore_vsyscall;
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a718e0d..7dcb6f9 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -501,21 +501,13 @@ out_unlock:
 	return do_split;
 }
 
-static int split_large_page(pte_t *kpte, unsigned long address)
+int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
 {
 	unsigned long pfn, pfninc = 1;
 	unsigned int i, level;
-	pte_t *pbase, *tmp;
+	pte_t *tmp;
 	pgprot_t ref_prot;
-	struct page *base;
-
-	if (!debug_pagealloc)
-		spin_unlock(&cpa_lock);
-	base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
-	if (!debug_pagealloc)
-		spin_lock(&cpa_lock);
-	if (!base)
-		return -ENOMEM;
+	struct page *base = virt_to_page(pbase);
 
 	spin_lock(&pgd_lock);
 	/*
@@ -523,10 +515,11 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	 * up for us already:
 	 */
 	tmp = lookup_address(address, &level);
-	if (tmp != kpte)
-		goto out_unlock;
+	if (tmp != kpte) {
+		spin_unlock(&pgd_lock);
+		return 1;
+	}
 
-	pbase = (pte_t *)page_address(base);
 	paravirt_alloc_pte(&init_mm, page_to_pfn(base));
 	ref_prot = pte_pgprot(pte_clrhuge(*kpte));
 	/*
@@ -579,17 +572,27 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	 * going on.
 	 */
 	__flush_tlb_all();
+	spin_unlock(&pgd_lock);
 
-	base = NULL;
+	return 0;
+}
 
-out_unlock:
-	/*
-	 * If we dropped out via the lookup_address check under
-	 * pgd_lock then stick the page back into the pool:
-	 */
-	if (base)
+static int split_large_page(pte_t *kpte, unsigned long address)
+{
+	pte_t *pbase;
+	struct page *base;
+
+	if (!debug_pagealloc)
+		spin_unlock(&cpa_lock);
+	base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
+	if (!debug_pagealloc)
+		spin_lock(&cpa_lock);
+	if (!base)
+		return -ENOMEM;
+
+	pbase = (pte_t *)page_address(base);
+	if (__split_large_page(kpte, address, pbase))
 		__free_page(base);
-	spin_unlock(&pgd_lock);
 
 	return 0;
 }
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 8bf820d..cdbbd79 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -85,6 +85,7 @@ extern void __online_page_free(struct page *page);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern bool is_pageblock_removable_nolock(struct page *page);
+extern int arch_remove_memory(u64 start, u64 size);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* reasonably generic interface to expand the physical pages in a zone  */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 713f1b9..5f9f8c7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1075,6 +1075,7 @@ int __ref remove_memory(int nid, u64 start, u64 size)
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
 
+	arch_remove_memory(start, size);
 out:
 	unlock_memory_hotplug();
 	return ret;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (11 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 12/20] memory-hotplug: introduce new function arch_remove_memory() wency
@ 2012-08-28 10:00 ` wency
  2012-08-31 21:30   ` Andrew Morton
  2012-08-28 10:00 ` [RFC v8 PATCH 14/20] memory-hotplug: move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap wency
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

There is a possibility that get_page_bootmem() is called to the same page many
times. So when get_page_bootmem is called to the same page, the function only
increments page->_count.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 mm/memory_hotplug.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5f9f8c7..d85af6d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
 static void get_page_bootmem(unsigned long info,  struct page *page,
 			     unsigned long type)
 {
-	page->lru.next = (struct list_head *) type;
-	SetPagePrivate(page);
-	set_page_private(page, info);
-	atomic_inc(&page->_count);
+	unsigned long page_type;
+
+	page_type = (unsigned long) page->lru.next;
+	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+		page->lru.next = (struct list_head *) type;
+		SetPagePrivate(page);
+		set_page_private(page, info);
+		atomic_inc(&page->_count);
+	} else
+		atomic_inc(&page->_count);
 }
 
 /* reference to __meminit __free_pages_bootmem is valid
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 14/20] memory-hotplug: move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (12 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 15/20] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap wency
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

For implementing register_page_bootmem_info_node of sparse-vmemmap,
register_page_bootmem_info_node and put_page_bootmem are moved to
memory_hotplug.c

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 include/linux/memory_hotplug.h |    9 ---------
 mm/memory_hotplug.c            |    8 ++++++--
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cdbbd79..1133e63 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -162,17 +162,8 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d85af6d..3ca66bc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,7 +91,6 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info,  struct page *page,
 			     unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page *page)
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 
 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(struct pglist_data *pgdat)
 		register_page_bootmem_info_section(pfn);
 
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 			   unsigned long end_pfn)
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 15/20] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (13 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 14/20] memory-hotplug: move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 16/20] memory-hotplug: free memmap " wency
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by get_page_bootmem().
So the patch searches pages of virtual mapping and registers the pages by
get_page_bootmem().

Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
and sparc.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 arch/ia64/mm/discontig.c       |    6 ++++
 arch/powerpc/mm/init_64.c      |    6 ++++
 arch/s390/mm/vmem.c            |    6 ++++
 arch/sparc/mm/init_64.c        |    6 ++++
 arch/x86/mm/init_64.c          |   52 ++++++++++++++++++++++++++++++++++++++++
 include/linux/memory_hotplug.h |    2 +
 include/linux/mm.h             |    3 +-
 mm/memory_hotplug.c            |   31 +++++++++++++++++++++--
 8 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index c641333..33943db 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct page *start_page,
 {
 	return vmemmap_populate_basepages(start_page, size, node);
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+				  struct page *start_page, unsigned long size)
+{
+	/* TODO */
+}
 #endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 620b7ac..3690c44 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -298,5 +298,11 @@ int __meminit vmemmap_populate(struct page *start_page,
 
 	return 0;
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+				  struct page *start_page, unsigned long size)
+{
+	/* TODO */
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 6f896e7..eda55cd 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -227,6 +227,12 @@ out:
 	return ret;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+				  struct page *start_page, unsigned long size)
+{
+	/* TODO */
+}
+
 /*
  * Add memory segment to the segment list if it doesn't overlap with
  * an already present segment.
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d58edf5..add1cc7 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2077,6 +2077,12 @@ void __meminit vmemmap_populate_print_last(void)
 		node_start = 0;
 	}
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+				  struct page *start_page, unsigned long size)
+{
+	/* TODO */
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 static void prot_init_common(unsigned long page_none,
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e0d88ba..0075592 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,58 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
 	return 0;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+				  struct page *start_page, unsigned long size)
+{
+	unsigned long addr = (unsigned long)start_page;
+	unsigned long end = (unsigned long)(start_page + size);
+	unsigned long next;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	for (; addr < end; addr = next) {
+		pte_t *pte = NULL;
+
+		pgd = pgd_offset_k(addr);
+		if (pgd_none(*pgd)) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			continue;
+		}
+		get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+		pud = pud_offset(pgd, addr);
+		if (pud_none(*pud)) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			continue;
+		}
+		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+		if (!cpu_has_pse) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			pmd = pmd_offset(pud, addr);
+			if (pmd_none(*pmd))
+				continue;
+			get_page_bootmem(section_nr, pmd_page(*pmd),
+					 MIX_SECTION_INFO);
+
+			pte = pte_offset_kernel(pmd, addr);
+			if (pte_none(*pte))
+				continue;
+			get_page_bootmem(section_nr, pte_page(*pte),
+					 SECTION_INFO);
+		} else {
+			next = pmd_addr_end(addr, end);
+
+			pmd = pmd_offset(pud, addr);
+			if (pmd_none(*pmd))
+				continue;
+			get_page_bootmem(section_nr, pmd_page(*pmd),
+					 SECTION_INFO);
+		}
+	}
+}
+
 void __meminit vmemmap_populate_print_last(void)
 {
 	if (p_start) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1133e63..2d18235 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -164,6 +164,8 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
 
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
+extern void get_page_bootmem(unsigned long ingo, struct page *page,
+			     unsigned long type);
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..c607913 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1618,7 +1618,8 @@ int vmemmap_populate_basepages(struct page *start_page,
 						unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+				  unsigned long size);
 
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ca66bc..d8495ff 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,8 +91,8 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-static void get_page_bootmem(unsigned long info,  struct page *page,
-			     unsigned long type)
+void get_page_bootmem(unsigned long info,  struct page *page,
+		      unsigned long type)
 {
 	unsigned long page_type;
 
@@ -164,8 +164,33 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 
 }
 #else
-static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
+	unsigned long *usemap, mapsize, section_nr, i;
+	struct mem_section *ms;
+	struct page *page, *memmap;
+
+	if (!pfn_valid(start_pfn))
+		return;
+
+	section_nr = pfn_to_section_nr(start_pfn);
+	ms = __nr_to_section(section_nr);
+
+	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+	page = virt_to_page(memmap);
+	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
+
+	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
+
+	usemap = __nr_to_section(section_nr)->pageblock_flags;
+	page = virt_to_page(usemap);
+
+	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+
+	for (i = 0; i < mapsize; i++, page++)
+		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 }
 #endif
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 16/20] memory-hotplug: free memmap of sparse-vmemmap
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (14 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 15/20] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 17/20] memory_hotplug: clear zone when the memory is removed wency
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

All pages of virtual mapping in removed memory cannot be freed, since some pages
used as PGD/PUD includes not only removed memory but also other memory. So the
patch checks whether page can be freed or not.

How to check whether page can be freed or not?
 1. When removing memory, the page structs of the revmoved memory are filled
    with 0FD.
 2. All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared.
    In this case, the page used as PT/PMD can be freed.

Applying patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is integrated
into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is deleted.

Note:  vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64,
ppc, s390, and sparc.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 arch/ia64/mm/discontig.c  |    8 +++
 arch/powerpc/mm/init_64.c |    8 +++
 arch/s390/mm/vmem.c       |    8 +++
 arch/sparc/mm/init_64.c   |    8 +++
 arch/x86/mm/init_64.c     |  119 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h        |    2 +
 mm/memory_hotplug.c       |   17 +------
 mm/sparse.c               |    5 +-
 8 files changed, 158 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 33943db..0d23b69 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
 	return vmemmap_populate_basepages(start_page, size, node);
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 3690c44..835a2b3 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct page *start_page,
 	return 0;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index eda55cd..4b42b0b 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -227,6 +227,14 @@ out:
 	return ret;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index add1cc7..1384826 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_last(void)
 	}
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0075592..4e8f8a4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
 	return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+			    struct page **pp, int *page_size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	void *page_addr;
+	unsigned long next;
+
+	*pp = NULL;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd))
+		return pgd_addr_end(addr, end);
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return pud_addr_end(addr, end);
+
+	if (!cpu_has_pse) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		pte = pte_offset_kernel(pmd, addr);
+		if (pte_none(*pte))
+			return next;
+
+		*page_size = PAGE_SIZE;
+		*pp = pte_page(*pte);
+	} else {
+		next = pmd_addr_end(addr, end);
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		*page_size = PMD_SIZE;
+		*pp = pmd_page(*pmd);
+	}
+
+	/*
+	 * Removed page structs are filled with 0xFD.
+	 */
+	memset((void *)addr, PAGE_INUSE, next - addr);
+
+	page_addr = page_address(*pp);
+
+	/*
+	 * Check the page is filled with 0xFD or not.
+	 * memchr_inv() returns the address. In this case, we cannot
+	 * clear PTE/PUD entry, since the page is used by other.
+	 * So we cannot also free the page.
+	 *
+	 * memchr_inv() returns NULL. In this case, we can clear
+	 * PTE/PUD entry, since the page is not used by other.
+	 * So we can also free the page.
+	 */
+	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
+		*pp = NULL;
+		return next;
+	}
+
+	if (!cpu_has_pse)
+		pte_clear(&init_mm, addr, pte);
+	else
+		pmd_clear(pmd);
+
+	return next;
+}
+
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+	unsigned long addr = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + nr_pages);
+	unsigned long next;
+	struct page *page;
+	int page_size;
+
+	for (; addr < end; addr = next) {
+		page = NULL;
+		page_size = 0;
+		next = find_and_clear_pte_page(addr, end, &page, &page_size);
+		if (!page)
+			continue;
+
+		free_pages((unsigned long)page_address(page),
+			    get_order(page_size));
+		__flush_tlb_one(addr);
+	}
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+	unsigned long addr = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + nr_pages);
+	unsigned long next;
+	struct page *page;
+	int page_size;
+	unsigned long magic;
+
+	for (; addr < end; addr = next) {
+		page = NULL;
+		page_size = 0;
+		next = find_and_clear_pte_page(addr, end, &page, &page_size);
+		if (!page)
+			continue;
+
+		magic = (unsigned long) page->lru.next;
+		if (magic == SECTION_INFO)
+			put_page_bootmem(page);
+		flush_tlb_kernel_range(addr, end);
+	}
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c607913..fb0d1fc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1620,6 +1620,8 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
 
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d8495ff..3aa0766 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -308,19 +308,6 @@ static int __meminit __add_section(int nid, struct zone *zone,
 	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int __remove_section(struct zone *zone, struct mem_section *ms)
-{
-	int ret = -EINVAL;
-
-	if (!valid_section(ms))
-		return ret;
-
-	ret = unregister_memory_section(ms);
-
-	return ret;
-}
-#else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
 	unsigned long flags;
@@ -337,9 +324,9 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
 	pgdat_resize_lock(pgdat, &flags);
 	sparse_remove_one_section(zone, ms);
 	pgdat_resize_unlock(pgdat, &flags);
-	return 0;
+
+	return ret;
 }
-#endif
 
 /*
  * Reasonably generic function for adding memory.  It is
diff --git a/mm/sparse.c b/mm/sparse.c
index fac95f2..ab9d755 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -613,12 +613,13 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-	return; /* XXX: Not implemented yet */
+	vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+	vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 17/20] memory_hotplug: clear zone when the memory is removed
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (15 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 16/20] memory-hotplug: free memmap " wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 18/20] memory-hotplug: add node_device_release wency
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

When a memory is added, we update zone's and pgdat's start_pfn and spanned_pages
in the function __add_zone(). So we should revert these when the memory is
removed. Add a new function __remove_zone() to do this.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 mm/memory_hotplug.c |  207 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 207 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3aa0766..493298f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -308,10 +308,213 @@ static int __meminit __add_section(int nid, struct zone *zone,
 	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
+/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
+static int find_smallest_section_pfn(int nid, struct zone *zone,
+				     unsigned long start_pfn,
+				     unsigned long end_pfn)
+{
+	struct mem_section *ms;
+
+	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
+		ms = __pfn_to_section(start_pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (unlikely(pfn_to_nid(start_pfn)) != nid)
+			continue;
+
+		if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+			continue;
+
+		return start_pfn;
+	}
+
+	return 0;
+}
+
+/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
+static int find_biggest_section_pfn(int nid, struct zone *zone,
+				    unsigned long start_pfn,
+				    unsigned long end_pfn)
+{
+	struct mem_section *ms;
+	unsigned long pfn;
+
+	/* pfn is the end pfn of a memory section. */
+	pfn = end_pfn - 1;
+	for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
+		ms = __pfn_to_section(pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (unlikely(pfn_to_nid(pfn)) != nid)
+			continue;
+
+		if (zone && zone != page_zone(pfn_to_page(pfn)))
+			continue;
+
+		return pfn;
+	}
+
+	return 0;
+}
+
+static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
+			     unsigned long end_pfn)
+{
+	unsigned long zone_start_pfn =  zone->zone_start_pfn;
+	unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	unsigned long pfn;
+	struct mem_section *ms;
+	int nid = zone_to_nid(zone);
+
+	zone_span_writelock(zone);
+	if (zone_start_pfn == start_pfn) {
+		/*
+		 * If the section is smallest section in the zone, it need
+		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
+		 * In this case, we find second smallest valid mem_section
+		 * for shrinking zone.
+		 */
+		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
+						zone_end_pfn);
+		if (pfn) {
+			zone->zone_start_pfn = pfn;
+			zone->spanned_pages = zone_end_pfn - pfn;
+		}
+	} else if (zone_end_pfn == end_pfn) {
+		/*
+		 * If the section is biggest section in the zone, it need
+		 * shrink zone->spanned_pages.
+		 * In this case, we find second biggest valid mem_section for
+		 * shrinking zone.
+		 */
+		pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+					       start_pfn);
+		if (pfn)
+			zone->spanned_pages = pfn - zone_start_pfn + 1;
+	}
+
+	/*
+	 * The section is not biggest or smallest mem_section in the zone, it
+	 * only creates a hole in the zone. So in this case, we need not
+	 * change the zone. But perhaps, the zone has only hole data. Thus
+	 * it check the zone has only hole or not.
+	 */
+	pfn = zone_start_pfn;
+	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
+		ms = __pfn_to_section(pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (page_zone(pfn_to_page(pfn)) != zone)
+			continue;
+
+		 /* If the section is current section, it continues the loop */
+		if (start_pfn == pfn)
+			continue;
+
+		/* If we find valid section, we have nothing to do */
+		zone_span_writeunlock(zone);
+		return;
+	}
+
+	/* The zone has no valid section */
+	zone->zone_start_pfn = 0;
+	zone->spanned_pages = 0;
+	zone_span_writeunlock(zone);
+}
+
+static void shrink_pgdat_span(struct pglist_data *pgdat,
+			      unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pgdat_start_pfn =  pgdat->node_start_pfn;
+	unsigned long pgdat_end_pfn =
+		pgdat->node_start_pfn + pgdat->node_spanned_pages;
+	unsigned long pfn;
+	struct mem_section *ms;
+	int nid = pgdat->node_id;
+
+	if (pgdat_start_pfn == start_pfn) {
+		/*
+		 * If the section is smallest section in the pgdat, it need
+		 * shrink pgdat->node_start_pfn and pgdat->node_spanned_pages.
+		 * In this case, we find second smallest valid mem_section
+		 * for shrinking zone.
+		 */
+		pfn = find_smallest_section_pfn(nid, NULL, end_pfn,
+						pgdat_end_pfn);
+		if (pfn) {
+			pgdat->node_start_pfn = pfn;
+			pgdat->node_spanned_pages = pgdat_end_pfn - pfn;
+		}
+	} else if (pgdat_end_pfn == end_pfn) {
+		/*
+		 * If the section is biggest section in the pgdat, it need
+		 * shrink pgdat->node_spanned_pages.
+		 * In this case, we find second biggest valid mem_section for
+		 * shrinking zone.
+		 */
+		pfn = find_biggest_section_pfn(nid, NULL, pgdat_start_pfn,
+					       start_pfn);
+		if (pfn)
+			pgdat->node_spanned_pages = pfn - pgdat_start_pfn + 1;
+	}
+
+	/*
+	 * If the section is not biggest or smallest mem_section in the pgdat,
+	 * it only creates a hole in the pgdat. So in this case, we need not
+	 * change the pgdat.
+	 * But perhaps, the pgdat has only hole data. Thus it check the pgdat
+	 * has only hole or not.
+	 */
+	pfn = pgdat_start_pfn;
+	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) {
+		ms = __pfn_to_section(pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (pfn_to_nid(pfn) != nid)
+			continue;
+
+		 /* If the section is current section, it continues the loop */
+		if (start_pfn == pfn)
+			continue;
+
+		/* If we find valid section, we have nothing to do */
+		return;
+	}
+
+	/* The pgdat has no valid section */
+	pgdat->node_start_pfn = 0;
+	pgdat->node_spanned_pages = 0;
+}
+
+static void __remove_zone(struct zone *zone, unsigned long start_pfn)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	int nr_pages = PAGES_PER_SECTION;
+	int zone_type;
+	unsigned long flags;
+
+	zone_type = zone - pgdat->node_zones;
+
+	pgdat_resize_lock(zone->zone_pgdat, &flags);
+	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
+	shrink_pgdat_span(pgdat, start_pfn, start_pfn + nr_pages);
+	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+}
+
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
 	unsigned long flags;
 	struct pglist_data *pgdat = zone->zone_pgdat;
+	unsigned long start_pfn;
+	int scn_nr;
 	int ret = -EINVAL;
 
 	if (!valid_section(ms))
@@ -321,6 +524,10 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
 	if (ret)
 		return ret;
 
+	scn_nr = __section_nr(ms);
+	start_pfn = section_nr_to_pfn(scn_nr);
+	__remove_zone(zone, start_pfn);
+
 	pgdat_resize_lock(pgdat, &flags);
 	sparse_remove_one_section(zone, ms);
 	pgdat_resize_unlock(pgdat, &flags);
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 18/20] memory-hotplug: add node_device_release
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (16 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 17/20] memory_hotplug: clear zone when the memory is removed wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 19/20] memory-hotplug: remove sysfs file of node wency
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

When calling unregister_node(), the function shows following message at
device_release().

Device 'node2' does not have a release() function, it is broken and must be
fixed.

So the patch implements node_device_release()

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 drivers/base/node.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..07523fb 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -252,6 +252,16 @@ static inline void hugetlb_register_node(struct node *node) {}
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif
 
+static void node_device_release(struct device *dev)
+{
+	struct node *node_dev = to_node(dev);
+
+#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
+	flush_work(&node_dev->node_work);
+#endif
+
+	memset(node_dev, 0, sizeof(struct node));
+}
 
 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +275,7 @@ int register_node(struct node *node, int num, struct node *parent)
 
 	node->dev.id = num;
 	node->dev.bus = &node_subsys;
+	node->dev.release = node_device_release;
 	error = device_register(&node->dev);
 
 	if (!error){
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 19/20] memory-hotplug: remove sysfs file of node
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (17 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 18/20] memory-hotplug: add node_device_release wency
@ 2012-08-28 10:00 ` wency
  2012-08-28 10:00 ` [RFC v8 PATCH 20/20] memory-hotplug: clear hwpoisoned flag when onlining pages wency
  2012-08-31 20:49 ` [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory Andrew Morton
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

This patch introduces a new function try_offline_node() to
remove sysfs file of node when all memory sections of this
node are removed. If some memory sections of this node are
not removed, this function does nothing.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 mm/memory_hotplug.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 493298f..fb8af64 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1286,6 +1286,37 @@ int offline_memory(u64 start, u64 size)
 	return 0;
 }
 
+/* offline the node if all memory sections of this node are removed */
+static void try_offline_node(int nid)
+{
+	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+
+		if (!present_section_nr(section_nr))
+			continue;
+
+		if (pfn_to_nid(pfn) != nid)
+			continue;
+
+		/*
+		 * some memory sections of this node are not removed, and we
+		 * can't offline node now.
+		 */
+		return;
+	}
+
+	/*
+	 * all memory sections of this node are removed, we can offline this
+	 * node now.
+	 */
+	node_set_offline(nid);
+	unregister_one_node(nid);
+}
+
 int __ref remove_memory(int nid, u64 start, u64 size)
 {
 	int ret = 0;
@@ -1306,6 +1337,8 @@ int __ref remove_memory(int nid, u64 start, u64 size)
 	firmware_map_remove(start, start + size, "System RAM");
 
 	arch_remove_memory(start, size);
+
+	try_offline_node(nid);
 out:
 	unlock_memory_hotplug();
 	return ret;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC v8 PATCH 20/20] memory-hotplug: clear hwpoisoned flag when onlining pages
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (18 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 19/20] memory-hotplug: remove sysfs file of node wency
@ 2012-08-28 10:00 ` wency
  2012-08-31 20:49 ` [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory Andrew Morton
  20 siblings, 0 replies; 43+ messages in thread
From: wency @ 2012-08-28 10:00 UTC (permalink / raw)
  To: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux
  Cc: rientjes, liuj97, len.brown, benh, paulus, cl, minchan.kim, akpm,
	kosaki.motohiro, isimatu.yasuaki, Wen Congyang

From: Wen Congyang <wency@cn.fujitsu.com>

hwpoisoned may set when we offline a page by the sysfs interface
/sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page. If we don't clear
this flag when onlining pages, this page can't be freed, and will
not in free list. So we can't offline these pages again. So we
should clear this flag when onlining pages.

CC: David Rientjes <rientjes@google.com>
CC: Jiang Liu <liuj97@gmail.com>
CC: Len Brown <len.brown@intel.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 mm/memory_hotplug.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fb8af64..85603c4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -661,6 +661,11 @@ EXPORT_SYMBOL_GPL(__online_page_increment_counters);
 
 void __online_page_free(struct page *page)
 {
+#ifdef CONFIG_MEMORY_FAILURE
+	/* The page may be marked HWPoisoned by soft/hard offline page */
+	ClearPageHWPoison(page);
+#endif
+
 	ClearPageReserved(page);
 	init_page_count(page);
 	__free_page(page);
-- 
1.7.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
                   ` (19 preceding siblings ...)
  2012-08-28 10:00 ` [RFC v8 PATCH 20/20] memory-hotplug: clear hwpoisoned flag when onlining pages wency
@ 2012-08-31 20:49 ` Andrew Morton
  2012-09-10  1:46   ` Yasuaki Ishimatsu
  20 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2012-08-31 20:49 UTC (permalink / raw)
  To: wency
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

On Tue, 28 Aug 2012 18:00:07 +0800
wency@cn.fujitsu.com wrote:

> This patch series aims to support physical memory hot-remove.

Have you had much review and testing feedback yet?

> The patches can free/remove the following things:
> 
>   - acpi_memory_info                          : [RFC PATCH 4/19]
>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>   - iomem_resource                            : [RFC PATCH 9/19]
>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>   - page table of removed memory              : [RFC PATCH 12/19]
>   - node and related sysfs files              : [RFC PATCH 18-19/19]
> 
> If you find lack of function for physical memory hot-remove, please let me
> know.

I doubt if many people have hardware which permits physical memory
removal?  How would you suggest that people with regular hardware can
test these chagnes?

> Known problems:
> 1. memory can't be offlined when CONFIG_MEMCG is selected.

That's quite a problem!  Do you have a description of why this is the
case, and a plan for fixing it?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device
  2012-08-28 10:00 ` [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device wency
@ 2012-08-31 20:55   ` Andrew Morton
  2012-09-03  1:30     ` Wen Congyang
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2012-08-31 20:55 UTC (permalink / raw)
  To: wency
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

On Tue, 28 Aug 2012 18:00:11 +0800
wency@cn.fujitsu.com wrote:

> +int remove_memory(int nid, u64 start, u64 size)
> +{
> +	int ret = -EBUSY;
> +	lock_memory_hotplug();
> +	/*
> +	 * The memory might become online by other task, even if you offine it.
> +	 * So we check whether the cpu has been onlined or not.

I think you meant "memory", not "cpu".

Actually, "check whether any part of this memory range has been
onlined" would be better.  If that is accurate ;)

> +	 */
> +	if (!is_memblk_offline(start, size)) {
> +		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
> +			"because the memmory range is online\n",
> +			start, start + size);
> +		ret = -EAGAIN;
> +	}
> +
> +	unlock_memory_hotplug();
> +	return ret;
> +
> +}
> +EXPORT_SYMBOL_GPL(remove_memory);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs
  2012-08-28 10:00 ` [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs wency
@ 2012-08-31 21:06   ` Andrew Morton
  2012-09-03  5:51     ` Wen Congyang
  2012-09-03  7:31     ` Wen Congyang
  0 siblings, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2012-08-31 21:06 UTC (permalink / raw)
  To: wency
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

On Tue, 28 Aug 2012 18:00:15 +0800
wency@cn.fujitsu.com wrote:

> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
> sysfs files are created. But there is no code to remove these files. The patch
> implements the function to remove them.
> 
> Note : The code does not free firmware_map_entry since there is no way to free
>        memory which is allocated by bootmem.
> 
> ....
>
> +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)

It would be better to implement this as an inlined C function.  That
has improved type safety and improved readability.

> +static void release_firmware_map_entry(struct kobject *kobj)
> +{
> +	struct firmware_map_entry *entry = to_memmap_entry(kobj);
> +	struct page *page;
> +
> +	page = virt_to_page(entry);
> +	if (PageSlab(page) || PageCompound(page))

That PageCompound() test looks rather odd.  Why is this done?

> +		kfree(entry);
> +
> +	/* There is no way to free memory allocated from bootmem*/
> +}

This function is a bit ugly - poking around in page flags to determine
whether or not the memory came from bootmem.  It would be cleaner to
use a separate boolean.  Although I guess we can live with it as you
have it here.

>  static struct kobj_type memmap_ktype = {
> +	.release	= release_firmware_map_entry,
>  	.sysfs_ops	= &memmap_attr_ops,
>  	.default_attrs	= def_attrs,
>  };
> @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 start, u64 end,
>  	return 0;
>  }
>  
> +/**
> + * firmware_map_remove_entry() - Does the real work to remove a firmware
> + * memmap entry.
> + * @entry: removed entry.
> + **/
> +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
> +{
> +	list_del(&entry->list);
> +}

Is there no locking  to protect that list?

>  /*
>   * Add memmap entry on sysfs
>   */
> @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
>  	return 0;
>  }
>  
> +/*
> + * Remove memmap entry on sysfs
> + */
> +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
> +{
> +	kobject_put(&entry->kobj);
> +}
> +
> +/*
> + * Search memmap entry
> + */
> +
> +struct firmware_map_entry * __meminit
> +find_firmware_map_entry(u64 start, u64 end, const char *type)

A better name would be firmware_map_find_entry().  To retain the (good)
convention that symbols exported from here all start with
"firmware_map_".

> +{
> +	struct firmware_map_entry *entry;
> +
> +	list_for_each_entry(entry, &map_entries, list)
> +		if ((entry->start == start) && (entry->end == end) &&
> +		    (!strcmp(entry->type, type)))
> +			return entry;
> +
> +	return NULL;
> +}
> +
>  /**
>   * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
>   * memory hotplug.
> @@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
>  	return firmware_map_add_entry(start, end, type, entry);
>  }
>  
> +/**
> + * firmware_map_remove() - remove a firmware mapping entry
> + * @start: Start of the memory range.
> + * @end:   End of the memory range.
> + * @type:  Type of the memory range.
> + *
> + * removes a firmware mapping entry.
> + *
> + * Returns 0 on success, or -EINVAL if no entry.
> + **/
> +int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
> +{
> +	struct firmware_map_entry *entry;
> +
> +	entry = find_firmware_map_entry(start, end - 1, type);
> +	if (!entry)
> +		return -EINVAL;
> +
> +	firmware_map_remove_entry(entry);
> +
> +	/* remove the memmap entry */
> +	remove_sysfs_fw_map_entry(entry);
> +
> +	return 0;
> +}

Again, the lack of locking looks bad.

> ...
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1052,9 +1052,9 @@ int offline_memory(u64 start, u64 size)
>  	return 0;
>  }
>  
> -int remove_memory(int nid, u64 start, u64 size)
> +int __ref remove_memory(int nid, u64 start, u64 size)

Why was __ref added?

>  {
> -	int ret = -EBUSY;
> +	int ret = 0;
>  	lock_memory_hotplug();
>  	/*
>  	 * The memory might become online by other task, even if you offine it.
>
> ...
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem
  2012-08-28 10:00 ` [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem wency
@ 2012-08-31 21:30   ` Andrew Morton
  2012-09-04  3:46     ` Wen Congyang
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2012-08-31 21:30 UTC (permalink / raw)
  To: wency
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

On Tue, 28 Aug 2012 18:00:20 +0800
wency@cn.fujitsu.com wrote:

> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> There is a possibility that get_page_bootmem() is called to the same page many
> times. So when get_page_bootmem is called to the same page, the function only
> increments page->_count.

I really don't understand this explanation, even after having looked at
the code.  Can you please have another attempt at the changelog?

> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>  static void get_page_bootmem(unsigned long info,  struct page *page,
>  			     unsigned long type)
>  {
> -	page->lru.next = (struct list_head *) type;
> -	SetPagePrivate(page);
> -	set_page_private(page, info);
> -	atomic_inc(&page->_count);
> +	unsigned long page_type;
> +
> +	page_type = (unsigned long) page->lru.next;
> +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
> +		page->lru.next = (struct list_head *) type;
> +		SetPagePrivate(page);
> +		set_page_private(page, info);
> +		atomic_inc(&page->_count);
> +	} else
> +		atomic_inc(&page->_count);
>  }

And a code comment which explains what is going on would be good.  As
is always the case ;)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device
  2012-08-31 20:55   ` Andrew Morton
@ 2012-09-03  1:30     ` Wen Congyang
  0 siblings, 0 replies; 43+ messages in thread
From: Wen Congyang @ 2012-09-03  1:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

At 09/01/2012 04:55 AM, Andrew Morton Wrote:
> On Tue, 28 Aug 2012 18:00:11 +0800
> wency@cn.fujitsu.com wrote:
> 
>> +int remove_memory(int nid, u64 start, u64 size)
>> +{
>> +	int ret = -EBUSY;
>> +	lock_memory_hotplug();
>> +	/*
>> +	 * The memory might become online by other task, even if you offine it.
>> +	 * So we check whether the cpu has been onlined or not.
> 
> I think you meant "memory", not "cpu".

Yes. I will fix it.

Thanks
Wen Congyang

> 
> Actually, "check whether any part of this memory range has been
> onlined" would be better.  If that is accurate ;)
> 
>> +	 */
>> +	if (!is_memblk_offline(start, size)) {
>> +		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
>> +			"because the memmory range is online\n",
>> +			start, start + size);
>> +		ret = -EAGAIN;
>> +	}
>> +
>> +	unlock_memory_hotplug();
>> +	return ret;
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(remove_memory);
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs
  2012-08-31 21:06   ` Andrew Morton
@ 2012-09-03  5:51     ` Wen Congyang
  2012-09-04 23:16       ` Andrew Morton
  2012-09-03  7:31     ` Wen Congyang
  1 sibling, 1 reply; 43+ messages in thread
From: Wen Congyang @ 2012-09-03  5:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

At 09/01/2012 05:06 AM, Andrew Morton Wrote:
> On Tue, 28 Aug 2012 18:00:15 +0800
> wency@cn.fujitsu.com wrote:
> 
>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
>> sysfs files are created. But there is no code to remove these files. The patch
>> implements the function to remove them.
>>
>> Note : The code does not free firmware_map_entry since there is no way to free
>>        memory which is allocated by bootmem.
>>
>> ....
>>
>> +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
> 
> It would be better to implement this as an inlined C function.  That
> has improved type safety and improved readability.
> 
>> +static void release_firmware_map_entry(struct kobject *kobj)
>> +{
>> +	struct firmware_map_entry *entry = to_memmap_entry(kobj);
>> +	struct page *page;
>> +
>> +	page = virt_to_page(entry);
>> +	if (PageSlab(page) || PageCompound(page))
> 
> That PageCompound() test looks rather odd.  Why is this done?

Liu Jiang and Christoph Lameter discussed how to find slab page
in this mail:
https://lkml.org/lkml/2012/7/6/333.

> 
>> +		kfree(entry);
>> +
>> +	/* There is no way to free memory allocated from bootmem*/
>> +}
> 
> This function is a bit ugly - poking around in page flags to determine
> whether or not the memory came from bootmem.  It would be cleaner to
> use a separate boolean.  Although I guess we can live with it as you
> have it here.
> 
>>  static struct kobj_type memmap_ktype = {
>> +	.release	= release_firmware_map_entry,
>>  	.sysfs_ops	= &memmap_attr_ops,
>>  	.default_attrs	= def_attrs,
>>  };
>> @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 start, u64 end,
>>  	return 0;
>>  }
>>  
>> +/**
>> + * firmware_map_remove_entry() - Does the real work to remove a firmware
>> + * memmap entry.
>> + * @entry: removed entry.
>> + **/
>> +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
>> +{
>> +	list_del(&entry->list);
>> +}
> 
> Is there no locking  to protect that list?
> 
>>  /*
>>   * Add memmap entry on sysfs
>>   */
>> @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Remove memmap entry on sysfs
>> + */
>> +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
>> +{
>> +	kobject_put(&entry->kobj);
>> +}
>> +
>> +/*
>> + * Search memmap entry
>> + */
>> +
>> +struct firmware_map_entry * __meminit
>> +find_firmware_map_entry(u64 start, u64 end, const char *type)
> 
> A better name would be firmware_map_find_entry().  To retain the (good)
> convention that symbols exported from here all start with
> "firmware_map_".

OK.

> 
>> +{
>> +	struct firmware_map_entry *entry;
>> +
>> +	list_for_each_entry(entry, &map_entries, list)
>> +		if ((entry->start == start) && (entry->end == end) &&
>> +		    (!strcmp(entry->type, type)))
>> +			return entry;
>> +
>> +	return NULL;
>> +}
>> +
>>  /**
>>   * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
>>   * memory hotplug.
>> @@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
>>  	return firmware_map_add_entry(start, end, type, entry);
>>  }
>>  
>> +/**
>> + * firmware_map_remove() - remove a firmware mapping entry
>> + * @start: Start of the memory range.
>> + * @end:   End of the memory range.
>> + * @type:  Type of the memory range.
>> + *
>> + * removes a firmware mapping entry.
>> + *
>> + * Returns 0 on success, or -EINVAL if no entry.
>> + **/
>> +int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
>> +{
>> +	struct firmware_map_entry *entry;
>> +
>> +	entry = find_firmware_map_entry(start, end - 1, type);
>> +	if (!entry)
>> +		return -EINVAL;
>> +
>> +	firmware_map_remove_entry(entry);
>> +
>> +	/* remove the memmap entry */
>> +	remove_sysfs_fw_map_entry(entry);
>> +
>> +	return 0;
>> +}
> 
> Again, the lack of locking looks bad.
> 
>> ...
>>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1052,9 +1052,9 @@ int offline_memory(u64 start, u64 size)
>>  	return 0;
>>  }
>>  
>> -int remove_memory(int nid, u64 start, u64 size)
>> +int __ref remove_memory(int nid, u64 start, u64 size)
> 
> Why was __ref added?

Hmm, firmware_map_remove() was put in meminit section, and we call it
in this function, so __ref is added here.

Thanks
Wen Congyang

> 
>>  {
>> -	int ret = -EBUSY;
>> +	int ret = 0;
>>  	lock_memory_hotplug();
>>  	/*
>>  	 * The memory might become online by other task, even if you offine it.
>>
>> ...
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs
  2012-08-31 21:06   ` Andrew Morton
  2012-09-03  5:51     ` Wen Congyang
@ 2012-09-03  7:31     ` Wen Congyang
  1 sibling, 0 replies; 43+ messages in thread
From: Wen Congyang @ 2012-09-03  7:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki

At 09/01/2012 05:06 AM, Andrew Morton Wrote:
> On Tue, 28 Aug 2012 18:00:15 +0800
> wency@cn.fujitsu.com wrote:
> 
>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
>> sysfs files are created. But there is no code to remove these files. The patch
>> implements the function to remove them.
>>
>> Note : The code does not free firmware_map_entry since there is no way to free
>>        memory which is allocated by bootmem.
>>
>> ....
>>
>> +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
> 
> It would be better to implement this as an inlined C function.  That
> has improved type safety and improved readability.

Hmm, this macro is not a new macro. It is defined after the function
release_firmware_map_entry(). We just moved it here because we
need it in the function release_firmware_map_entry().

> 
>> +static void release_firmware_map_entry(struct kobject *kobj)
>> +{
>> +	struct firmware_map_entry *entry = to_memmap_entry(kobj);
>> +	struct page *page;
>> +
>> +	page = virt_to_page(entry);
>> +	if (PageSlab(page) || PageCompound(page))
> 
> That PageCompound() test looks rather odd.  Why is this done?
> 
>> +		kfree(entry);
>> +
>> +	/* There is no way to free memory allocated from bootmem*/
>> +}
> 
> This function is a bit ugly - poking around in page flags to determine
> whether or not the memory came from bootmem.  It would be cleaner to
> use a separate boolean.  Although I guess we can live with it as you
> have it here.
> 
>>  static struct kobj_type memmap_ktype = {
>> +	.release	= release_firmware_map_entry,
>>  	.sysfs_ops	= &memmap_attr_ops,
>>  	.default_attrs	= def_attrs,
>>  };
>> @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 start, u64 end,
>>  	return 0;
>>  }
>>  
>> +/**
>> + * firmware_map_remove_entry() - Does the real work to remove a firmware
>> + * memmap entry.
>> + * @entry: removed entry.
>> + **/
>> +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
>> +{
>> +	list_del(&entry->list);
>> +}
> 
> Is there no locking  to protect that list?

OK, I will add a lock to protect it.

Thanks
Wen Congyang

> 
>>  /*
>>   * Add memmap entry on sysfs
>>   */
>> @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Remove memmap entry on sysfs
>> + */
>> +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
>> +{
>> +	kobject_put(&entry->kobj);
>> +}
>> +
>> +/*
>> + * Search memmap entry
>> + */
>> +
>> +struct firmware_map_entry * __meminit
>> +find_firmware_map_entry(u64 start, u64 end, const char *type)
> 
> A better name would be firmware_map_find_entry().  To retain the (good)
> convention that symbols exported from here all start with
> "firmware_map_".
> 
>> +{
>> +	struct firmware_map_entry *entry;
>> +
>> +	list_for_each_entry(entry, &map_entries, list)
>> +		if ((entry->start == start) && (entry->end == end) &&
>> +		    (!strcmp(entry->type, type)))
>> +			return entry;
>> +
>> +	return NULL;
>> +}
>> +
>>  /**
>>   * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
>>   * memory hotplug.
>> @@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
>>  	return firmware_map_add_entry(start, end, type, entry);
>>  }
>>  
>> +/**
>> + * firmware_map_remove() - remove a firmware mapping entry
>> + * @start: Start of the memory range.
>> + * @end:   End of the memory range.
>> + * @type:  Type of the memory range.
>> + *
>> + * removes a firmware mapping entry.
>> + *
>> + * Returns 0 on success, or -EINVAL if no entry.
>> + **/
>> +int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
>> +{
>> +	struct firmware_map_entry *entry;
>> +
>> +	entry = find_firmware_map_entry(start, end - 1, type);
>> +	if (!entry)
>> +		return -EINVAL;
>> +
>> +	firmware_map_remove_entry(entry);
>> +
>> +	/* remove the memmap entry */
>> +	remove_sysfs_fw_map_entry(entry);
>> +
>> +	return 0;
>> +}
> 
> Again, the lack of locking looks bad.
> 
>> ...
>>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1052,9 +1052,9 @@ int offline_memory(u64 start, u64 size)
>>  	return 0;
>>  }
>>  
>> -int remove_memory(int nid, u64 start, u64 size)
>> +int __ref remove_memory(int nid, u64 start, u64 size)
> 
> Why was __ref added?
> 
>>  {
>> -	int ret = -EBUSY;
>> +	int ret = 0;
>>  	lock_memory_hotplug();
>>  	/*
>>  	 * The memory might become online by other task, even if you offine it.
>>
>> ...
>>
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem
  2012-08-31 21:30   ` Andrew Morton
@ 2012-09-04  3:46     ` Wen Congyang
  2012-09-04  9:54       ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 43+ messages in thread
From: Wen Congyang @ 2012-09-04  3:46 UTC (permalink / raw)
  To: Andrew Morton, isimatu.yasuaki
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro

Hi, isimatu-san

At 09/01/2012 05:30 AM, Andrew Morton Wrote:
> On Tue, 28 Aug 2012 18:00:20 +0800
> wency@cn.fujitsu.com wrote:
> 
>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> There is a possibility that get_page_bootmem() is called to the same page many
>> times. So when get_page_bootmem is called to the same page, the function only
>> increments page->_count.
> 
> I really don't understand this explanation, even after having looked at
> the code.  Can you please have another attempt at the changelog?

What is the problem that you want to fix? The function get_page_bootmem()
may be called to the same page more than once, but I don't find any problem
about current implementation.

Thanks
Wen Congyang

> 
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>>  static void get_page_bootmem(unsigned long info,  struct page *page,
>>  			     unsigned long type)
>>  {
>> -	page->lru.next = (struct list_head *) type;
>> -	SetPagePrivate(page);
>> -	set_page_private(page, info);
>> -	atomic_inc(&page->_count);
>> +	unsigned long page_type;
>> +
>> +	page_type = (unsigned long) page->lru.next;
>> +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>> +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
>> +		page->lru.next = (struct list_head *) type;
>> +		SetPagePrivate(page);
>> +		set_page_private(page, info);
>> +		atomic_inc(&page->_count);
>> +	} else
>> +		atomic_inc(&page->_count);
>>  }
> 
> And a code comment which explains what is going on would be good.  As
> is always the case ;)
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem
  2012-09-04  3:46     ` Wen Congyang
@ 2012-09-04  9:54       ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 43+ messages in thread
From: Yasuaki Ishimatsu @ 2012-09-04  9:54 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Andrew Morton, x86, linux-mm, linux-kernel, linuxppc-dev,
	linux-acpi, linux-s390, linux-sh, linux-ia64, cmetcalf,
	sparclinux, rientjes, liuj97, len.brown, benh, paulus, cl,
	minchan.kim, kosaki.motohiro

Hi Wen,

2012/09/04 12:46, Wen Congyang wrote:
> Hi, isimatu-san
>
> At 09/01/2012 05:30 AM, Andrew Morton Wrote:
>> On Tue, 28 Aug 2012 18:00:20 +0800
>> wency@cn.fujitsu.com wrote:
>>
>>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>
>>> There is a possibility that get_page_bootmem() is called to the same page many
>>> times. So when get_page_bootmem is called to the same page, the function only
>>> increments page->_count.
>>
>> I really don't understand this explanation, even after having looked at
>> the code.  Can you please have another attempt at the changelog?
>
> What is the problem that you want to fix? The function get_page_bootmem()
> may be called to the same page more than once, but I don't find any problem
> about current implementation.

The patch is just optimization. The patch does not fix a problems.
As you know, the function may be called many times for the same page.
I think if a page is sets to page_type and Page Private flag and page->private,
the page need not be set the same things again. So we check page_type when
get_page_bootmem() is called. And if the page has been set to them, the page
is only incremented page->_count.

Thanks,
Yasuaki Ishimatsu

>
> Thanks
> Wen Congyang
>
>>
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>>>   static void get_page_bootmem(unsigned long info,  struct page *page,
>>>   			     unsigned long type)
>>>   {
>>> -	page->lru.next = (struct list_head *) type;
>>> -	SetPagePrivate(page);
>>> -	set_page_private(page, info);
>>> -	atomic_inc(&page->_count);
>>> +	unsigned long page_type;
>>> +
>>> +	page_type = (unsigned long) page->lru.next;
>>> +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>>> +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
>>> +		page->lru.next = (struct list_head *) type;
>>> +		SetPagePrivate(page);
>>> +		set_page_private(page, info);
>>> +		atomic_inc(&page->_count);
>>> +	} else
>>> +		atomic_inc(&page->_count);
>>>   }
>>
>> And a code comment which explains what is going on would be good.  As
>> is always the case ;)
>>
>>
>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs
  2012-09-03  5:51     ` Wen Congyang
@ 2012-09-04 23:16       ` Andrew Morton
  2012-09-05  1:41         ` Wen Congyang
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2012-09-04 23:16 UTC (permalink / raw)
  To: Wen Congyang
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki, Christoph Lameter

On Mon, 03 Sep 2012 13:51:10 +0800
Wen Congyang <wency@cn.fujitsu.com> wrote:

> >> +static void release_firmware_map_entry(struct kobject *kobj)
> >> +{
> >> +	struct firmware_map_entry *entry = to_memmap_entry(kobj);
> >> +	struct page *page;
> >> +
> >> +	page = virt_to_page(entry);
> >> +	if (PageSlab(page) || PageCompound(page))
> > 
> > That PageCompound() test looks rather odd.  Why is this done?
> 
> Liu Jiang and Christoph Lameter discussed how to find slab page
> in this mail:
> https://lkml.org/lkml/2012/7/6/333.

Well, please add a code comment to release_firmware_map_entry() which
fully explains these things.

I see that Christoph and I agree: "It would be cleaner if memory
hotplug had an indicator which allocation mechanism was used and would
use the corresponding free action".  You didn't respond to this
suggestion when he made it, nor when I made it.  What are your thoughts
on this?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs
  2012-09-04 23:16       ` Andrew Morton
@ 2012-09-05  1:41         ` Wen Congyang
  0 siblings, 0 replies; 43+ messages in thread
From: Wen Congyang @ 2012-09-05  1:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: x86, linux-mm, linux-kernel, linuxppc-dev, linux-acpi,
	linux-s390, linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes,
	liuj97, len.brown, benh, paulus, cl, minchan.kim,
	kosaki.motohiro, isimatu.yasuaki, Christoph Lameter

At 09/05/2012 07:16 AM, Andrew Morton Wrote:
> On Mon, 03 Sep 2012 13:51:10 +0800
> Wen Congyang <wency@cn.fujitsu.com> wrote:
> 
>>>> +static void release_firmware_map_entry(struct kobject *kobj)
>>>> +{
>>>> +	struct firmware_map_entry *entry = to_memmap_entry(kobj);
>>>> +	struct page *page;
>>>> +
>>>> +	page = virt_to_page(entry);
>>>> +	if (PageSlab(page) || PageCompound(page))
>>>
>>> That PageCompound() test looks rather odd.  Why is this done?
>>
>> Liu Jiang and Christoph Lameter discussed how to find slab page
>> in this mail:
>> https://lkml.org/lkml/2012/7/6/333.
> 
> Well, please add a code comment to release_firmware_map_entry() which
> fully explains these things.
> 
> I see that Christoph and I agree: "It would be cleaner if memory
> hotplug had an indicator which allocation mechanism was used and would
> use the corresponding free action".  You didn't respond to this
> suggestion when he made it, nor when I made it.  What are your thoughts
> on this?

Hmm, I think it is better to use an indicator which allocation mechanism was
used. I will do it in the next version.

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-08-31 20:49 ` [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory Andrew Morton
@ 2012-09-10  1:46   ` Yasuaki Ishimatsu
  2012-09-10  2:01     ` Wen Congyang
  0 siblings, 1 reply; 43+ messages in thread
From: Yasuaki Ishimatsu @ 2012-09-10  1:46 UTC (permalink / raw)
  To: wency
  Cc: Andrew Morton, x86, linux-mm, linux-kernel, linuxppc-dev,
	linux-acpi, linux-s390, linux-sh, linux-ia64, cmetcalf,
	sparclinux, rientjes, liuj97, len.brown, benh, paulus, cl,
	minchan.kim, kosaki.motohiro

Hi Wen,

2012/09/01 5:49, Andrew Morton wrote:
> On Tue, 28 Aug 2012 18:00:07 +0800
> wency@cn.fujitsu.com wrote:
>
>> This patch series aims to support physical memory hot-remove.
>
> Have you had much review and testing feedback yet?
>
>> The patches can free/remove the following things:
>>
>>    - acpi_memory_info                          : [RFC PATCH 4/19]
>>    - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>    - iomem_resource                            : [RFC PATCH 9/19]
>>    - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>    - page table of removed memory              : [RFC PATCH 12/19]
>>    - node and related sysfs files              : [RFC PATCH 18-19/19]
>>
>> If you find lack of function for physical memory hot-remove, please let me
>> know.
>

> I doubt if many people have hardware which permits physical memory
> removal?  How would you suggest that people with regular hardware can
> test these chagnes?

How do you test the patch? As Andrew says, for hot-removing memory,
we need a particular hardware. I think so too. So many people may want
to know how to test the patch.
If we apply following patch to kvm guest, can we hot-remove memory on
kvm guest?

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html

Thanks,
Yasuaki Ishimatsu

>
>> Known problems:
>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>
> That's quite a problem!  Do you have a description of why this is the
> case, and a plan for fixing it?
>



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-09-10  1:46   ` Yasuaki Ishimatsu
@ 2012-09-10  2:01     ` Wen Congyang
  2012-09-10 13:52       ` Vasilis Liaskovitis
  0 siblings, 1 reply; 43+ messages in thread
From: Wen Congyang @ 2012-09-10  2:01 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Andrew Morton, x86, linux-mm, linux-kernel, linuxppc-dev,
	linux-acpi, linux-s390, linux-sh, linux-ia64, cmetcalf,
	sparclinux, rientjes, liuj97, len.brown, benh, paulus, cl,
	minchan.kim, kosaki.motohiro

At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
> Hi Wen,
> 
> 2012/09/01 5:49, Andrew Morton wrote:
>> On Tue, 28 Aug 2012 18:00:07 +0800
>> wency@cn.fujitsu.com wrote:
>>
>>> This patch series aims to support physical memory hot-remove.
>>
>> Have you had much review and testing feedback yet?
>>
>>> The patches can free/remove the following things:
>>>
>>>    - acpi_memory_info                          : [RFC PATCH 4/19]
>>>    - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>>    - iomem_resource                            : [RFC PATCH 9/19]
>>>    - mem_section and related sysfs files       : [RFC PATCH 10-11,
>>> 13-16/19]
>>>    - page table of removed memory              : [RFC PATCH 12/19]
>>>    - node and related sysfs files              : [RFC PATCH 18-19/19]
>>>
>>> If you find lack of function for physical memory hot-remove, please
>>> let me
>>> know.
>>
> 
>> I doubt if many people have hardware which permits physical memory
>> removal?  How would you suggest that people with regular hardware can
>> test these chagnes?
> 
> How do you test the patch? As Andrew says, for hot-removing memory,
> we need a particular hardware. I think so too. So many people may want
> to know how to test the patch.
> If we apply following patch to kvm guest, can we hot-remove memory on
> kvm guest?
> 
> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html

Yes, if we apply this patchset, we can test hot-remove memory on kvm guest.
But that patchset doesn't implement _PS3, so there is some restriction.

Thanks
Wen Congyang

> 
> Thanks,
> Yasuaki Ishimatsu
> 
>>
>>> Known problems:
>>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>
>> That's quite a problem!  Do you have a description of why this is the
>> case, and a plan for fixing it?
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-09-10  2:01     ` Wen Congyang
@ 2012-09-10 13:52       ` Vasilis Liaskovitis
       [not found]         ` <CAAV+Mu7YWRWnxt78F4ZDMrrUsWB=n-_qkYOcQT7WQ2HwP89Obw@mail.gmail.com>
  2012-09-12  5:20         ` Wen Congyang
  0 siblings, 2 replies; 43+ messages in thread
From: Vasilis Liaskovitis @ 2012-09-10 13:52 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Yasuaki Ishimatsu, Andrew Morton, x86, linux-mm, linux-kernel,
	linuxppc-dev, linux-acpi, linux-s390, linux-sh, linux-ia64,
	cmetcalf, sparclinux, rientjes, liuj97, len.brown, benh, paulus,
	cl, minchan.kim, kosaki.motohiro

Hi,

On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
> > Hi Wen,
> > 
> > 2012/09/01 5:49, Andrew Morton wrote:
> >> On Tue, 28 Aug 2012 18:00:07 +0800
> >> wency@cn.fujitsu.com wrote:
> >>
> >>> This patch series aims to support physical memory hot-remove.
> >>
> >> I doubt if many people have hardware which permits physical memory
> >> removal?  How would you suggest that people with regular hardware can
> >> test these chagnes?
> > 
> > How do you test the patch? As Andrew says, for hot-removing memory,
> > we need a particular hardware. I think so too. So many people may want
> > to know how to test the patch.
> > If we apply following patch to kvm guest, can we hot-remove memory on
> > kvm guest?
> > 
> > http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
> 
> Yes, if we apply this patchset, we can test hot-remove memory on kvm guest.
> But that patchset doesn't implement _PS3, so there is some restriction.

the following repos contain the patchset above, plus 2 more patches that add
PS3 support to the dimm devices in qemu/seabios:

https://github.com/vliaskov/seabios/commits/memhp-v2
https://github.com/vliaskov/qemu-kvm/commits/memhp-v2

I have not posted the PS3 patches yet in the qemu list, but will post them
soon for v3 of the memory hotplug series. If you have issues testing, let me
know.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
       [not found]         ` <CAAV+Mu7YWRWnxt78F4ZDMrrUsWB=n-_qkYOcQT7WQ2HwP89Obw@mail.gmail.com>
@ 2012-09-11  1:23           ` Minchan Kim
       [not found]             ` <CAAV+Mu4hb0qbW2Ry6w5FAGUM06puDH0v_H-jr584-G9CzJqSGw@mail.gmail.com>
  2012-09-11  1:25           ` Wen Congyang
  1 sibling, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2012-09-11  1:23 UTC (permalink / raw)
  To: Jerry
  Cc: Wen Congyang, Andrew Morton, Yasuaki Ishimatsu,
	Vasilis Liaskovitis, x86, linux-mm, linux-kernel, linuxppc-dev,
	linux-acpi, linux-s390, linux-sh, linux-ia64, cmetcalf,
	sparclinux, rientjes, liuj97, len.brown, benh, paulus, cl,
	kosaki.motohiro

Hi Jerry,

On Tue, Sep 11, 2012 at 08:27:40AM +0800, Jerry wrote:
> Hi Wen,
> 
> I have been arranged a job related memory hotplug on ARM architecture.
> Maybe I know some new issues about memory hotplug on ARM architecture. I
> just enabled it on ARM, and it works well in my Android tablet now.
> However, I have not send out my patches. The real reason is that I don't
> know how to do it. Maybe I need to read "Documentation/SubmittingPatches".
> 
> Hi Andrew,
> This is my first time to send you a e-mail. I am so nervous about if I have
> some mistakes or not.

Don't be afraid.
If you might make a mistake, it's very natural to newbie.
I am sure anyone doesn't blame you. :)
If you have a good patch, please send out.

> 
> Some peoples maybe think memory hotplug need to be supported by special
> hardware. Maybe it means memory physical hotplug. Some times, we just need
> to use memory logical hotplug, doesn't remove the memory in physical. It is
> also usefully for power saving in my platform. Because I doesn't want
> the offline memory is in *self-refresh* state.

Just out of curiosity.
What's the your scenario and gain?
AFAIK, there were some effort about it in embedded side but gain isn't rather big
IIRC.

> 
> Any comments are appreciated.
> 
> Thanks,
> Jerry
> 
> 2012/9/10 Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
> > Hi,
> >
> > On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
> > > At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
> > > > Hi Wen,
> > > >
> > > > 2012/09/01 5:49, Andrew Morton wrote:
> > > >> On Tue, 28 Aug 2012 18:00:07 +0800
> > > >> wency@cn.fujitsu.com wrote:
> > > >>
> > > >>> This patch series aims to support physical memory hot-remove.
> > > >>
> > > >> I doubt if many people have hardware which permits physical memory
> > > >> removal?  How would you suggest that people with regular hardware can
> > > >> test these chagnes?
> > > >
> > > > How do you test the patch? As Andrew says, for hot-removing memory,
> > > > we need a particular hardware. I think so too. So many people may want
> > > > to know how to test the patch.
> > > > If we apply following patch to kvm guest, can we hot-remove memory on
> > > > kvm guest?
> > > >
> > > > http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
> > >
> > > Yes, if we apply this patchset, we can test hot-remove memory on kvm
> > guest.
> > > But that patchset doesn't implement _PS3, so there is some restriction.
> >
> > the following repos contain the patchset above, plus 2 more patches that
> > add
> > PS3 support to the dimm devices in qemu/seabios:
> >
> > https://github.com/vliaskov/seabios/commits/memhp-v2
> > https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> >
> > I have not posted the PS3 patches yet in the qemu list, but will post them
> > soon for v3 of the memory hotplug series. If you have issues testing, let
> > me
> > know.
> >
> > thanks,
> >
> > - Vasilis
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> 
> 
> 
> -- 
> I love linux!!!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
       [not found]         ` <CAAV+Mu7YWRWnxt78F4ZDMrrUsWB=n-_qkYOcQT7WQ2HwP89Obw@mail.gmail.com>
  2012-09-11  1:23           ` Minchan Kim
@ 2012-09-11  1:25           ` Wen Congyang
  1 sibling, 0 replies; 43+ messages in thread
From: Wen Congyang @ 2012-09-11  1:25 UTC (permalink / raw)
  To: Jerry
  Cc: Andrew Morton, Yasuaki Ishimatsu, Vasilis Liaskovitis, x86,
	linux-mm, linux-kernel, linuxppc-dev, linux-acpi, linux-s390,
	linux-sh, linux-ia64, cmetcalf, sparclinux, rientjes, liuj97,
	len.brown, benh, paulus, cl, minchan.kim, kosaki.motohiro

At 09/11/2012 08:27 AM, Jerry Wrote:
> Hi Wen,
> 
> I have been arranged a job related memory hotplug on ARM architecture.
> Maybe I know some new issues about memory hotplug on ARM architecture. I
> just enabled it on ARM, and it works well in my Android tablet now.
> However, I have not send out my patches. The real reason is that I don't
> know how to do it. Maybe I need to read "Documentation/SubmittingPatches".
> 
> Hi Andrew,
> This is my first time to send you a e-mail. I am so nervous about if I have
> some mistakes or not.
> 
> Some peoples maybe think memory hotplug need to be supported by special
> hardware. Maybe it means memory physical hotplug. Some times, we just need
> to use memory logical hotplug, doesn't remove the memory in physical. It is
> also usefully for power saving in my platform. Because I doesn't want
> the offline memory is in *self-refresh* state.

Power saving? Do you need _PSx support?

Thanks
Wen Congyang

> 
> Any comments are appreciated.
> 
> Thanks,
> Jerry
> 
> 2012/9/10 Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> 
>> Hi,
>>
>> On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
>>> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
>>>> Hi Wen,
>>>>
>>>> 2012/09/01 5:49, Andrew Morton wrote:
>>>>> On Tue, 28 Aug 2012 18:00:07 +0800
>>>>> wency@cn.fujitsu.com wrote:
>>>>>
>>>>>> This patch series aims to support physical memory hot-remove.
>>>>>
>>>>> I doubt if many people have hardware which permits physical memory
>>>>> removal?  How would you suggest that people with regular hardware can
>>>>> test these chagnes?
>>>>
>>>> How do you test the patch? As Andrew says, for hot-removing memory,
>>>> we need a particular hardware. I think so too. So many people may want
>>>> to know how to test the patch.
>>>> If we apply following patch to kvm guest, can we hot-remove memory on
>>>> kvm guest?
>>>>
>>>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
>>>
>>> Yes, if we apply this patchset, we can test hot-remove memory on kvm
>> guest.
>>> But that patchset doesn't implement _PS3, so there is some restriction.
>>
>> the following repos contain the patchset above, plus 2 more patches that
>> add
>> PS3 support to the dimm devices in qemu/seabios:
>>
>> https://github.com/vliaskov/seabios/commits/memhp-v2
>> https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
>>
>> I have not posted the PS3 patches yet in the qemu list, but will post them
>> soon for v3 of the memory hotplug series. If you have issues testing, let
>> me
>> know.
>>
>> thanks,
>>
>> - Vasilis
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
       [not found]             ` <CAAV+Mu4hb0qbW2Ry6w5FAGUM06puDH0v_H-jr584-G9CzJqSGw@mail.gmail.com>
@ 2012-09-11  5:39               ` Wen Congyang
  2012-09-12  6:17               ` Minchan Kim
  1 sibling, 0 replies; 43+ messages in thread
From: Wen Congyang @ 2012-09-11  5:39 UTC (permalink / raw)
  To: Jerry
  Cc: Minchan Kim, Andrew Morton, Yasuaki Ishimatsu,
	Vasilis Liaskovitis, x86, linux-mm, linux-kernel, linuxppc-dev,
	linux-acpi, linux-s390, linux-sh, linux-ia64, cmetcalf,
	sparclinux, rientjes, liuj97, len.brown, benh, paulus, cl,
	kosaki.motohiro

At 09/11/2012 01:18 PM, Jerry Wrote:
> Hi Kim,
> 
> Thank you for your kindness. Let me clarify this:
> 
> On ARM architecture, there are 32 bits physical addresses space. However,
> the addresses space is divided into 8 banks normally. Each bank
> disabled/enabled by a chip selector signal. In my platform, bank0 connects
> a DDR chip, and bank1 also connects another DDR chip. And each DDR chip
> whose capability is 512MB is integrated into the main board. So, it could
> not be removed by hand. We can disable/enable each bank by peripheral
> device controller registers.
> 
> When system enter suspend state, if all the pages allocated could be
> migrated to one bank, there are no valid data in the another bank. In this
> time, I could disable the free bank. It isn't necessary to provided power
> to this chip in the suspend state. When system resume, I just need to
> enable it again.
> 
> Hi Wen,
> 
> I am sorry for that I doesn't know the "_PSx support" means. Maybe I
> needn't it.

Hmm, arm doesn't support ACPI, so please ignore it.

Thanks
Wen Congyang

> 
> Thanks,
> Jerry
> 
> 2012/9/11 Minchan Kim <minchan@kernel.org>
> 
>> Hi Jerry,
>>
>> On Tue, Sep 11, 2012 at 08:27:40AM +0800, Jerry wrote:
>>> Hi Wen,
>>>
>>> I have been arranged a job related memory hotplug on ARM architecture.
>>> Maybe I know some new issues about memory hotplug on ARM architecture. I
>>> just enabled it on ARM, and it works well in my Android tablet now.
>>> However, I have not send out my patches. The real reason is that I don't
>>> know how to do it. Maybe I need to read
>> "Documentation/SubmittingPatches".
>>>
>>> Hi Andrew,
>>> This is my first time to send you a e-mail. I am so nervous about if I
>> have
>>> some mistakes or not.
>>
>> Don't be afraid.
>> If you might make a mistake, it's very natural to newbie.
>> I am sure anyone doesn't blame you. :)
>> If you have a good patch, please send out.
>>
>>>
>>> Some peoples maybe think memory hotplug need to be supported by special
>>> hardware. Maybe it means memory physical hotplug. Some times, we just
>> need
>>> to use memory logical hotplug, doesn't remove the memory in physical. It
>> is
>>> also usefully for power saving in my platform. Because I doesn't want
>>> the offline memory is in *self-refresh* state.
>>
>> Just out of curiosity.
>> What's the your scenario and gain?
>> AFAIK, there were some effort about it in embedded side but gain isn't
>> rather big
>> IIRC.
>>
>>>
>>> Any comments are appreciated.
>>>
>>> Thanks,
>>> Jerry
>>>
>>> 2012/9/10 Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
>>>
>>>> Hi,
>>>>
>>>> On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
>>>>> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
>>>>>> Hi Wen,
>>>>>>
>>>>>> 2012/09/01 5:49, Andrew Morton wrote:
>>>>>>> On Tue, 28 Aug 2012 18:00:07 +0800
>>>>>>> wency@cn.fujitsu.com wrote:
>>>>>>>
>>>>>>>> This patch series aims to support physical memory hot-remove.
>>>>>>>
>>>>>>> I doubt if many people have hardware which permits physical memory
>>>>>>> removal?  How would you suggest that people with regular hardware
>> can
>>>>>>> test these chagnes?
>>>>>>
>>>>>> How do you test the patch? As Andrew says, for hot-removing memory,
>>>>>> we need a particular hardware. I think so too. So many people may
>> want
>>>>>> to know how to test the patch.
>>>>>> If we apply following patch to kvm guest, can we hot-remove memory
>> on
>>>>>> kvm guest?
>>>>>>
>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
>>>>>
>>>>> Yes, if we apply this patchset, we can test hot-remove memory on kvm
>>>> guest.
>>>>> But that patchset doesn't implement _PS3, so there is some
>> restriction.
>>>>
>>>> the following repos contain the patchset above, plus 2 more patches
>> that
>>>> add
>>>> PS3 support to the dimm devices in qemu/seabios:
>>>>
>>>> https://github.com/vliaskov/seabios/commits/memhp-v2
>>>> https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
>>>>
>>>> I have not posted the PS3 patches yet in the qemu list, but will post
>> them
>>>> soon for v3 of the memory hotplug series. If you have issues testing,
>> let
>>>> me
>>>> know.
>>>>
>>>> thanks,
>>>>
>>>> - Vasilis
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>
>>>
>>>
>>> --
>>> I love linux!!!
>>
>> --
>> Kind regards,
>> Minchan Kim
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-09-10 13:52       ` Vasilis Liaskovitis
       [not found]         ` <CAAV+Mu7YWRWnxt78F4ZDMrrUsWB=n-_qkYOcQT7WQ2HwP89Obw@mail.gmail.com>
@ 2012-09-12  5:20         ` Wen Congyang
  2012-09-12 17:18           ` Vasilis Liaskovitis
  1 sibling, 1 reply; 43+ messages in thread
From: Wen Congyang @ 2012-09-12  5:20 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Yasuaki Ishimatsu, Andrew Morton, x86, linux-mm, linux-kernel,
	linuxppc-dev, linux-acpi, linux-s390, linux-sh, linux-ia64,
	cmetcalf, sparclinux, rientjes, liuj97, len.brown, benh, paulus,
	cl, minchan.kim, kosaki.motohiro

At 09/10/2012 09:52 PM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
>> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
>>> Hi Wen,
>>>
>>> 2012/09/01 5:49, Andrew Morton wrote:
>>>> On Tue, 28 Aug 2012 18:00:07 +0800
>>>> wency@cn.fujitsu.com wrote:
>>>>
>>>>> This patch series aims to support physical memory hot-remove.
>>>>
>>>> I doubt if many people have hardware which permits physical memory
>>>> removal?  How would you suggest that people with regular hardware can
>>>> test these chagnes?
>>>
>>> How do you test the patch? As Andrew says, for hot-removing memory,
>>> we need a particular hardware. I think so too. So many people may want
>>> to know how to test the patch.
>>> If we apply following patch to kvm guest, can we hot-remove memory on
>>> kvm guest?
>>>
>>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
>>
>> Yes, if we apply this patchset, we can test hot-remove memory on kvm guest.
>> But that patchset doesn't implement _PS3, so there is some restriction.
> 
> the following repos contain the patchset above, plus 2 more patches that add
> PS3 support to the dimm devices in qemu/seabios:
> 
> https://github.com/vliaskov/seabios/commits/memhp-v2
> https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> 
> I have not posted the PS3 patches yet in the qemu list, but will post them
> soon for v3 of the memory hotplug series. If you have issues testing, let me
> know.

Hmm, seabios doesn't support ACPI table SLIT. We can specify node it for dimm
device, so I think we should support SLIT in seabios. Otherwise we may meet
the following kernel messages:
[  325.016769] init_memory_mapping: [mem 0x40000000-0x5fffffff]
[  325.018060]  [mem 0x40000000-0x5fffffff] page 2M
[  325.019168] [ffffea0001000000-ffffea00011fffff] potential offnode page_structs
[  325.024172] [ffffea0001200000-ffffea00013fffff] potential offnode page_structs
[  325.028596]  [ffffea0001400000-ffffea00017fffff] PMD -> [ffff880035000000-ffff8800353fffff] on node 1
[  325.031775] [ffffea0001600000-ffffea00017fffff] potential offnode page_structs

Do you have plan to do it?

Thanks
Wen Congyang

> 
> thanks,
> 
> - Vasilis
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
       [not found]             ` <CAAV+Mu4hb0qbW2Ry6w5FAGUM06puDH0v_H-jr584-G9CzJqSGw@mail.gmail.com>
  2012-09-11  5:39               ` Wen Congyang
@ 2012-09-12  6:17               ` Minchan Kim
  1 sibling, 0 replies; 43+ messages in thread
From: Minchan Kim @ 2012-09-12  6:17 UTC (permalink / raw)
  To: Jerry
  Cc: Wen Congyang, Andrew Morton, Yasuaki Ishimatsu,
	Vasilis Liaskovitis, x86, linux-mm, linux-kernel, linuxppc-dev,
	linux-acpi, linux-s390, linux-sh, linux-ia64, cmetcalf,
	sparclinux, rientjes, liuj97, len.brown, benh, paulus, cl,
	kosaki.motohiro

On Tue, Sep 11, 2012 at 01:18:24PM +0800, Jerry wrote:
> Hi Kim,
> 
> Thank you for your kindness. Let me clarify this:
> 
> On ARM architecture, there are 32 bits physical addresses space. However,
> the addresses space is divided into 8 banks normally. Each bank
> disabled/enabled by a chip selector signal. In my platform, bank0 connects
> a DDR chip, and bank1 also connects another DDR chip. And each DDR chip
> whose capability is 512MB is integrated into the main board. So, it could
> not be removed by hand. We can disable/enable each bank by peripheral
> device controller registers.
> 
> When system enter suspend state, if all the pages allocated could be
> migrated to one bank, there are no valid data in the another bank. In this
> time, I could disable the free bank. It isn't necessary to provided power
> to this chip in the suspend state. When system resume, I just need to
> enable it again.
> 

Yes. I already know it and other trials for that a few years ago[1].
A few years ago, I investigated the benefit between power consumption
benefit during suspend VS start-up latency of resume and
power consumption cost of migration(page migration and IO write for
migration) and concluded normally the gain is not big. :)
The situation could be changed these days as workload are changing
but I'm skeptical about that approach, still.

Anyway, it's my private thought so you don't need to care about that.
If you are ready to submit the patchset, please send out.

1. http://lwn.net/Articles/478049/

Thanks.

- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-09-12  5:20         ` Wen Congyang
@ 2012-09-12 17:18           ` Vasilis Liaskovitis
  2012-09-18  9:39             ` Wen Congyang
  0 siblings, 1 reply; 43+ messages in thread
From: Vasilis Liaskovitis @ 2012-09-12 17:18 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Yasuaki Ishimatsu, Andrew Morton, x86, linux-mm, linux-kernel,
	linuxppc-dev, linux-acpi, linux-s390, linux-sh, linux-ia64,
	cmetcalf, sparclinux, rientjes, liuj97, len.brown, benh, paulus,
	cl, minchan.kim, kosaki.motohiro

Hi,

On Wed, Sep 12, 2012 at 01:20:28PM +0800, Wen Congyang wrote:
> > 
> > On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
> >> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
> >>> How do you test the patch? As Andrew says, for hot-removing memory,
> >>> we need a particular hardware. I think so too. So many people may want
> >>> to know how to test the patch.
> >>> If we apply following patch to kvm guest, can we hot-remove memory on
> >>> kvm guest?
> >>>
> >>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
> >>
> >> Yes, if we apply this patchset, we can test hot-remove memory on kvm guest.
> >> But that patchset doesn't implement _PS3, so there is some restriction.
> > 
> > the following repos contain the patchset above, plus 2 more patches that add
> > PS3 support to the dimm devices in qemu/seabios:
> > 
> > https://github.com/vliaskov/seabios/commits/memhp-v2
> > https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> > 
> > I have not posted the PS3 patches yet in the qemu list, but will post them
> > soon for v3 of the memory hotplug series. If you have issues testing, let me
> > know.
> 
> Hmm, seabios doesn't support ACPI table SLIT. We can specify node it for dimm
> device, so I think we should support SLIT in seabios. Otherwise we may meet
> the following kernel messages:
> [  325.016769] init_memory_mapping: [mem 0x40000000-0x5fffffff]
> [  325.018060]  [mem 0x40000000-0x5fffffff] page 2M
> [  325.019168] [ffffea0001000000-ffffea00011fffff] potential offnode page_structs
> [  325.024172] [ffffea0001200000-ffffea00013fffff] potential offnode page_structs
> [  325.028596]  [ffffea0001400000-ffffea00017fffff] PMD -> [ffff880035000000-ffff8800353fffff] on node 1
> [  325.031775] [ffffea0001600000-ffffea00017fffff] potential offnode page_structs
> 
> Do you have plan to do it?
thanks for testing.

commit 5294828 from https://github.com/vliaskov/seabios/commits/memhp-v2
implements a SLIT table for the given numa nodes.

However I am not sure the SLIT is the problem. The kernel builds a default
numa_distance table in arch/x86/mm/numa.c: numa_alloc_distance(). If the BIOS
doesn't present a SLIT, this should take effect (numactl --hardware should
report this table)

Do you have more details on how to reproduce the warning? e.g. how many dimms
are present in the system? Does this happen on the first dimm hot-plugged?
Are all SRAT entries parsed correctly at boot-time or do you see any other
warnings at boot-time?

I 'll investigate a bit more and report back.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-09-12 17:18           ` Vasilis Liaskovitis
@ 2012-09-18  9:39             ` Wen Congyang
  2012-09-19 17:02               ` Vasilis Liaskovitis
  0 siblings, 1 reply; 43+ messages in thread
From: Wen Congyang @ 2012-09-18  9:39 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Yasuaki Ishimatsu, Andrew Morton, x86, linux-mm, linux-kernel,
	linuxppc-dev, linux-acpi, linux-s390, linux-sh, linux-ia64,
	cmetcalf, sparclinux, rientjes, liuj97, len.brown, benh, paulus,
	cl, minchan.kim, kosaki.motohiro

At 09/13/2012 01:18 AM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> On Wed, Sep 12, 2012 at 01:20:28PM +0800, Wen Congyang wrote:
>>>
>>> On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
>>>> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
>>>>> How do you test the patch? As Andrew says, for hot-removing memory,
>>>>> we need a particular hardware. I think so too. So many people may want
>>>>> to know how to test the patch.
>>>>> If we apply following patch to kvm guest, can we hot-remove memory on
>>>>> kvm guest?
>>>>>
>>>>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
>>>>
>>>> Yes, if we apply this patchset, we can test hot-remove memory on kvm guest.
>>>> But that patchset doesn't implement _PS3, so there is some restriction.
>>>
>>> the following repos contain the patchset above, plus 2 more patches that add
>>> PS3 support to the dimm devices in qemu/seabios:
>>>
>>> https://github.com/vliaskov/seabios/commits/memhp-v2
>>> https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
>>>
>>> I have not posted the PS3 patches yet in the qemu list, but will post them
>>> soon for v3 of the memory hotplug series. If you have issues testing, let me
>>> know.
>>
>> Hmm, seabios doesn't support ACPI table SLIT. We can specify node it for dimm
>> device, so I think we should support SLIT in seabios. Otherwise we may meet
>> the following kernel messages:
>> [  325.016769] init_memory_mapping: [mem 0x40000000-0x5fffffff]
>> [  325.018060]  [mem 0x40000000-0x5fffffff] page 2M
>> [  325.019168] [ffffea0001000000-ffffea00011fffff] potential offnode page_structs
>> [  325.024172] [ffffea0001200000-ffffea00013fffff] potential offnode page_structs
>> [  325.028596]  [ffffea0001400000-ffffea00017fffff] PMD -> [ffff880035000000-ffff8800353fffff] on node 1
>> [  325.031775] [ffffea0001600000-ffffea00017fffff] potential offnode page_structs
>>
>> Do you have plan to do it?
> thanks for testing.
> 
> commit 5294828 from https://github.com/vliaskov/seabios/commits/memhp-v2
> implements a SLIT table for the given numa nodes.

Hmm, why do you set node_distance(i, j) to REMOTE_DISTANCE if i != j?

> 
> However I am not sure the SLIT is the problem. The kernel builds a default
> numa_distance table in arch/x86/mm/numa.c: numa_alloc_distance(). If the BIOS
> doesn't present a SLIT, this should take effect (numactl --hardware should
> report this table)

If the BIOS doesn't present a SLIT, numa_distance_cnt is set to 0 in the
function numa_reset_distance(). So node_distance(i, j) is REMOTE_DISTANCE(i != j).

> 
> Do you have more details on how to reproduce the warning? e.g. how many dimms
> are present in the system? Does this happen on the first dimm hot-plugged?
> Are all SRAT entries parsed correctly at boot-time or do you see any other
> warnings at boot-time?

I can't reproduce it again. IIRC, I only do the following things:
hotplug a memory device, online the pages, offline the pages and hot remove
the memory device.

Thanks
Wen Congyang

> 
> I 'll investigate a bit more and report back.
> 
> thanks,
> 
> - Vasilis
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
  2012-09-18  9:39             ` Wen Congyang
@ 2012-09-19 17:02               ` Vasilis Liaskovitis
  0 siblings, 0 replies; 43+ messages in thread
From: Vasilis Liaskovitis @ 2012-09-19 17:02 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Yasuaki Ishimatsu, Andrew Morton, x86, linux-mm, linux-kernel,
	linuxppc-dev, linux-acpi, linux-s390, linux-sh, linux-ia64,
	cmetcalf, sparclinux, rientjes, liuj97, len.brown, benh, paulus,
	cl, minchan.kim, kosaki.motohiro

Hi,
On Tue, Sep 18, 2012 at 05:39:37PM +0800, Wen Congyang wrote:
> At 09/13/2012 01:18 AM, Vasilis Liaskovitis Wrote:
> > Hi,
> > 
> > On Wed, Sep 12, 2012 at 01:20:28PM +0800, Wen Congyang wrote:
> >> Hmm, seabios doesn't support ACPI table SLIT. We can specify node it for dimm
> >> device, so I think we should support SLIT in seabios. Otherwise we may meet
> >> the following kernel messages:
> >> [  325.016769] init_memory_mapping: [mem 0x40000000-0x5fffffff]
> >> [  325.018060]  [mem 0x40000000-0x5fffffff] page 2M
> >> [  325.019168] [ffffea0001000000-ffffea00011fffff] potential offnode page_structs
> >> [  325.024172] [ffffea0001200000-ffffea00013fffff] potential offnode page_structs
> >> [  325.028596]  [ffffea0001400000-ffffea00017fffff] PMD -> [ffff880035000000-ffff8800353fffff] on node 1
> >> [  325.031775] [ffffea0001600000-ffffea00017fffff] potential offnode page_structs
> >>
> >> Do you have plan to do it?
> > thanks for testing.
> > 
> > commit 5294828 from https://github.com/vliaskov/seabios/commits/memhp-v2
> > implements a SLIT table for the given numa nodes.
> 
> Hmm, why do you set node_distance(i, j) to REMOTE_DISTANCE if i != j?

What's the alternative?

Afaik SLIT[i][j] shows the distance between proximity domains (_PXM) i and j. It
doesn't correspond to individual SRAT entries. So i and j here are not memory
ranges associated with 2 different dimms. They denote domains i and j, which map
to 2 different logical nodeids in the kernel.

A default setting would be to set the entry to REMOTE_DISTANCE for all different
domains (i!=j). So this SLIT implementation is not useful, since it results
in the same numa_distance values as the non-SLIT kernel calculation in
include/linux/topology.h

> 
> > 
> > However I am not sure the SLIT is the problem. The kernel builds a default
> > numa_distance table in arch/x86/mm/numa.c: numa_alloc_distance(). If the BIOS
> > doesn't present a SLIT, this should take effect (numactl --hardware should
> > report this table)
> 
> If the BIOS doesn't present a SLIT, numa_distance_cnt is set to 0 in the
> function numa_reset_distance(). So node_distance(i, j) is REMOTE_DISTANCE(i != j).
> 
> > 
> > Do you have more details on how to reproduce the warning? e.g. how many dimms
> > are present in the system? Does this happen on the first dimm hot-plugged?
> > Are all SRAT entries parsed correctly at boot-time or do you see any other
> > warnings at boot-time?
> 
> I can't reproduce it again. IIRC, I only do the following things:
> hotplug a memory device, online the pages, offline the pages and hot remove
> the memory device.

Is the sparse_vmemmap allocation supposed to guarantee no off-node allocations?
If not, then the warning could be valid.

thanks,

- Vasilis

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2012-09-19 17:02 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-28 10:00 [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory wency
2012-08-28 10:00 ` [RFC v8 PATCH 01/20] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages() wency
2012-08-28 10:00 ` [RFC v8 PATCH 02/20] memory-hotplug: implement offline_memory() wency
2012-08-28 10:00 ` [RFC v8 PATCH 03/20] memory-hotplug: store the node id in acpi_memory_device wency
2012-08-28 10:00 ` [RFC v8 PATCH 04/20] memory-hotplug: offline and remove memory when removing the memory device wency
2012-08-31 20:55   ` Andrew Morton
2012-09-03  1:30     ` Wen Congyang
2012-08-28 10:00 ` [RFC v8 PATCH 05/20] memory-hotplug: check whether memory is present or not wency
2012-08-28 10:00 ` [RFC v8 PATCH 06/20] memory-hotplug: export the function acpi_bus_remove() wency
2012-08-28 10:00 ` [RFC v8 PATCH 07/20] memory-hotplug: call acpi_bus_remove() to remove memory device wency
2012-08-28 10:00 ` [RFC v8 PATCH 08/20] memory-hotplug: remove /sys/firmware/memmap/X sysfs wency
2012-08-31 21:06   ` Andrew Morton
2012-09-03  5:51     ` Wen Congyang
2012-09-04 23:16       ` Andrew Morton
2012-09-05  1:41         ` Wen Congyang
2012-09-03  7:31     ` Wen Congyang
2012-08-28 10:00 ` [RFC v8 PATCH 09/20] memory-hotplug: does not release memory region in PAGES_PER_SECTION chunks wency
2012-08-28 10:00 ` [RFC v8 PATCH 10/20] memory-hotplug: add memory_block_release wency
2012-08-28 10:00 ` [RFC v8 PATCH 11/20] memory-hotplug: remove_memory calls __remove_pages wency
2012-08-28 10:00 ` [RFC v8 PATCH 12/20] memory-hotplug: introduce new function arch_remove_memory() wency
2012-08-28 10:00 ` [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem wency
2012-08-31 21:30   ` Andrew Morton
2012-09-04  3:46     ` Wen Congyang
2012-09-04  9:54       ` Yasuaki Ishimatsu
2012-08-28 10:00 ` [RFC v8 PATCH 14/20] memory-hotplug: move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap wency
2012-08-28 10:00 ` [RFC v8 PATCH 15/20] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap wency
2012-08-28 10:00 ` [RFC v8 PATCH 16/20] memory-hotplug: free memmap " wency
2012-08-28 10:00 ` [RFC v8 PATCH 17/20] memory_hotplug: clear zone when the memory is removed wency
2012-08-28 10:00 ` [RFC v8 PATCH 18/20] memory-hotplug: add node_device_release wency
2012-08-28 10:00 ` [RFC v8 PATCH 19/20] memory-hotplug: remove sysfs file of node wency
2012-08-28 10:00 ` [RFC v8 PATCH 20/20] memory-hotplug: clear hwpoisoned flag when onlining pages wency
2012-08-31 20:49 ` [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory Andrew Morton
2012-09-10  1:46   ` Yasuaki Ishimatsu
2012-09-10  2:01     ` Wen Congyang
2012-09-10 13:52       ` Vasilis Liaskovitis
     [not found]         ` <CAAV+Mu7YWRWnxt78F4ZDMrrUsWB=n-_qkYOcQT7WQ2HwP89Obw@mail.gmail.com>
2012-09-11  1:23           ` Minchan Kim
     [not found]             ` <CAAV+Mu4hb0qbW2Ry6w5FAGUM06puDH0v_H-jr584-G9CzJqSGw@mail.gmail.com>
2012-09-11  5:39               ` Wen Congyang
2012-09-12  6:17               ` Minchan Kim
2012-09-11  1:25           ` Wen Congyang
2012-09-12  5:20         ` Wen Congyang
2012-09-12 17:18           ` Vasilis Liaskovitis
2012-09-18  9:39             ` Wen Congyang
2012-09-19 17:02               ` Vasilis Liaskovitis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).