linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections
@ 2010-10-01 18:22 Nathan Fontenot
  2010-10-01 18:28 ` [PATCH 1/9] v3 Move find_memory_block routine Nathan Fontenot
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

This set of patches removes the assumption that a single memory
section corresponds to a single directory in
/sys/devices/system/memory/.  On systems with large amounts of
memory (1+ TB) there are performance issues related to creating
the large number of sysfs directories.  For a powerpc machine
with 1 TB of memory we are creating 63,000+ directories.  This
results in boot times of around 45-50 minutes for systems with
1 TB of memory and 8 hours for systems with 2 TB of memory.  With
this patch set applied I am now seeing boot times of 5 minutes or
less.

The root of this issue is in sysfs directory creation. Every time
a directory is created a string compare is done against all sibling
directories to ensure we do not create duplicates.  The list of
directory nodes in sysfs is kept as an unsorted list, so each
creation takes longer than the last and the total cost grows
roughly with the square of the number of directories created.
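
To put a rough number on the cost described above, here is a minimal
userspace sketch (not kernel code).  The function name
count_sibling_compares() is hypothetical, and the model assumes one
string compare per existing sibling for every new directory, with no
early exit.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code: count the string compares
 * needed to create nr_dirs directories when every new name is
 * checked against all existing siblings in an unsorted list.
 */
unsigned long long count_sibling_compares(unsigned long nr_dirs)
{
	unsigned long long compares = 0;
	unsigned long i;

	/* The i-th directory created is compared against i siblings. */
	for (i = 0; i < nr_dirs; i++)
		compares += i;

	/* Cross-check against the closed form n * (n - 1) / 2. */
	assert(nr_dirs == 0 ||
	       compares == (unsigned long long)nr_dirs * (nr_dirs - 1) / 2);
	return compares;
}
```

Under this model, 65,536 directories (1 TB of memory if sections are
16 MB, which is an assumption of this example) need 2,147,450,880
compares, while 4,096 directories need 8,386,560.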

The solution implemented by this patch set is to allow a single
directory in sysfs to span multiple memory sections.  This is
controlled by an optional architecture-defined function
memory_block_size_bytes().  The default definition of this
routine returns a memory block size equal to the memory section
size.  This keeps the layout of the sysfs memory directories, as
seen from userspace, the same as it is today.

For architectures that define their own version of this routine,
as is done for powerpc and x86_64 in this patchset, the view in userspace
would change such that each memoryXXX directory would span
multiple memory sections.  The number of sections spanned would
depend on the value reported by memory_block_size_bytes.
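
As an illustration of how the block size drives the number of
directories, here is a small userspace sketch.  The helper name
nr_memory_dirs() is hypothetical, and the 16 MB and 256 MB sizes used
below are example values, not values taken from the patches.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: number of memoryXXX
 * directories needed to cover mem_bytes of memory when each
 * directory spans block_bytes.
 */
unsigned long long nr_memory_dirs(unsigned long long mem_bytes,
				  unsigned long long block_bytes)
{
	assert(block_bytes != 0);

	/* Round up so a partial trailing block still gets a directory. */
	return (mem_bytes + block_bytes - 1) / block_bytes;
}
```

For 1 TB of memory, nr_memory_dirs(1ULL << 40, 16ULL << 20) gives
65536 directories, and nr_memory_dirs(1ULL << 40, 256ULL << 20) gives
4096.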

In both cases a new file 'end_phys_index' is created in each
memoryXXX directory.  This file will contain the physical id
of the last memory section covered by the sysfs directory.  For
the default case, the value in 'end_phys_index' will be the same
as in the existing 'phys_index' file.
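
The first and last section numbers covered by a directory can be
sketched as follows.  This is a userspace model of the arithmetic
only; the struct and function names are hypothetical, and
sections_per_block is passed in rather than derived from
memory_block_size_bytes().

```c
#include <assert.h>

/* Hypothetical pair of section numbers covered by one memory block. */
struct block_range {
	unsigned long start_section_nr;
	unsigned long end_section_nr;
};

/*
 * Round section_nr down to the first section of its block, then
 * add the block length to find the last section.
 */
struct block_range block_range_for_section(unsigned long section_nr,
					   unsigned long sections_per_block)
{
	struct block_range r;

	assert(sections_per_block > 0);
	r.start_section_nr = (section_nr / sections_per_block) *
			     sections_per_block;
	r.end_section_nr = r.start_section_nr + sections_per_block - 1;
	return r;
}
```

With one section per block (the default case) section 37 yields a
start and end of 37.  With 16 sections per block it yields a start of
32 and an end of 47.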

Updates for this version of the patch:

- Patches 2 and 3 have been swapped, which has removed the need for the
  section count in the memory_block struct to be an atomic.

- The get_memory_block_size and memory_block_size_bytes routines now return
  an unsigned long instead of a u32.  This affects patches 4, 7, and 8.

- [Patch 5/9] The phys_index member of the memory block struct is changed to
  start_section_nr and the new end_phys_index is now named end_section_nr.

- [Patch 8/9] A new patch added to the set to define a version of
  memory_block_size_bytes() for x86_64 when CONFIG_X86_UV is set.

- [Patch 9/9] Correct the updates to hotplug documentation to indicate that
  4 or 5 files may be seen for each memory directory in sysfs.

-Nathan Fontenot


* [PATCH 1/9] v3 Move find_memory_block routine
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
@ 2010-10-01 18:28 ` Nathan Fontenot
  2010-10-01 18:40   ` Robin Holt
  2010-10-05  5:01   ` KAMEZAWA Hiroyuki
  2010-10-01 18:29 ` [PATCH 2/9] v3 Add mutex for adding/removing memory blocks Nathan Fontenot
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:28 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Move the find_memory_block() routine up to avoid needing a forward
declaration in subsequent patches.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 drivers/base/memory.c |   62 +++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-29 14:56:26.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-09-30 14:09:36.000000000 -0500
@@ -435,6 +435,37 @@
 	return 0;
 }
 
+/*
+ * For now, we have a linear search to go find the appropriate
+ * memory_block corresponding to a particular phys_index. If
+ * this gets to be a real problem, we can always use a radix
+ * tree or something here.
+ *
+ * This could be made generic for all sysdev classes.
+ */
+struct memory_block *find_memory_block(struct mem_section *section)
+{
+	struct kobject *kobj;
+	struct sys_device *sysdev;
+	struct memory_block *mem;
+	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+
+	/*
+	 * This only works because we know that section == sysdev->id
+	 * slightly redundant with sysdev_register()
+	 */
+	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+
+	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
+	if (!kobj)
+		return NULL;
+
+	sysdev = container_of(kobj, struct sys_device, kobj);
+	mem = container_of(sysdev, struct memory_block, sysdev);
+
+	return mem;
+}
+
 static int add_memory_block(int nid, struct mem_section *section,
 			unsigned long state, enum mem_add_context context)
 {
@@ -468,37 +499,6 @@
 	return ret;
 }
 
-/*
- * For now, we have a linear search to go find the appropriate
- * memory_block corresponding to a particular phys_index. If
- * this gets to be a real problem, we can always use a radix
- * tree or something here.
- *
- * This could be made generic for all sysdev classes.
- */
-struct memory_block *find_memory_block(struct mem_section *section)
-{
-	struct kobject *kobj;
-	struct sys_device *sysdev;
-	struct memory_block *mem;
-	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
-
-	/*
-	 * This only works because we know that section == sysdev->id
-	 * slightly redundant with sysdev_register()
-	 */
-	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
-
-	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
-	if (!kobj)
-		return NULL;
-
-	sysdev = container_of(kobj, struct sys_device, kobj);
-	mem = container_of(sysdev, struct memory_block, sysdev);
-
-	return mem;
-}
-
 int remove_memory_block(unsigned long node_id, struct mem_section *section,
 		int phys_device)
 {


* [PATCH 2/9] v3 Add mutex for adding/removing memory blocks
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
  2010-10-01 18:28 ` [PATCH 1/9] v3 Move find_memory_block routine Nathan Fontenot
@ 2010-10-01 18:29 ` Nathan Fontenot
  2010-10-01 18:45   ` Robin Holt
  2010-10-05  5:06   ` KAMEZAWA Hiroyuki
  2010-10-01 18:30 ` [PATCH 3/9] v3 Add section count to memory_block struct Nathan Fontenot
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:29 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Add a new mutex for use in adding and removing of memory blocks.  This
is needed to avoid any race conditions in which the same memory block could
be added and removed at the same time.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 drivers/base/memory.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-30 14:09:36.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-09-30 14:12:41.000000000 -0500
@@ -27,6 +27,8 @@
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
 
+static DEFINE_MUTEX(mem_sysfs_mutex);
+
 #define MEMORY_CLASS_NAME	"memory"
 
 static struct sysdev_class memory_sysdev_class = {
@@ -476,6 +478,8 @@
 	if (!mem)
 		return -ENOMEM;
 
+	mutex_lock(&mem_sysfs_mutex);
+
 	mem->phys_index = __section_nr(section);
 	mem->state = state;
 	mutex_init(&mem->state_mutex);
@@ -496,6 +500,7 @@
 			ret = register_mem_sect_under_node(mem, nid);
 	}
 
+	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
 }
 
@@ -504,6 +509,7 @@
 {
 	struct memory_block *mem;
 
+	mutex_lock(&mem_sysfs_mutex);
 	mem = find_memory_block(section);
 	unregister_mem_sect_under_nodes(mem);
 	mem_remove_simple_file(mem, phys_index);
@@ -512,6 +518,7 @@
 	mem_remove_simple_file(mem, removable);
 	unregister_memory(mem, section);
 
+	mutex_unlock(&mem_sysfs_mutex);
 	return 0;
 }
 


* [PATCH 3/9] v3 Add section count to memory_block struct
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
  2010-10-01 18:28 ` [PATCH 1/9] v3 Move find_memory_block routine Nathan Fontenot
  2010-10-01 18:29 ` [PATCH 2/9] v3 Add mutex for adding/removing memory blocks Nathan Fontenot
@ 2010-10-01 18:30 ` Nathan Fontenot
  2010-10-01 18:46   ` Robin Holt
  2010-10-05  5:08   ` KAMEZAWA Hiroyuki
  2010-10-01 18:31 ` [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections Nathan Fontenot
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:30 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Add a section count property to the memory_block struct to track the number
of memory sections that have been added/removed from a memory block. This
allows us to know when the last memory section of a memory block has been
removed so we can remove the memory block.
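
The counting scheme can be modelled in userspace as below.  This is a
sketch only: the names are hypothetical, and it leaves out the locking
that the patch relies on (mem_sysfs_mutex is held around both paths).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the section_count member of memory_block. */
struct block_model {
	int section_count;
};

/* Called once for every memory section added to the block. */
void block_add_section(struct block_model *blk)
{
	blk->section_count++;
}

/*
 * Called once for every memory section removed from the block.
 * Returns true when the last section is gone, i.e. when the
 * caller should tear down the block itself.
 */
bool block_remove_section(struct block_model *blk)
{
	assert(blk->section_count > 0);
	blk->section_count--;
	return blk->section_count == 0;
}
```

A block holding two sections returns false on the first removal and
true on the second, which is the point at which the sysfs files and
the block would be removed.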

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 drivers/base/memory.c  |   17 +++++++++++------
 include/linux/memory.h |    2 ++
 2 files changed, 13 insertions(+), 6 deletions(-)

Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-30 14:12:41.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-09-30 14:13:50.000000000 -0500
@@ -482,6 +482,7 @@
 
 	mem->phys_index = __section_nr(section);
 	mem->state = state;
+	mem->section_count++;
 	mutex_init(&mem->state_mutex);
 	start_pfn = section_nr_to_pfn(mem->phys_index);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
@@ -511,12 +512,16 @@
 
 	mutex_lock(&mem_sysfs_mutex);
 	mem = find_memory_block(section);
-	unregister_mem_sect_under_nodes(mem);
-	mem_remove_simple_file(mem, phys_index);
-	mem_remove_simple_file(mem, state);
-	mem_remove_simple_file(mem, phys_device);
-	mem_remove_simple_file(mem, removable);
-	unregister_memory(mem, section);
+
+	mem->section_count--;
+	if (mem->section_count == 0) {
+		unregister_mem_sect_under_nodes(mem);
+		mem_remove_simple_file(mem, phys_index);
+		mem_remove_simple_file(mem, state);
+		mem_remove_simple_file(mem, phys_device);
+		mem_remove_simple_file(mem, removable);
+		unregister_memory(mem, section);
+	}
 
 	mutex_unlock(&mem_sysfs_mutex);
 	return 0;
Index: linux-next/include/linux/memory.h
===================================================================
--- linux-next.orig/include/linux/memory.h	2010-09-29 14:56:29.000000000 -0500
+++ linux-next/include/linux/memory.h	2010-09-30 14:13:50.000000000 -0500
@@ -23,6 +23,8 @@
 struct memory_block {
 	unsigned long phys_index;
 	unsigned long state;
+	int section_count;
+
 	/*
 	 * This serializes all state change requests.  It isn't
 	 * held during creation because the control files are


* [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (2 preceding siblings ...)
  2010-10-01 18:30 ` [PATCH 3/9] v3 Add section count to memory_block struct Nathan Fontenot
@ 2010-10-01 18:31 ` Nathan Fontenot
  2010-10-01 18:52   ` Robin Holt
  2010-10-01 19:00   ` Nathan Fontenot
  2010-10-01 18:33 ` [PATCH 5/9] v3 rename phys_index properties of memory block struct Nathan Fontenot
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:31 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Update the memory sysfs code such that each sysfs memory directory is now
considered a memory block that can span multiple memory sections.  The
default size of each memory block is the memory section size
(1 << SECTION_SIZE_BITS bytes) to maintain the current behavior of having
a single memory section per memory block (i.e. one sysfs directory per
memory section).

For architectures that want to have memory blocks span multiple
memory sections they need only define their own memory_block_size_bytes()
routine.
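
The size check applied to the architecture-supplied value can be
sketched in userspace as below.  The value of SECTION_SIZE_BITS is an
assumption made for this example (16 MB sections), the function names
are hypothetical, and the sketch silently falls back to the minimum
where the patch also issues a WARN_ON().

```c
#include <assert.h>

#define SECTION_SIZE_BITS	24	/* assumed here: 16 MB sections */
#define MIN_MEMORY_BLOCK_SIZE	(1UL << SECTION_SIZE_BITS)

/*
 * Accept block_sz only if it is a power of 2 and at least one
 * section in size; otherwise fall back to one section per block.
 */
unsigned long checked_block_size(unsigned long block_sz)
{
	if ((block_sz & (block_sz - 1)) || block_sz < MIN_MEMORY_BLOCK_SIZE)
		return MIN_MEMORY_BLOCK_SIZE;
	return block_sz;
}

/* Number of memory sections covered by one sysfs directory. */
unsigned long sections_per_block_for(unsigned long block_sz)
{
	unsigned long checked = checked_block_size(block_sz);

	/* A checked size is always a whole multiple of the section size. */
	assert(checked % MIN_MEMORY_BLOCK_SIZE == 0);
	return checked / MIN_MEMORY_BLOCK_SIZE;
}
```

With these assumptions a 256 MB block size gives 16 sections per
block, while a 48 MB value (not a power of 2) or a 1 MB value (smaller
than a section) falls back to 1.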

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 drivers/base/memory.c |  155 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 108 insertions(+), 47 deletions(-)

Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-30 14:13:50.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-09-30 14:46:00.000000000 -0500
@@ -30,6 +30,14 @@
 static DEFINE_MUTEX(mem_sysfs_mutex);
 
 #define MEMORY_CLASS_NAME	"memory"
+#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
+
+static int sections_per_block;
+
+static inline int base_memory_block_id(int section_nr)
+{
+	return section_nr / sections_per_block;
+}
 
 static struct sysdev_class memory_sysdev_class = {
 	.name = MEMORY_CLASS_NAME,
@@ -84,28 +92,47 @@
  * register_memory - Setup a sysfs device for a memory block
  */
 static
-int register_memory(struct memory_block *memory, struct mem_section *section)
+int register_memory(struct memory_block *memory)
 {
 	int error;
 
 	memory->sysdev.cls = &memory_sysdev_class;
-	memory->sysdev.id = __section_nr(section);
+	memory->sysdev.id = memory->phys_index / sections_per_block;
 
 	error = sysdev_register(&memory->sysdev);
 	return error;
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section)
+unregister_memory(struct memory_block *memory)
 {
 	BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
-	BUG_ON(memory->sysdev.id != __section_nr(section));
 
 	/* drop the ref. we got in remove_memory_block() */
 	kobject_put(&memory->sysdev.kobj);
 	sysdev_unregister(&memory->sysdev);
 }
 
+unsigned long __weak memory_block_size_bytes(void)
+{
+	return MIN_MEMORY_BLOCK_SIZE;
+}
+
+static unsigned long get_memory_block_size(void)
+{
+	unsigned long block_sz;
+
+	block_sz = memory_block_size_bytes();
+
+	/* Validate block_sz is a power of 2 and not less than section size */
+	if ((block_sz & (block_sz - 1)) || (block_sz < MIN_MEMORY_BLOCK_SIZE)) {
+		WARN_ON(1);
+		block_sz = MIN_MEMORY_BLOCK_SIZE;
+	}
+
+	return block_sz;
+}
+
 /*
  * use this as the physical section index that this memsection
  * uses.
@@ -116,7 +143,7 @@
 {
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
-	return sprintf(buf, "%08lx\n", mem->phys_index);
+	return sprintf(buf, "%08lx\n", mem->phys_index / sections_per_block);
 }
 
 /*
@@ -125,13 +152,16 @@
 static ssize_t show_mem_removable(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
 {
-	unsigned long start_pfn;
-	int ret;
+	unsigned long i, pfn;
+	int ret = 1;
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
 
-	start_pfn = section_nr_to_pfn(mem->phys_index);
-	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
+	for (i = 0; i < sections_per_block; i++) {
+		pfn = section_nr_to_pfn(mem->phys_index + i);
+		ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+	}
+
 	return sprintf(buf, "%d\n", ret);
 }
 
@@ -184,17 +214,14 @@
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(struct memory_block *mem, unsigned long action)
+memory_section_action(unsigned long phys_index, unsigned long action)
 {
 	int i;
-	unsigned long psection;
 	unsigned long start_pfn, start_paddr;
 	struct page *first_page;
 	int ret;
-	int old_state = mem->state;
 
-	psection = mem->phys_index;
-	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
+	first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
 
 	/*
 	 * The probe routines leave the pages reserved, just
@@ -207,8 +234,8 @@
 				continue;
 
 			printk(KERN_WARNING "section number %ld page number %d "
-				"not reserved, was it already online? \n",
-				psection, i);
+				"not reserved, was it already online?\n",
+				phys_index, i);
 			return -EBUSY;
 		}
 	}
@@ -219,18 +246,13 @@
 			ret = online_pages(start_pfn, PAGES_PER_SECTION);
 			break;
 		case MEM_OFFLINE:
-			mem->state = MEM_GOING_OFFLINE;
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
 			ret = remove_memory(start_paddr,
 					    PAGES_PER_SECTION << PAGE_SHIFT);
-			if (ret) {
-				mem->state = old_state;
-				break;
-			}
 			break;
 		default:
-			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
-					__func__, mem, action, action);
+			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
+			     "%ld\n", __func__, phys_index, action, action);
 			ret = -EINVAL;
 	}
 
@@ -240,7 +262,8 @@
 static int memory_block_change_state(struct memory_block *mem,
 		unsigned long to_state, unsigned long from_state_req)
 {
-	int ret = 0;
+	int i, ret = 0;
+
 	mutex_lock(&mem->state_mutex);
 
 	if (mem->state != from_state_req) {
@@ -248,8 +271,22 @@
 		goto out;
 	}
 
-	ret = memory_block_action(mem, to_state);
-	if (!ret)
+	if (to_state == MEM_OFFLINE)
+		mem->state = MEM_GOING_OFFLINE;
+
+	for (i = 0; i < sections_per_block; i++) {
+		ret = memory_section_action(mem->phys_index + i, to_state);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		for (i = 0; i < sections_per_block; i++)
+			memory_section_action(mem->phys_index + i,
+					      from_state_req);
+
+		mem->state = from_state_req;
+	} else
 		mem->state = to_state;
 
 out:
@@ -262,20 +299,15 @@
 		struct sysdev_attribute *attr, const char *buf, size_t count)
 {
 	struct memory_block *mem;
-	unsigned int phys_section_nr;
 	int ret = -EINVAL;
 
 	mem = container_of(dev, struct memory_block, sysdev);
-	phys_section_nr = mem->phys_index;
-
-	if (!present_section_nr(phys_section_nr))
-		goto out;
 
 	if (!strncmp(buf, "online", min((int)count, 6)))
 		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
 		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
-out:
+
 	if (ret)
 		return ret;
 	return count;
@@ -315,7 +347,7 @@
 print_block_size(struct sysdev_class *class, struct sysdev_class_attribute *attr,
 		 char *buf)
 {
-	return sprintf(buf, "%lx\n", (unsigned long)PAGES_PER_SECTION * PAGE_SIZE);
+	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
 static SYSDEV_CLASS_ATTR(block_size_bytes, 0444, print_block_size, NULL);
@@ -451,12 +483,13 @@
 	struct sys_device *sysdev;
 	struct memory_block *mem;
 	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+	int block_id = base_memory_block_id(__section_nr(section));
 
 	/*
 	 * This only works because we know that section == sysdev->id
 	 * slightly redundant with sysdev_register()
 	 */
-	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
 
 	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
 	if (!kobj)
@@ -468,26 +501,27 @@
 	return mem;
 }
 
-static int add_memory_block(int nid, struct mem_section *section,
-			unsigned long state, enum mem_add_context context)
+static int init_memory_block(struct memory_block **memory,
+			     struct mem_section *section, unsigned long state)
 {
-	struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	struct memory_block *mem;
 	unsigned long start_pfn;
+	int scn_nr;
 	int ret = 0;
 
+	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
-	mutex_lock(&mem_sysfs_mutex);
-
-	mem->phys_index = __section_nr(section);
+	scn_nr = __section_nr(section);
+	mem->phys_index = base_memory_block_id(scn_nr) * sections_per_block;
 	mem->state = state;
 	mem->section_count++;
 	mutex_init(&mem->state_mutex);
 	start_pfn = section_nr_to_pfn(mem->phys_index);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
 
-	ret = register_memory(mem, section);
+	ret = register_memory(mem);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_index);
 	if (!ret)
@@ -496,8 +530,29 @@
 		ret = mem_create_simple_file(mem, phys_device);
 	if (!ret)
 		ret = mem_create_simple_file(mem, removable);
+
+	*memory = mem;
+	return ret;
+}
+
+static int add_memory_section(int nid, struct mem_section *section,
+			unsigned long state, enum mem_add_context context)
+{
+	struct memory_block *mem;
+	int ret = 0;
+
+	mutex_lock(&mem_sysfs_mutex);
+
+	mem = find_memory_block(section);
+	if (mem) {
+		mem->section_count++;
+		kobject_put(&mem->sysdev.kobj);
+	} else
+		ret = init_memory_block(&mem, section, state);
+
 	if (!ret) {
-		if (context == HOTPLUG)
+		if (context == HOTPLUG &&
+		    mem->section_count == sections_per_block)
 			ret = register_mem_sect_under_node(mem, nid);
 	}
 
@@ -520,8 +575,10 @@
 		mem_remove_simple_file(mem, state);
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
-		unregister_memory(mem, section);
-	}
+		unregister_memory(mem);
+		kfree(mem);
+	} else
+		kobject_put(&mem->sysdev.kobj);
 
 	mutex_unlock(&mem_sysfs_mutex);
 	return 0;
@@ -533,7 +590,7 @@
  */
 int register_new_memory(int nid, struct mem_section *section)
 {
-	return add_memory_block(nid, section, MEM_OFFLINE, HOTPLUG);
+	return add_memory_section(nid, section, MEM_OFFLINE, HOTPLUG);
 }
 
 int unregister_memory_section(struct mem_section *section)
@@ -552,12 +609,16 @@
 	unsigned int i;
 	int ret;
 	int err;
+	unsigned long block_sz;
 
 	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
 	ret = sysdev_class_register(&memory_sysdev_class);
 	if (ret)
 		goto out;
 
+	block_sz = get_memory_block_size();
+	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
 	/*
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
@@ -565,8 +626,8 @@
 	for (i = 0; i < NR_MEM_SECTIONS; i++) {
 		if (!present_section_nr(i))
 			continue;
-		err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE,
-				       BOOT);
+		err = add_memory_section(0, __nr_to_section(i), MEM_ONLINE,
+					 BOOT);
 		if (!ret)
 			ret = err;
 	}


* [PATCH 5/9] v3 rename phys_index properties of memory block struct
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (3 preceding siblings ...)
  2010-10-01 18:31 ` [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections Nathan Fontenot
@ 2010-10-01 18:33 ` Nathan Fontenot
  2010-10-01 18:54   ` Robin Holt
  2010-10-05  5:14   ` KAMEZAWA Hiroyuki
  2010-10-01 18:34 ` [PATCH 6/9] v3 Update node sysfs code Nathan Fontenot
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:33 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Update the 'phys_index' property of the memory_block struct to be
called start_section_nr, and add an end_section_nr property.  The
data tracked here is the same but the updated naming is more in line
with what is stored here, namely the first and last section number
that the memory block spans.

The names presented to userspace remain the same, phys_index for
start_section_nr and end_phys_index for end_section_nr, to avoid breaking
anything in userspace.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 drivers/base/memory.c  |   39 ++++++++++++++++++++++++++++++---------
 include/linux/memory.h |    3 ++-
 2 files changed, 32 insertions(+), 10 deletions(-)

Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-30 14:46:00.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-09-30 14:46:09.000000000 -0500
@@ -97,7 +97,7 @@
 	int error;
 
 	memory->sysdev.cls = &memory_sysdev_class;
-	memory->sysdev.id = memory->phys_index / sections_per_block;
+	memory->sysdev.id = memory->start_section_nr / sections_per_block;
 
 	error = sysdev_register(&memory->sysdev);
 	return error;
@@ -138,12 +138,26 @@
  * uses.
  */
 
-static ssize_t show_mem_phys_index(struct sys_device *dev,
+static ssize_t show_mem_start_phys_index(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
 {
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
-	return sprintf(buf, "%08lx\n", mem->phys_index / sections_per_block);
+	unsigned long phys_index;
+
+	phys_index = mem->start_section_nr / sections_per_block;
+	return sprintf(buf, "%08lx\n", phys_index);
+}
+
+static ssize_t show_mem_end_phys_index(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct memory_block *mem =
+		container_of(dev, struct memory_block, sysdev);
+	unsigned long phys_index;
+
+	phys_index = mem->end_section_nr / sections_per_block;
+	return sprintf(buf, "%08lx\n", phys_index);
 }
 
 /*
@@ -158,7 +172,7 @@
 		container_of(dev, struct memory_block, sysdev);
 
 	for (i = 0; i < sections_per_block; i++) {
-		pfn = section_nr_to_pfn(mem->phys_index + i);
+		pfn = section_nr_to_pfn(mem->start_section_nr + i);
 		ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
 	}
 
@@ -275,14 +289,15 @@
 		mem->state = MEM_GOING_OFFLINE;
 
 	for (i = 0; i < sections_per_block; i++) {
-		ret = memory_section_action(mem->phys_index + i, to_state);
+		ret = memory_section_action(mem->start_section_nr + i,
+					    to_state);
 		if (ret)
 			break;
 	}
 
 	if (ret) {
 		for (i = 0; i < sections_per_block; i++)
-			memory_section_action(mem->phys_index + i,
+			memory_section_action(mem->start_section_nr + i,
 					      from_state_req);
 
 		mem->state = from_state_req;
@@ -330,7 +345,8 @@
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
-static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
+static SYSDEV_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
+static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
 static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
 static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
 static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
@@ -514,17 +530,21 @@
 		return -ENOMEM;
 
 	scn_nr = __section_nr(section);
-	mem->phys_index = base_memory_block_id(scn_nr) * sections_per_block;
+	mem->start_section_nr =
+			base_memory_block_id(scn_nr) * sections_per_block;
+	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
 	mem->state = state;
 	mem->section_count++;
 	mutex_init(&mem->state_mutex);
-	start_pfn = section_nr_to_pfn(mem->phys_index);
+	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
 
 	ret = register_memory(mem);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_index);
 	if (!ret)
+		ret = mem_create_simple_file(mem, end_phys_index);
+	if (!ret)
 		ret = mem_create_simple_file(mem, state);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_device);
@@ -572,6 +592,7 @@
 	if (mem->section_count == 0) {
 		unregister_mem_sect_under_nodes(mem);
 		mem_remove_simple_file(mem, phys_index);
+		mem_remove_simple_file(mem, end_phys_index);
 		mem_remove_simple_file(mem, state);
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
Index: linux-next/include/linux/memory.h
===================================================================
--- linux-next.orig/include/linux/memory.h	2010-09-30 14:44:39.000000000 -0500
+++ linux-next/include/linux/memory.h	2010-09-30 14:46:09.000000000 -0500
@@ -21,7 +21,8 @@
 #include <linux/mutex.h>
 
 struct memory_block {
-	unsigned long phys_index;
+	unsigned long start_section_nr;
+	unsigned long end_section_nr;
 	unsigned long state;
 	int section_count;
 


* [PATCH 6/9] v3 Update node sysfs code
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (4 preceding siblings ...)
  2010-10-01 18:33 ` [PATCH 5/9] v3 rename phys_index properties of memory block struct Nathan Fontenot
@ 2010-10-01 18:34 ` Nathan Fontenot
  2010-10-01 18:55   ` Robin Holt
  2010-10-05  5:15   ` KAMEZAWA Hiroyuki
  2010-10-01 18:35 ` [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries Nathan Fontenot
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:34 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Update the node sysfs code to be aware of the new capability for a memory
block to contain multiple memory sections and be aware of the memory block
structure name changes (start_section_nr).  This requires an additional
parameter to unregister_mem_sect_under_nodes so that we know which memory
section of the memory block to unregister.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 drivers/base/memory.c |    2 +-
 drivers/base/node.c   |   12 ++++++++----
 include/linux/node.h  |    6 ++++--
 3 files changed, 13 insertions(+), 7 deletions(-)

Index: linux-next/drivers/base/node.c
===================================================================
--- linux-next.orig/drivers/base/node.c	2010-09-30 14:44:38.000000000 -0500
+++ linux-next/drivers/base/node.c	2010-09-30 14:46:12.000000000 -0500
@@ -346,8 +346,10 @@
 		return -EFAULT;
 	if (!node_online(nid))
 		return 0;
-	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
+	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int page_nid;
 
@@ -371,7 +373,8 @@
 }
 
 /* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+				    unsigned long phys_index)
 {
 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -383,7 +386,8 @@
 	if (!unlinked_nodes)
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
-	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
+
+	sect_start_pfn = section_nr_to_pfn(phys_index);
 	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-30 14:46:09.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-09-30 14:46:12.000000000 -0500
@@ -587,10 +587,10 @@
 
 	mutex_lock(&mem_sysfs_mutex);
 	mem = find_memory_block(section);
+	unregister_mem_sect_under_nodes(mem, __section_nr(section));
 
 	mem->section_count--;
 	if (mem->section_count == 0) {
-		unregister_mem_sect_under_nodes(mem);
 		mem_remove_simple_file(mem, phys_index);
 		mem_remove_simple_file(mem, end_phys_index);
 		mem_remove_simple_file(mem, state);
Index: linux-next/include/linux/node.h
===================================================================
--- linux-next.orig/include/linux/node.h	2010-09-30 14:44:38.000000000 -0500
+++ linux-next/include/linux/node.h	2010-09-30 14:46:12.000000000 -0500
@@ -44,7 +44,8 @@
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						int nid);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
+extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+					   unsigned long phys_index);
 
 #ifdef CONFIG_HUGETLBFS
 extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
@@ -72,7 +73,8 @@
 {
 	return 0;
 }
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+						  unsigned long phys_index)
 {
 	return 0;
 }
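
For readers following the pfn arithmetic in the node.c hunk above, here is a
minimal userspace sketch of how the inclusive pfn range of a multi-section
memory block is derived. The SKETCH_* constants and sketch_* helpers are
hypothetical stand-ins, not kernel symbols; the shift value assumes 128 MiB
sections with 4 KiB pages.

```c
#include <assert.h>

/* Assumed values for illustration only; the kernel derives the real ones
 * per architecture from SECTION_SIZE_BITS and PAGE_SHIFT. */
#define SKETCH_PFN_SECTION_SHIFT 15UL
#define SKETCH_PAGES_PER_SECTION (1UL << SKETCH_PFN_SECTION_SHIFT)

/* Stand-in for the kernel's section_nr_to_pfn(). */
static unsigned long sketch_section_nr_to_pfn(unsigned long section_nr)
{
	return section_nr << SKETCH_PFN_SECTION_SHIFT;
}

/* First pfn of a block whose first section is start_section_nr. */
static unsigned long sketch_block_start_pfn(unsigned long start_section_nr)
{
	return sketch_section_nr_to_pfn(start_section_nr);
}

/* Last pfn (inclusive) of a block whose last section is end_section_nr. */
static unsigned long sketch_block_end_pfn(unsigned long start_section_nr,
					  unsigned long end_section_nr)
{
	assert(end_section_nr >= start_section_nr);
	return sketch_section_nr_to_pfn(end_section_nr) +
	       SKETCH_PAGES_PER_SECTION - 1;
}
```

The end pfn is taken from the last section rather than from the start pfn,
which is why the patch adds PAGES_PER_SECTION - 1 to the pfn of
end_section_nr instead of to sect_start_pfn.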

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (5 preceding siblings ...)
  2010-10-01 18:34 ` [PATCH 6/9] v3 Update node sysfs code Nathan Fontenot
@ 2010-10-01 18:35 ` Nathan Fontenot
  2010-10-01 18:56   ` Robin Holt
  2010-10-03 17:55   ` Balbir Singh
  2010-10-01 18:37 ` [PATCH 8/9] v3 Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV set Nathan Fontenot
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Define a version of memory_block_size_bytes() for powerpc/pseries such that
a memory block spans an entire lmb.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   66 +++++++++++++++++++-----
 1 file changed, 53 insertions(+), 13 deletions(-)

Index: linux-next/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-next.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-09-30 14:44:37.000000000 -0500
+++ linux-next/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-09-30 14:47:04.000000000 -0500
@@ -17,6 +17,54 @@
 #include <asm/pSeries_reconfig.h>
 #include <asm/sparsemem.h>
 
+static unsigned long get_memblock_size(void)
+{
+	struct device_node *np;
+	unsigned int memblock_size = 0;
+
+	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (np) {
+		const unsigned long *size;
+
+		size = of_get_property(np, "ibm,lmb-size", NULL);
+		memblock_size = size ? *size : 0;
+
+		of_node_put(np);
+	} else {
+		unsigned int memzero_size = 0;
+		const unsigned int *regs;
+
+		np = of_find_node_by_path("/memory@0");
+		if (np) {
+			regs = of_get_property(np, "reg", NULL);
+			memzero_size = regs ? regs[3] : 0;
+			of_node_put(np);
+		}
+
+		if (memzero_size) {
+			/* We now know the size of memory@0, use this to find
+			 * the first memoryblock and get its size.
+			 */
+			char buf[64];
+
+			sprintf(buf, "/memory@%x", memzero_size);
+			np = of_find_node_by_path(buf);
+			if (np) {
+				regs = of_get_property(np, "reg", NULL);
+				memblock_size = regs ? regs[3] : 0;
+				of_node_put(np);
+			}
+		}
+	}
+
+	return memblock_size;
+}
+
+unsigned long memory_block_size_bytes(void)
+{
+	return get_memblock_size();
+}
+
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
@@ -127,30 +175,22 @@
 
 static int pseries_drconf_memory(unsigned long *base, unsigned int action)
 {
-	struct device_node *np;
-	const unsigned long *lmb_size;
+	unsigned long memblock_size;
 	int rc;
 
-	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
-	if (!np)
+	memblock_size = get_memblock_size();
+	if (!memblock_size)
 		return -EINVAL;
 
-	lmb_size = of_get_property(np, "ibm,lmb-size", NULL);
-	if (!lmb_size) {
-		of_node_put(np);
-		return -EINVAL;
-	}
-
 	if (action == PSERIES_DRCONF_MEM_ADD) {
-		rc = memblock_add(*base, *lmb_size);
+		rc = memblock_add(*base, memblock_size);
 		rc = (rc < 0) ? -EINVAL : 0;
 	} else if (action == PSERIES_DRCONF_MEM_REMOVE) {
-		rc = pseries_remove_memblock(*base, *lmb_size);
+		rc = pseries_remove_memblock(*base, memblock_size);
 	} else {
 		rc = -EINVAL;
 	}
 
-	of_node_put(np);
 	return rc;
 }
 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 8/9] v3 Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV set
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (6 preceding siblings ...)
  2010-10-01 18:35 ` [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries Nathan Fontenot
@ 2010-10-01 18:37 ` Nathan Fontenot
  2010-10-01 18:57   ` Robin Holt
  2010-10-01 18:37 ` [PATCH 9/9] v3 Update memory hotplug documentation Nathan Fontenot
  2010-10-21 12:05 ` [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nikanth Karthikesan
  9 siblings, 1 reply; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:37 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Define a version of memory_block_size_bytes for x86_64 when CONFIG_X86_UV is
set.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>

---
 arch/x86/mm/init_64.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Index: linux-next/arch/x86/mm/init_64.c
===================================================================
--- linux-next.orig/arch/x86/mm/init_64.c	2010-09-29 14:56:25.000000000 -0500
+++ linux-next/arch/x86/mm/init_64.c	2010-10-01 13:00:50.000000000 -0500
@@ -51,6 +51,7 @@
 #include <asm/numa.h>
 #include <asm/cacheflush.h>
 #include <asm/init.h>
+#include <asm/uv/uv.h>
 #include <linux/bootmem.h>
 
 static int __init parse_direct_gbpages_off(char *arg)
@@ -902,6 +903,19 @@
 	return NULL;
 }
 
+#ifdef CONFIG_X86_UV
+#define MIN_MEMORY_BLOCK_SIZE   (1 << SECTION_SIZE_BITS)
+
+unsigned long memory_block_size_bytes(void)
+{
+	if (is_uv_system()) {
+		printk(KERN_INFO "UV: memory block size 2GB\n");
+		return 2UL * 1024 * 1024 * 1024;
+	}
+	return MIN_MEMORY_BLOCK_SIZE;
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 9/9] v3 Update memory hotplug documentation
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (7 preceding siblings ...)
  2010-10-01 18:37 ` [PATCH 8/9] v3 Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV set Nathan Fontenot
@ 2010-10-01 18:37 ` Nathan Fontenot
  2010-10-01 18:58   ` Robin Holt
  2010-10-05  5:18   ` KAMEZAWA Hiroyuki
  2010-10-21 12:05 ` [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nikanth Karthikesan
  9 siblings, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:37 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Update the memory hotplug documentation to describe the new behavior of
memory blocks as presented in sysfs.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

---
 Documentation/memory-hotplug.txt |   47 +++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 16 deletions(-)

Index: linux-next/Documentation/memory-hotplug.txt
===================================================================
--- linux-next.orig/Documentation/memory-hotplug.txt	2010-09-29 14:56:24.000000000 -0500
+++ linux-next/Documentation/memory-hotplug.txt	2010-09-30 14:59:47.000000000 -0500
@@ -126,36 +126,51 @@
 --------------------------------
 4 sysfs files for memory hotplug
 --------------------------------
-All sections have their device information under /sys/devices/system/memory as
+All sections have their device information in sysfs.  Each section is part of
+a memory block under /sys/devices/system/memory as
 
 /sys/devices/system/memory/memoryXXX
-(XXX is section id.)
+(XXX is the section id.)
 
-Now, XXX is defined as start_address_of_section / section_size.
+Now, XXX is defined as (start_address_of_section / section_size) of the first
+section contained in the memory block.  The files 'phys_index' and
+'end_phys_index' under each directory report the beginning and end section ids
+for the memory block covered by the sysfs directory.  It is expected that all
+memory sections in this range are present and no memory holes exist in the
+range. Currently there is no way to determine if there is a memory hole, but
+the existence of one should not affect the hotplug capabilities of the memory
+block.
 
 For example, assume 1GiB section size. A device for a memory starting at
 0x100000000 is /sys/device/system/memory/memory4
 (0x100000000 / 1Gib = 4)
 This device covers address range [0x100000000 ... 0x140000000)
 
-Under each section, you can see 4 files.
+Under each memory block, you can see 4 or 5 files, the end_phys_index file being
+a recent addition and not present on older kernels.
 
-/sys/devices/system/memory/memoryXXX/phys_index
+/sys/devices/system/memory/memoryXXX/phys_index
+/sys/devices/system/memory/memoryXXX/end_phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index' : read-only and contains section id, same as XXX.
-'state'      : read-write
-               at read:  contains online/offline state of memory.
-               at write: user can specify "online", "offline" command
-'phys_device': read-only: designed to show the name of physical memory device.
-               This is not well implemented now.
-'removable'  : read-only: contains an integer value indicating
-               whether the memory section is removable or not
-               removable.  A value of 1 indicates that the memory
-               section is removable and a value of 0 indicates that
-               it is not removable.
+'phys_index'      : read-only and contains section id of the first section
+		    in the memory block, same as XXX.
+'end_phys_index'  : read-only and contains section id of the last section
+		    in the memory block.
+'state'           : read-write
+                    at read:  contains online/offline state of memory.
+                    at write: user can specify "online", "offline" command
+                    which will be performed on all sections in the block.
+'phys_device'     : read-only: designed to show the name of physical memory
+                    device.  This is not well implemented now.
+'removable'       : read-only: contains an integer value indicating
+                    whether the memory block is removable or not
+                    removable.  A value of 1 indicates that the memory
+                    block is removable and a value of 0 indicates that
+                    it is not removable. A memory block is removable only if
+                    every section in the block is removable.
 
 NOTE:
   These directories/files appear after physical memory hotplug phase.
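
As a rough illustration of the documented semantics, the sketch below shows
how a block's 'removable' value and 'end_phys_index' could be derived from its
sections. This is a userspace approximation under stated assumptions: the
sketch_* helpers and the per-section flag array are hypothetical and do not
exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A block is removable only if every section in it is removable.
 * section_removable[] is a hypothetical per-section flag array. */
static bool sketch_block_removable(const bool *section_removable,
				   size_t nr_sections)
{
	size_t i;

	assert(nr_sections > 0);
	for (i = 0; i < nr_sections; i++)
		if (!section_removable[i])
			return false;
	return true;
}

/* Section id of the last section in a block that starts at phys_index. */
static unsigned long sketch_end_phys_index(unsigned long phys_index,
					   unsigned long sections_per_block)
{
	assert(sections_per_block > 0);
	return phys_index + sections_per_block - 1;
}
```

With one section per block, the default, end_phys_index equals phys_index and
the removable value reduces to the single section's value.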

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] v3 Move find_memory_block routine
  2010-10-01 18:28 ` [PATCH 1/9] v3 Move find_memory_block routine Nathan Fontenot
@ 2010-10-01 18:40   ` Robin Holt
  2010-10-05  5:01   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:40 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:28:39PM -0500, Nathan Fontenot wrote:
> Move the find_memory_block() routine up to avoid needing a forward
> declaration in subsequent patches.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] v3 Add mutex for adding/removing memory blocks
  2010-10-01 18:29 ` [PATCH 2/9] v3 Add mutex for adding/removing memory blocks Nathan Fontenot
@ 2010-10-01 18:45   ` Robin Holt
  2010-10-05  5:06   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:45 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:29:42PM -0500, Nathan Fontenot wrote:
> Add a new mutex for use in adding and removing of memory blocks.  This
> is needed to avoid any race conditions in which the same memory block could
> be added and removed at the same time.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

I am fine with this patch by itself, but its only real function is
to protect the count introduced by the next patch.  You might want to
combine the patches, but if not, that is fine as well.

Robin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] v3 Add section count to memory_block struct
  2010-10-01 18:30 ` [PATCH 3/9] v3 Add section count to memory_block struct Nathan Fontenot
@ 2010-10-01 18:46   ` Robin Holt
  2010-10-05  5:08   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:46 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:30:40PM -0500, Nathan Fontenot wrote:
> Add a section count property to the memory_block struct to track the number
> of memory sections that have been added/removed from a memory block. This
> allows us to know when the last memory section of a memory block has been
> removed so we can remove the memory block.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections
  2010-10-01 18:31 ` [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections Nathan Fontenot
@ 2010-10-01 18:52   ` Robin Holt
  2010-10-01 18:56     ` Nathan Fontenot
  2010-10-01 19:00   ` Nathan Fontenot
  1 sibling, 1 reply; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:52 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:31:51PM -0500, Nathan Fontenot wrote:
> Update the memory sysfs code such that each sysfs memory directory is now
> considered a memory block that can span multiple memory sections per
> memory block.  The default size of each memory block is SECTION_SIZE_BITS
> to maintain the current behavior of having a single memory section per
> memory block (i.e. one sysfs directory per memory section).
> 
> For architectures that want to have memory blocks span multiple
> memory sections they need only define their own memory_block_size_bytes()
> routine.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
> ---
>  drivers/base/memory.c |  155 ++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 108 insertions(+), 47 deletions(-)
> 
> Index: linux-next/drivers/base/memory.c
> ===================================================================
> --- linux-next.orig/drivers/base/memory.c	2010-09-30 14:13:50.000000000 -0500
> +++ linux-next/drivers/base/memory.c	2010-09-30 14:46:00.000000000 -0500
...
> +static unsigned long get_memory_block_size(void)
> +{
> +	u32 block_sz;
        ^^^

I think this should be unsigned long.  u32 will work, but everything
else has been changed to use unsigned long.  If you disagree, I will
happily acquiesce as nothing is currently broken.  If SGI decides to make
memory_block_size_bytes more dynamic, we will fix this up at that time.

Robin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] v3 rename phys_index properties of memory block struct
  2010-10-01 18:33 ` [PATCH 5/9] v3 rename phys_index properties of memory block struct Nathan Fontenot
@ 2010-10-01 18:54   ` Robin Holt
  2010-10-05  5:14   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:54 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:33:38PM -0500, Nathan Fontenot wrote:
> Update the 'phys_index' property of the memory_block struct to be
> called start_section_nr, and add a end_section_nr property.  The
> data tracked here is the same but the updated naming is more in line
> with what is stored here, namely the first and last section number
> that the memory block spans.
> 
> The names presented to userspace remain the same, phys_index for
> start_section_nr and end_phys_index for end_section_nr, to avoid breaking
> anything in userspace.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/9] v3 Update node sysfs code
  2010-10-01 18:34 ` [PATCH 6/9] v3 Update node sysfs code Nathan Fontenot
@ 2010-10-01 18:55   ` Robin Holt
  2010-10-05  5:15   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:55 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:34:34PM -0500, Nathan Fontenot wrote:
> Update the node sysfs code to be aware of the new capability for a memory
> block to contain multiple memory sections and be aware of the memory block
> structure name changes (start_section_nr).  This requires an additional
> parameter to unregister_mem_sect_under_nodes so that we know which memory
> section of the memory block to unregister.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-01 18:35 ` [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries Nathan Fontenot
@ 2010-10-01 18:56   ` Robin Holt
  2010-10-03 17:55   ` Balbir Singh
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:56 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:35:54PM -0500, Nathan Fontenot wrote:
> Define a version of memory_block_size_bytes() for powerpc/pseries such that
> a memory block spans an entire lmb.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections
  2010-10-01 18:52   ` Robin Holt
@ 2010-10-01 18:56     ` Nathan Fontenot
  0 siblings, 0 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 18:56 UTC (permalink / raw)
  To: Robin Holt
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	linuxppc-dev, KAMEZAWA Hiroyuki

On 10/01/2010 01:52 PM, Robin Holt wrote:
> On Fri, Oct 01, 2010 at 01:31:51PM -0500, Nathan Fontenot wrote:
>> Update the memory sysfs code such that each sysfs memory directory is now
>> considered a memory block that can span multiple memory sections per
>> memory block.  The default size of each memory block is SECTION_SIZE_BITS
>> to maintain the current behavior of having a single memory section per
>> memory block (i.e. one sysfs directory per memory section).
>>
>> For architectures that want to have memory blocks span multiple
>> memory sections they need only define their own memory_block_size_bytes()
>> routine.
>>
>> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
>>
>> ---
>>  drivers/base/memory.c |  155 ++++++++++++++++++++++++++++++++++----------------
>>  1 file changed, 108 insertions(+), 47 deletions(-)
>>
>> Index: linux-next/drivers/base/memory.c
>> ===================================================================
>> --- linux-next.orig/drivers/base/memory.c	2010-09-30 14:13:50.000000000 -0500
>> +++ linux-next/drivers/base/memory.c	2010-09-30 14:46:00.000000000 -0500
> ...
>> +static unsigned long get_memory_block_size(void)
>> +{
>> +	u32 block_sz;
>         ^^^
> 
> I think this should be unsigned long.  u32 will work, but everything
> else has been changed to use unsigned long.  If you disagree, I will
> happily acquiesce as nothing is currently broken.  If SGI decides to make
> memory_block_size_bytes more dynamic, we will fix this up at that time.

You're right, that should have been made an unsigned long also.  I'll attach a new
patch with that corrected.
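
To make the u32 concern concrete, here is a small userspace sketch of what a
32-bit local would hold for different block sizes. It assumes a 64-bit
unsigned long, as on ppc64 and x86_64, and models it with uint64_t; the
sketch_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value a u32 local ends up with after being assigned a block size. */
static uint32_t sketch_store_in_u32(uint64_t block_size_bytes)
{
	return (uint32_t)block_size_bytes;
}

/* Non-zero if the block size survives the round trip through a u32. */
static int sketch_fits_in_u32(uint64_t block_size_bytes)
{
	return block_size_bytes <= UINT32_MAX;
}
```

The 2 GiB UV block size still fits, which is why nothing is broken today, but
a 4 GiB block size would silently truncate to 0.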

-Nathan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/9] v3 Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV set
  2010-10-01 18:37 ` [PATCH 8/9] v3 Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV set Nathan Fontenot
@ 2010-10-01 18:57   ` Robin Holt
  0 siblings, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:57 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:37:05PM -0500, Nathan Fontenot wrote:
> Define a version of memory_block_size_bytes for x86_64 when CONFIG_X86_UV is
> set.
> 
> Signed-off-by: Robin Holt <holt@sgi.com>
> Signed-off-by: Jack Steiner <steiner@sgi.com>

I think this technically needs a Signed-off-by: <you> since you
are passing it upstream.

> 
> ---
>  arch/x86/mm/init_64.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> Index: linux-next/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-next.orig/arch/x86/mm/init_64.c	2010-09-29 14:56:25.000000000 -0500
> +++ linux-next/arch/x86/mm/init_64.c	2010-10-01 13:00:50.000000000 -0500
> @@ -51,6 +51,7 @@
>  #include <asm/numa.h>
>  #include <asm/cacheflush.h>
>  #include <asm/init.h>
> +#include <asm/uv/uv.h>
>  #include <linux/bootmem.h>
>  
>  static int __init parse_direct_gbpages_off(char *arg)
> @@ -902,6 +903,19 @@
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_X86_UV
> +#define MIN_MEMORY_BLOCK_SIZE   (1 << SECTION_SIZE_BITS)
> +
> +unsigned long memory_block_size_bytes(void)
> +{
> +	if (is_uv_system()) {
> +		printk(KERN_INFO "UV: memory block size 2GB\n");
> +		return 2UL * 1024 * 1024 * 1024;
> +	}
> +	return MIN_MEMORY_BLOCK_SIZE;
> +}
> +#endif
> +
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  /*
>   * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] v3 Update memory hotplug documentation
  2010-10-01 18:37 ` [PATCH 9/9] v3 Update memory hotplug documentation Nathan Fontenot
@ 2010-10-01 18:58   ` Robin Holt
  2010-10-05  5:18   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 18:58 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 01:37:49PM -0500, Nathan Fontenot wrote:
> Update the memory hotplug documentation to reflect the new behaviors of
> memory blocks reflected in sysfs.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections
  2010-10-01 18:31 ` [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections Nathan Fontenot
  2010-10-01 18:52   ` Robin Holt
@ 2010-10-01 19:00   ` Nathan Fontenot
  2010-10-01 19:20     ` Robin Holt
  2010-10-05  5:13     ` KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-01 19:00 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev
  Cc: Greg KH, steiner, Robin Holt, KAMEZAWA Hiroyuki, Dave Hansen

Update the memory sysfs code such that each sysfs memory directory is now
considered a memory block that can span multiple memory sections.  The
default size of each memory block is that of a single memory section
(1 << SECTION_SIZE_BITS bytes), which maintains the current behavior of
one sysfs directory per memory section.

For architectures that want to have memory blocks span multiple
memory sections they need only define their own memory_block_size_bytes()
routine.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Updated patch to correct get_memory_block_size() variable block_sz to be
an unsigned long.
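
A minimal userspace sketch of the block-size validation and the
section-to-block mapping described above. The SKETCH_* constants and sketch_*
helpers are hypothetical; the section size assumes SECTION_SIZE_BITS is 27
(128 MiB), and the sections-per-block division is an assumption about how the
patch derives sections_per_block.

```c
#include <assert.h>

#define SKETCH_SECTION_SIZE_BITS 27	/* assumed: 128 MiB sections */
#define SKETCH_MIN_BLOCK_SIZE    (1UL << SKETCH_SECTION_SIZE_BITS)

/* Fall back to one section per block unless the requested size is a
 * power of two and at least one section large. */
static unsigned long sketch_validate_block_size(unsigned long block_sz)
{
	if ((block_sz & (block_sz - 1)) || block_sz < SKETCH_MIN_BLOCK_SIZE)
		return SKETCH_MIN_BLOCK_SIZE;
	return block_sz;
}

/* Number of sections covered by one sysfs memory directory. */
static unsigned long sketch_sections_per_block(unsigned long block_sz)
{
	return sketch_validate_block_size(block_sz) / SKETCH_MIN_BLOCK_SIZE;
}

/* Block that a given section belongs to, as in base_memory_block_id(). */
static unsigned long sketch_block_id(unsigned long section_nr,
				     unsigned long sections_per_block)
{
	assert(sections_per_block > 0);
	return section_nr / sections_per_block;
}
```

Under these assumptions a 2 GiB block covers 16 sections, so section 35 lands
in block 2, whose first section is 32.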

---
 drivers/base/memory.c |  155 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 108 insertions(+), 47 deletions(-)

Index: linux-next/drivers/base/memory.c
===================================================================
--- linux-next.orig/drivers/base/memory.c	2010-09-30 14:13:50.000000000 -0500
+++ linux-next/drivers/base/memory.c	2010-10-01 13:50:19.000000000 -0500
@@ -30,6 +30,14 @@
 static DEFINE_MUTEX(mem_sysfs_mutex);
 
 #define MEMORY_CLASS_NAME	"memory"
+#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
+
+static int sections_per_block;
+
+static inline int base_memory_block_id(int section_nr)
+{
+	return section_nr / sections_per_block;
+}
 
 static struct sysdev_class memory_sysdev_class = {
 	.name = MEMORY_CLASS_NAME,
@@ -84,28 +92,47 @@
  * register_memory - Setup a sysfs device for a memory block
  */
 static
-int register_memory(struct memory_block *memory, struct mem_section *section)
+int register_memory(struct memory_block *memory)
 {
 	int error;
 
 	memory->sysdev.cls = &memory_sysdev_class;
-	memory->sysdev.id = __section_nr(section);
+	memory->sysdev.id = memory->phys_index / sections_per_block;
 
 	error = sysdev_register(&memory->sysdev);
 	return error;
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section)
+unregister_memory(struct memory_block *memory)
 {
 	BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
-	BUG_ON(memory->sysdev.id != __section_nr(section));
 
 	/* drop the ref. we got in remove_memory_block() */
 	kobject_put(&memory->sysdev.kobj);
 	sysdev_unregister(&memory->sysdev);
 }
 
+unsigned long __weak memory_block_size_bytes(void)
+{
+	return MIN_MEMORY_BLOCK_SIZE;
+}
+
+static unsigned long get_memory_block_size(void)
+{
+	unsigned long block_sz;
+
+	block_sz = memory_block_size_bytes();
+
+	/* Validate block_sz is a power of 2 and not less than section size */
+	if ((block_sz & (block_sz - 1)) || (block_sz < MIN_MEMORY_BLOCK_SIZE)) {
+		WARN_ON(1);
+		block_sz = MIN_MEMORY_BLOCK_SIZE;
+	}
+
+	return block_sz;
+}
+
 /*
  * use this as the physical section index that this memsection
  * uses.
@@ -116,7 +143,7 @@
 {
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
-	return sprintf(buf, "%08lx\n", mem->phys_index);
+	return sprintf(buf, "%08lx\n", mem->phys_index / sections_per_block);
 }
 
 /*
@@ -125,13 +152,16 @@
 static ssize_t show_mem_removable(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
 {
-	unsigned long start_pfn;
-	int ret;
+	unsigned long i, pfn;
+	int ret = 1;
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
 
-	start_pfn = section_nr_to_pfn(mem->phys_index);
-	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
+	for (i = 0; i < sections_per_block; i++) {
+		pfn = section_nr_to_pfn(mem->phys_index + i);
+		ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+	}
+
 	return sprintf(buf, "%d\n", ret);
 }
 
@@ -184,17 +214,14 @@
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(struct memory_block *mem, unsigned long action)
+memory_section_action(unsigned long phys_index, unsigned long action)
 {
 	int i;
-	unsigned long psection;
 	unsigned long start_pfn, start_paddr;
 	struct page *first_page;
 	int ret;
-	int old_state = mem->state;
 
-	psection = mem->phys_index;
-	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
+	first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
 
 	/*
 	 * The probe routines leave the pages reserved, just
@@ -207,8 +234,8 @@
 				continue;
 
 			printk(KERN_WARNING "section number %ld page number %d "
-				"not reserved, was it already online? \n",
-				psection, i);
+				"not reserved, was it already online?\n",
+				phys_index, i);
 			return -EBUSY;
 		}
 	}
@@ -219,18 +246,13 @@
 			ret = online_pages(start_pfn, PAGES_PER_SECTION);
 			break;
 		case MEM_OFFLINE:
-			mem->state = MEM_GOING_OFFLINE;
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
 			ret = remove_memory(start_paddr,
 					    PAGES_PER_SECTION << PAGE_SHIFT);
-			if (ret) {
-				mem->state = old_state;
-				break;
-			}
 			break;
 		default:
-			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
-					__func__, mem, action, action);
+			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
+			     "%ld\n", __func__, phys_index, action, action);
 			ret = -EINVAL;
 	}
 
@@ -240,7 +262,8 @@
 static int memory_block_change_state(struct memory_block *mem,
 		unsigned long to_state, unsigned long from_state_req)
 {
-	int ret = 0;
+	int i, ret = 0;
+
 	mutex_lock(&mem->state_mutex);
 
 	if (mem->state != from_state_req) {
@@ -248,8 +271,22 @@
 		goto out;
 	}
 
-	ret = memory_block_action(mem, to_state);
-	if (!ret)
+	if (to_state == MEM_OFFLINE)
+		mem->state = MEM_GOING_OFFLINE;
+
+	for (i = 0; i < sections_per_block; i++) {
+		ret = memory_section_action(mem->phys_index + i, to_state);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		for (i = 0; i < sections_per_block; i++)
+			memory_section_action(mem->phys_index + i,
+					      from_state_req);
+
+		mem->state = from_state_req;
+	} else
 		mem->state = to_state;
 
 out:
@@ -262,20 +299,15 @@
 		struct sysdev_attribute *attr, const char *buf, size_t count)
 {
 	struct memory_block *mem;
-	unsigned int phys_section_nr;
 	int ret = -EINVAL;
 
 	mem = container_of(dev, struct memory_block, sysdev);
-	phys_section_nr = mem->phys_index;
-
-	if (!present_section_nr(phys_section_nr))
-		goto out;
 
 	if (!strncmp(buf, "online", min((int)count, 6)))
 		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
 		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
-out:
+
 	if (ret)
 		return ret;
 	return count;
@@ -315,7 +347,7 @@
 print_block_size(struct sysdev_class *class, struct sysdev_class_attribute *attr,
 		 char *buf)
 {
-	return sprintf(buf, "%lx\n", (unsigned long)PAGES_PER_SECTION * PAGE_SIZE);
+	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
 static SYSDEV_CLASS_ATTR(block_size_bytes, 0444, print_block_size, NULL);
@@ -451,12 +483,13 @@
 	struct sys_device *sysdev;
 	struct memory_block *mem;
 	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+	int block_id = base_memory_block_id(__section_nr(section));
 
 	/*
 	 * This only works because we know that section == sysdev->id
 	 * slightly redundant with sysdev_register()
 	 */
-	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
 
 	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
 	if (!kobj)
@@ -468,26 +501,27 @@
 	return mem;
 }
 
-static int add_memory_block(int nid, struct mem_section *section,
-			unsigned long state, enum mem_add_context context)
+static int init_memory_block(struct memory_block **memory,
+			     struct mem_section *section, unsigned long state)
 {
-	struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	struct memory_block *mem;
 	unsigned long start_pfn;
+	int scn_nr;
 	int ret = 0;
 
+	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
-	mutex_lock(&mem_sysfs_mutex);
-
-	mem->phys_index = __section_nr(section);
+	scn_nr = __section_nr(section);
+	mem->phys_index = base_memory_block_id(scn_nr) * sections_per_block;
 	mem->state = state;
 	mem->section_count++;
 	mutex_init(&mem->state_mutex);
 	start_pfn = section_nr_to_pfn(mem->phys_index);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
 
-	ret = register_memory(mem, section);
+	ret = register_memory(mem);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_index);
 	if (!ret)
@@ -496,8 +530,29 @@
 		ret = mem_create_simple_file(mem, phys_device);
 	if (!ret)
 		ret = mem_create_simple_file(mem, removable);
+
+	*memory = mem;
+	return ret;
+}
+
+static int add_memory_section(int nid, struct mem_section *section,
+			unsigned long state, enum mem_add_context context)
+{
+	struct memory_block *mem;
+	int ret = 0;
+
+	mutex_lock(&mem_sysfs_mutex);
+
+	mem = find_memory_block(section);
+	if (mem) {
+		mem->section_count++;
+		kobject_put(&mem->sysdev.kobj);
+	} else
+		ret = init_memory_block(&mem, section, state);
+
 	if (!ret) {
-		if (context == HOTPLUG)
+		if (context == HOTPLUG &&
+		    mem->section_count == sections_per_block)
 			ret = register_mem_sect_under_node(mem, nid);
 	}
 
@@ -520,8 +575,10 @@
 		mem_remove_simple_file(mem, state);
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
-		unregister_memory(mem, section);
-	}
+		unregister_memory(mem);
+		kfree(mem);
+	} else
+		kobject_put(&mem->sysdev.kobj);
 
 	mutex_unlock(&mem_sysfs_mutex);
 	return 0;
@@ -533,7 +590,7 @@
  */
 int register_new_memory(int nid, struct mem_section *section)
 {
-	return add_memory_block(nid, section, MEM_OFFLINE, HOTPLUG);
+	return add_memory_section(nid, section, MEM_OFFLINE, HOTPLUG);
 }
 
 int unregister_memory_section(struct mem_section *section)
@@ -552,12 +609,16 @@
 	unsigned int i;
 	int ret;
 	int err;
+	unsigned long block_sz;
 
 	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
 	ret = sysdev_class_register(&memory_sysdev_class);
 	if (ret)
 		goto out;
 
+	block_sz = get_memory_block_size();
+	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
 	/*
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
@@ -565,8 +626,8 @@
 	for (i = 0; i < NR_MEM_SECTIONS; i++) {
 		if (!present_section_nr(i))
 			continue;
-		err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE,
-				       BOOT);
+		err = add_memory_section(0, __nr_to_section(i), MEM_ONLINE,
+					 BOOT);
 		if (!ret)
 			ret = err;
 	}
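
The per-section loop with rollback in memory_block_change_state() above can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the kernel code: block_change_state(), section_action_fn, struct demo_ctx and demo_action() are hypothetical names invented for the example. Unlike the patch, which re-applies from_state_req to every section in the block on failure, this variant undoes only the sections that were actually changed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-section callback: 0 on success, negative on error. */
typedef int (*section_action_fn)(unsigned long section_nr, int state, void *ctx);

/*
 * Move every section in [first, first + count) to to_state.  If one
 * section fails, move the sections already changed back to from_state
 * and return the error.
 */
static int block_change_state(unsigned long first, unsigned long count,
			      int to_state, int from_state,
			      section_action_fn action, void *ctx)
{
	unsigned long i;
	int ret = 0;

	assert(action != NULL);

	for (i = 0; i < count; i++) {
		ret = action(first + i, to_state, ctx);
		if (ret)
			break;
	}

	if (ret) {
		/* Undo only the sections that were actually changed. */
		while (i-- > 0)
			action(first + i, from_state, ctx);
	}

	return ret;
}

/* Demo state, invented for this example. */
enum { DEMO_OFFLINE = 0, DEMO_ONLINE = 1 };
#define DEMO_SECTIONS 8

struct demo_ctx {
	int state[DEMO_SECTIONS];
	unsigned long fail_at;	/* section that refuses to go online */
};

static int demo_action(unsigned long nr, int state, void *p)
{
	struct demo_ctx *ctx = p;

	if (nr >= DEMO_SECTIONS)
		return -1;
	if (state == DEMO_ONLINE && nr == ctx->fail_at)
		return -1;
	ctx->state[nr] = state;
	return 0;
}
```

With fail_at set to 2 and a block of four sections, sections 0 and 1 go online, section 2 fails, and the rollback returns 0 and 1 to the offline state before the error is reported.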

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections
  2010-10-01 19:00   ` Nathan Fontenot
@ 2010-10-01 19:20     ` Robin Holt
  2010-10-05  5:13     ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: Robin Holt @ 2010-10-01 19:20 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Fri, Oct 01, 2010 at 02:00:50PM -0500, Nathan Fontenot wrote:
> Update the memory sysfs code such that each sysfs memory directory is now
> considered a memory block that can span multiple memory sections per
> memory block.  The default size of each memory block is SECTION_SIZE_BITS
> to maintain the current behavior of having a single memory section per
> memory block (i.e. one sysfs directory per memory section).
> 
> For architectures that want to have memory blocks span multiple
> memory sections they need only define their own memory_block_size_bytes()
> routine.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: Robin Holt <holt@sgi.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-01 18:35 ` [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries Nathan Fontenot
  2010-10-01 18:56   ` Robin Holt
@ 2010-10-03 17:55   ` Balbir Singh
  2010-10-03 18:07     ` Robin Holt
  1 sibling, 1 reply; 35+ messages in thread
From: Balbir Singh @ 2010-10-03 17:55 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

* Nathan Fontenot <nfont@austin.ibm.com> [2010-10-01 13:35:54]:

> Define a version of memory_block_size_bytes() for powerpc/pseries such that
> a memory block spans an entire lmb.

I hope I am not missing anything obvious, but why not just call it
lmb_size, why do we need memblock_size?

Is lmb_size == memblock_size after your changes true for all
platforms?

> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |   66 +++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 13 deletions(-)
> 
> Index: linux-next/arch/powerpc/platforms/pseries/hotplug-memory.c
> ===================================================================
> --- linux-next.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-09-30 14:44:37.000000000 -0500
> +++ linux-next/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-09-30 14:47:04.000000000 -0500
> @@ -17,6 +17,54 @@
>  #include <asm/pSeries_reconfig.h>
>  #include <asm/sparsemem.h>
> 
> +static unsigned long get_memblock_size(void)
> +{
> +	struct device_node *np;
> +	unsigned int memblock_size = 0;
> +
> +	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> +	if (np) {
> +		const unsigned long *size;
> +
> +		size = of_get_property(np, "ibm,lmb-size", NULL);
> +		memblock_size = size ? *size : 0;
> +
> +		of_node_put(np);
> +	} else {
> +		unsigned int memzero_size = 0;
> +		const unsigned int *regs;
> +
> +		np = of_find_node_by_path("/memory@0");
> +		if (np) {
> +			regs = of_get_property(np, "reg", NULL);
> +			memzero_size = regs ? regs[3] : 0;
> +			of_node_put(np);
> +		}
> +
> +		if (memzero_size) {
> +			/* We now know the size of memory@0, use this to find
> +			 * the first memoryblock and get its size.
> +			 */

Nit: comment style is not correct

> +			char buf[64];
> +
> +			sprintf(buf, "/memory@%x", memzero_size);
> +			np = of_find_node_by_path(buf);
> +			if (np) {
> +				regs = of_get_property(np, "reg", NULL);
> +				memblock_size = regs ? regs[3] : 0;
> +				of_node_put(np);
> +			}
> +		}
> +	}



> +
> +	return memblock_size;
> +}
> +
> +unsigned long memory_block_size_bytes(void)
> +{
> +	return get_memblock_size();
> +}
> +
>  static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
>  {
>  	unsigned long start, start_pfn;
> @@ -127,30 +175,22 @@
> 
>  static int pseries_drconf_memory(unsigned long *base, unsigned int action)
>  {
> -	struct device_node *np;
> -	const unsigned long *lmb_size;
> +	unsigned long memblock_size;
>  	int rc;
> 
> -	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> -	if (!np)
> +	memblock_size = get_memblock_size();
> +	if (!memblock_size)
>  		return -EINVAL;
> 
> -	lmb_size = of_get_property(np, "ibm,lmb-size", NULL);
> -	if (!lmb_size) {
> -		of_node_put(np);
> -		return -EINVAL;
> -	}
> -
>  	if (action == PSERIES_DRCONF_MEM_ADD) {
> -		rc = memblock_add(*base, *lmb_size);
> +		rc = memblock_add(*base, memblock_size);
>  		rc = (rc < 0) ? -EINVAL : 0;
>  	} else if (action == PSERIES_DRCONF_MEM_REMOVE) {
> -		rc = pseries_remove_memblock(*base, *lmb_size);
> +		rc = pseries_remove_memblock(*base, memblock_size);
>  	} else {
>  		rc = -EINVAL;
>  	}
> 
> -	of_node_put(np);
>  	return rc;
>  }
> 
> 

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-03 17:55   ` Balbir Singh
@ 2010-10-03 18:07     ` Robin Holt
  2010-10-03 18:11       ` Dave Hansen
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Holt @ 2010-10-03 18:07 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Sun, Oct 03, 2010 at 11:25:00PM +0530, Balbir Singh wrote:
> * Nathan Fontenot <nfont@austin.ibm.com> [2010-10-01 13:35:54]:
> 
> > Define a version of memory_block_size_bytes() for powerpc/pseries such that
> > a memory block spans an entire lmb.
> 
> I hope I am not missing anything obvious, but why not just call it
> lmb_size, why do we need memblock_size?
> 
> Is lmb_size == memblock_size after your changes true for all
> platforms?

What is an lmb?  I don't recall anything like lmb being referred to in
the rest of the kernel.

Robin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-03 18:07     ` Robin Holt
@ 2010-10-03 18:11       ` Dave Hansen
  2010-10-03 18:27         ` Balbir Singh
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2010-10-03 18:11 UTC (permalink / raw)
  To: Robin Holt
  Cc: Greg KH, steiner, linux-kernel, linux-mm, linuxppc-dev,
	KAMEZAWA Hiroyuki, Balbir Singh

On Sun, 2010-10-03 at 13:07 -0500, Robin Holt wrote:
> On Sun, Oct 03, 2010 at 11:25:00PM +0530, Balbir Singh wrote:
> > * Nathan Fontenot <nfont@austin.ibm.com> [2010-10-01 13:35:54]:
> > 
> > > Define a version of memory_block_size_bytes() for powerpc/pseries such that
> > > a memory block spans an entire lmb.
> > 
> > I hope I am not missing anything obvious, but why not just call it
> > lmb_size, why do we need memblock_size?
> > 
> > Is lmb_size == memblock_size after your changes true for all
> > platforms?
> 
> What is an lmb?  I don't recall anything like lmb being referred to in
> the rest of the kernel.

Heh.  It's the OpenFirmware name for a Logical Memory Block.  Basically
what we use to determine the SECTION_SIZE on powerpc.  Probably not the
best terminology to use elsewhere in the kernel.

-- Dave

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-03 18:11       ` Dave Hansen
@ 2010-10-03 18:27         ` Balbir Singh
  2010-10-04 14:45           ` Nathan Fontenot
  0 siblings, 1 reply; 35+ messages in thread
From: Balbir Singh @ 2010-10-03 18:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Greg KH, steiner, linux-kernel, linux-mm, Robin Holt,
	linuxppc-dev, KAMEZAWA Hiroyuki

* Dave Hansen <dave@linux.vnet.ibm.com> [2010-10-03 11:11:01]:

> On Sun, 2010-10-03 at 13:07 -0500, Robin Holt wrote:
> > On Sun, Oct 03, 2010 at 11:25:00PM +0530, Balbir Singh wrote:
> > > * Nathan Fontenot <nfont@austin.ibm.com> [2010-10-01 13:35:54]:
> > > 
> > > > Define a version of memory_block_size_bytes() for powerpc/pseries such that
> > > > a memory block spans an entire lmb.
> > > 
> > > I hope I am not missing anything obvious, but why not just call it
> > > lmb_size, why do we need memblock_size?
> > > 
> > > Is lmb_size == memblock_size after your changes true for all
> > > platforms?
> > 
> > What is an lmb?  I don't recall anything like lmb being referred to in
> > the rest of the kernel.
> 
> Heh.  It's the OpenFirmware name for a Logical Memory Block.  Basically
> what we use to determine the SECTION_SIZE on powerpc.  Probably not the
> best terminology to use elsewhere in the kernel.

Agreed for the kernel, this patch was for powerpc/pseries, hence was
checking in this context.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries
  2010-10-03 18:27         ` Balbir Singh
@ 2010-10-04 14:45           ` Nathan Fontenot
  0 siblings, 0 replies; 35+ messages in thread
From: Nathan Fontenot @ 2010-10-04 14:45 UTC (permalink / raw)
  To: balbir
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On 10/03/2010 01:27 PM, Balbir Singh wrote:
> * Dave Hansen <dave@linux.vnet.ibm.com> [2010-10-03 11:11:01]:
> 
>> On Sun, 2010-10-03 at 13:07 -0500, Robin Holt wrote:
>>> On Sun, Oct 03, 2010 at 11:25:00PM +0530, Balbir Singh wrote:
>>>> * Nathan Fontenot <nfont@austin.ibm.com> [2010-10-01 13:35:54]:
>>>>
>>>>> Define a version of memory_block_size_bytes() for powerpc/pseries such that
>>>>> a memory block spans an entire lmb.
>>>>
>>>> I hope I am not missing anything obvious, but why not just call it
>>>> lmb_size, why do we need memblock_size?
>>>>
>>>> Is lmb_size == memblock_size after your changes true for all
>>>> platforms?
>>>
>>> What is an lmb?  I don't recall anything like lmb being referred to in
>>> the rest of the kernel.
>>
>> Heh.  It's the OpenFirmware name for a Logical Memory Block.  Basically
>> what we use to determine the SECTION_SIZE on powerpc.  Probably not the
>> best terminology to use elsewhere in the kernel.
> 
> Agreed for the kernel, this patch was for powerpc/pseries, hence was
> checking in this context.
> 

I don't really see a reason to name it lmb_size, it seems easier
to stick with the naming used by the rest of the kernel.

-Nathan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] v3 Move find_memory_block routine
  2010-10-01 18:28 ` [PATCH 1/9] v3 Move find_memory_block routine Nathan Fontenot
  2010-10-01 18:40   ` Robin Holt
@ 2010-10-05  5:01   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:01 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 13:28:39 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Move the find_memory_block() routine up to avoid needing a forward
> declaration in subsequent patches.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] v3 Add mutex for adding/removing memory blocks
  2010-10-01 18:29 ` [PATCH 2/9] v3 Add mutex for adding/removing memory blocks Nathan Fontenot
  2010-10-01 18:45   ` Robin Holt
@ 2010-10-05  5:06   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:06 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 13:29:42 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Add a new mutex for use in adding and removing of memory blocks.  This
> is needed to avoid any race conditions in which the same memory block could
> be added and removed at the same time.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] v3 Add section count to memory_block struct
  2010-10-01 18:30 ` [PATCH 3/9] v3 Add section count to memory_block struct Nathan Fontenot
  2010-10-01 18:46   ` Robin Holt
@ 2010-10-05  5:08   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:08 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 13:30:40 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Add a section count property to the memory_block struct to track the number
> of memory sections that have been added/removed from a memory block. This
> allows us to know when the last memory section of a memory block has been
> removed so we can remove the memory block.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

a nitpick,


> Index: linux-next/include/linux/memory.h
> ===================================================================
> --- linux-next.orig/include/linux/memory.h	2010-09-29 14:56:29.000000000 -0500
> +++ linux-next/include/linux/memory.h	2010-09-30 14:13:50.000000000 -0500
> @@ -23,6 +23,8 @@
>  struct memory_block {
>  	unsigned long phys_index;
>  	unsigned long state;
> +	int section_count;

I prefer
	int section_count; /* updated under mutex */

or something similar for this kind of non-atomic counter, but it is only a nitpick.
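
The section counting described in the patch (a block is created with its first section and removed with its last) can be sketched as below. This is a minimal userspace illustration, not the kernel implementation: struct demo_block, demo_add_section() and demo_remove_section() are hypothetical names, and the caller is assumed to hold whatever lock protects the counter, as the review comment suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct memory_block. */
struct demo_block {
	unsigned long phys_index;
	int section_count;	/* updated under the caller's lock */
};

/* Account for one more section; allocate the block on first use. */
static struct demo_block *demo_add_section(struct demo_block *blk,
					   unsigned long phys_index)
{
	if (!blk) {
		blk = calloc(1, sizeof(*blk));
		if (!blk)
			return NULL;
		blk->phys_index = phys_index;
	}
	blk->section_count++;
	return blk;
}

/* Drop one section; free the block and return NULL when the last one goes. */
static struct demo_block *demo_remove_section(struct demo_block *blk)
{
	assert(blk != NULL && blk->section_count > 0);

	if (--blk->section_count == 0) {
		free(blk);
		return NULL;
	}
	return blk;
}
```

A caller would keep the returned pointer and treat NULL from demo_remove_section() as the signal that the block's directory can go away as well.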

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections
  2010-10-01 19:00   ` Nathan Fontenot
  2010-10-01 19:20     ` Robin Holt
@ 2010-10-05  5:13     ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:13 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 14:00:50 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the memory sysfs code such that each sysfs memory directory is now
> considered a memory block that can span multiple memory sections per
> memory block.  The default size of each memory block is SECTION_SIZE_BITS
> to maintain the current behavior of having a single memory section per
> memory block (i.e. one sysfs directory per memory section).
> 
> For architectures that want to have memory blocks span multiple
> memory sections they need only define their own memory_block_size_bytes()
> routine.
> 
This should be documented in a code comment before the MEMORY_BLOCK_SIZE declaration.

> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] v3 rename phys_index properties of memory block struct
  2010-10-01 18:33 ` [PATCH 5/9] v3 rename phys_index properties of memory block struct Nathan Fontenot
  2010-10-01 18:54   ` Robin Holt
@ 2010-10-05  5:14   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:14 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 13:33:38 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the 'phys_index' property of a the memory_block struct to be
> called start_section_nr, and add a end_section_nr property.  The
> data tracked here is the same but the updated naming is more in line
> with what is stored here, namely the first and last section number
> that the memory block spans.
> 
> The names presented to userspace remain the same, phys_index for
> start_section_nr and end_phys_index for end_section_nr, to avoid breaking
> anything in userspace.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/9] v3 Update node sysfs code
  2010-10-01 18:34 ` [PATCH 6/9] v3 Update node sysfs code Nathan Fontenot
  2010-10-01 18:55   ` Robin Holt
@ 2010-10-05  5:15   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:15 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 13:34:34 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the node sysfs code to be aware of the new capability for a memory
> block to contain multiple memory sections and be aware of the memory block
> structure name changes (start_section_nr).  This requires an additional
> parameter to unregister_mem_sect_under_nodes so that we know which memory
> section of the memory block to unregister.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] v3 Update memory hotplug documentation
  2010-10-01 18:37 ` [PATCH 9/9] v3 Update memory hotplug documentation Nathan Fontenot
  2010-10-01 18:58   ` Robin Holt
@ 2010-10-05  5:18   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-05  5:18 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev

On Fri, 01 Oct 2010 13:37:49 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the memory hotplug documentation to reflect the new behaviors of
> memory blocks reflected in sysfs.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thank you for your patient work!



> ---
>  Documentation/memory-hotplug.txt |   47 +++++++++++++++++++++++++--------------
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> Index: linux-next/Documentation/memory-hotplug.txt
> ===================================================================
> --- linux-next.orig/Documentation/memory-hotplug.txt	2010-09-29 14:56:24.000000000 -0500
> +++ linux-next/Documentation/memory-hotplug.txt	2010-09-30 14:59:47.000000000 -0500
> @@ -126,36 +126,51 @@
>  --------------------------------
>  4 sysfs files for memory hotplug
>  --------------------------------
> -All sections have their device information under /sys/devices/system/memory as
> +All sections have their device information in sysfs.  Each section is part of
> +a memory block under /sys/devices/system/memory as
>  
>  /sys/devices/system/memory/memoryXXX
> -(XXX is section id.)
> +(XXX is the section id.)
>  
> -Now, XXX is defined as start_address_of_section / section_size.
> +Now, XXX is defined as (start_address_of_section / section_size) of the first
> +section contained in the memory block.  The files 'phys_index' and
> +'end_phys_index' under each directory report the beginning and end section id's
> +for the memory block covered by the sysfs directory.  It is expected that all
> +memory sections in this range are present and no memory holes exist in the
> +range. Currently there is no way to determine if there is a memory hole, but
> +the existence of one should not affect the hotplug capabilities of the memory
> +block.
>  
>  For example, assume 1GiB section size. A device for a memory starting at
>  0x100000000 is /sys/device/system/memory/memory4
>  (0x100000000 / 1Gib = 4)
>  This device covers address range [0x100000000 ... 0x140000000)
>  
> -Under each section, you can see 4 files.
> +Under each section, you can see 4 or 5 files, the end_phys_index file being
> +a recent addition and not present on older kernels.
>  
> -/sys/devices/system/memory/memoryXXX/phys_index
> +/sys/devices/system/memory/memoryXXX/start_phys_index
> +/sys/devices/system/memory/memoryXXX/end_phys_index
>  /sys/devices/system/memory/memoryXXX/phys_device
>  /sys/devices/system/memory/memoryXXX/state
>  /sys/devices/system/memory/memoryXXX/removable
>  
> -'phys_index' : read-only and contains section id, same as XXX.
> -'state'      : read-write
> -               at read:  contains online/offline state of memory.
> -               at write: user can specify "online", "offline" command
> -'phys_device': read-only: designed to show the name of physical memory device.
> -               This is not well implemented now.
> -'removable'  : read-only: contains an integer value indicating
> -               whether the memory section is removable or not
> -               removable.  A value of 1 indicates that the memory
> -               section is removable and a value of 0 indicates that
> -               it is not removable.
> +'phys_index'      : read-only and contains section id of the first section
> +		    in the memory block, same as XXX.
> +'end_phys_index'  : read-only and contains section id of the last section
> +		    in the memory block.
> +'state'           : read-write
> +                    at read:  contains online/offline state of memory.
> +                    at write: user can specify "online", "offline" command
> +                    which will be performed on all sections in the block.
> +'phys_device'     : read-only: designed to show the name of physical memory
> +                    device.  This is not well implemented now.
> +'removable'       : read-only: contains an integer value indicating
> +                    whether the memory block is removable or not
> +                    removable.  A value of 1 indicates that the memory
> +                    block is removable and a value of 0 indicates that
> +                    it is not removable. A memory block is removable only if
> +                    every section in the block is removable.
>  
>  NOTE:
>    These directories/files appear after physical memory hotplug phase.
> 
> 
> 
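
As a rough illustration of the phys_index / end_phys_index description in the documentation patch above, the first and last section of the block covering a given address could be computed as follows. This is a sketch that assumes blocks are aligned to a multiple of sections_per_block, which matches how mem->phys_index is derived in patch 4/9; struct block_range and block_range_for_addr() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: first and last section id of one memory block. */
struct block_range {
	uint64_t first_section;	/* reported as phys_index */
	uint64_t last_section;	/* reported as end_phys_index */
};

static struct block_range block_range_for_addr(uint64_t addr,
					       uint64_t section_size,
					       uint64_t block_size)
{
	struct block_range r;
	uint64_t sections_per_block, section_nr;

	/* A block is assumed to be a whole number of sections. */
	assert(section_size > 0);
	assert(block_size >= section_size && block_size % section_size == 0);

	sections_per_block = block_size / section_size;
	section_nr = addr / section_size;

	/* Round down to the first section of the containing block. */
	r.first_section = (section_nr / sections_per_block) * sections_per_block;
	r.last_section = r.first_section + sections_per_block - 1;
	return r;
}
```

With a 1 GiB section size and a 1 GiB block size, address 0x100000000 yields first and last section 4, matching the example in the documentation. With a 4 GiB block size, address 0x140000000 (section 5) falls in the block spanning sections 4 through 7.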

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections
  2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
                   ` (8 preceding siblings ...)
  2010-10-01 18:37 ` [PATCH 9/9] v3 Update memory hotplug documentation Nathan Fontenot
@ 2010-10-21 12:05 ` Nikanth Karthikesan
  9 siblings, 0 replies; 35+ messages in thread
From: Nikanth Karthikesan @ 2010-10-21 12:05 UTC (permalink / raw)
  To: Nathan Fontenot
  Cc: Greg KH, steiner, linux-kernel, Dave Hansen, linux-mm,
	Robin Holt, linuxppc-dev, KAMEZAWA Hiroyuki

On Friday 01 October 2010 23:52:56 Nathan Fontenot wrote:
> This set of patches decouples the concept that a single memory
> section corresponds to a single directory in
> /sys/devices/system/memory/.  On systems
> with large amounts of memory (1+ TB) there are performance issues
> related to creating the large number of sysfs directories.  For
> a powerpc machine with 1 TB of memory we are creating 63,000+
> directories.  This is resulting in boot times of around 45-50
> minutes for systems with 1 TB of memory and 8 hours for systems
> with 2 TB of memory.  With this patch set applied I am now seeing
> boot times of 5 minutes or less.
> 
> The root of this issue is in sysfs directory creation. Every time
> a directory is created a string compare is done against all sibling
> directories to ensure we do not create duplicates.  The list of
> directory nodes in sysfs is kept as an unsorted list which results
> in this being an exponentially longer operation as the number of
> directories are created.
> 

Can we simply remove this check for this case alone?! :)

Thanks
Nikanth

Do not check whether an entry with the same name is already present when
__sysfs_add_one() is called directly, bypassing sysfs_add_one().

Currently register_mem_sect_under_node() calls
sysfs_create_link_nowarn(), which is the only caller to do so.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>

---

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 7e54bac..14d965c 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -368,21 +368,16 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
  *	This function should be called between calls to
  *	sysfs_addrm_start() and sysfs_addrm_finish() and should be
  *	passed the same @acxt as passed to sysfs_addrm_start().
+ *	And there should be no sibling with the same name.
  *
  *	LOCKING:
  *	Determined by sysfs_addrm_start().
  *
- *	RETURNS:
- *	0 on success, -EEXIST if entry with the given name already
- *	exists.
  */
-int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
+void __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 {
 	struct sysfs_inode_attrs *ps_iattr;
 
-	if (sysfs_find_dirent(acxt->parent_sd, sd->s_ns, sd->s_name))
-		return -EEXIST;
-
 	sd->s_parent = sysfs_get(acxt->parent_sd);
 
 	sysfs_link_sibling(sd);
@@ -394,7 +389,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 		ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
 	}
 
-	return 0;
 }
 
 /**
@@ -439,10 +433,9 @@ static char *sysfs_pathname(struct sysfs_dirent *sd, char *path)
  */
 int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 {
-	int ret;
+	int ret = 0;
 
-	ret = __sysfs_add_one(acxt, sd);
-	if (ret == -EEXIST) {
+	if (sysfs_find_dirent(acxt->parent_sd, sd->s_ns, sd->s_name)) {
 		char *path = kzalloc(PATH_MAX, GFP_KERNEL);
 		WARN(1, KERN_WARNING
 		     "sysfs: cannot create duplicate filename '%s'\n",
@@ -450,8 +443,11 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 		     strcat(strcat(sysfs_pathname(acxt->parent_sd, path), "/"),
 		            sd->s_name));
 		kfree(path);
+		ret = -EEXIST;
 	}
 
+	__sysfs_add_one(acxt, sd);
+
 	return ret;
 }
 
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index a7ac78f..7c56d34 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -72,7 +72,7 @@ static int sysfs_do_create_link(struct kobject *kobj, struct kobject *target,
 		if (warn)
 			error = sysfs_add_one(&acxt, sd);
 		else
-			error = __sysfs_add_one(&acxt, sd);
+			__sysfs_add_one(&acxt, sd);
 	} else {
 		error = -EINVAL;
 		WARN(1, KERN_WARNING
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index d9be60a..35449c8 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -155,7 +155,7 @@ struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd);
 void sysfs_put_active(struct sysfs_dirent *sd);
 void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
 		       struct sysfs_dirent *parent_sd);
-int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
+void __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
 int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
 void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
 void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt);
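
To make the cost of the duplicate-name check concrete, the following userspace sketch counts the string compares needed to add n uniquely named entries to an unsorted list when every insertion first scans all existing siblings. It is a simplified model, not the sysfs code: count_compares(), DEMO_MAX and DEMO_NAME_LEN are hypothetical names. Under this model the k-th insertion costs k compares, so the total is n(n-1)/2, which is quadratic growth in the number of directories.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX	1024
#define DEMO_NAME_LEN	16

/*
 * Add n names ("memory0", "memory1", ...) to an unsorted array, checking
 * every existing entry for a duplicate first, and return the number of
 * string compares performed.
 */
static unsigned long count_compares(unsigned int n)
{
	static char names[DEMO_MAX][DEMO_NAME_LEN];
	unsigned long compares = 0;
	unsigned int used = 0, i, j;

	assert(n <= DEMO_MAX);

	for (i = 0; i < n; i++) {
		char name[DEMO_NAME_LEN];
		int dup = 0;

		snprintf(name, sizeof(name), "memory%u", i);

		/* Linear scan over all siblings added so far. */
		for (j = 0; j < used; j++) {
			compares++;
			if (strcmp(names[j], name) == 0) {
				dup = 1;
				break;
			}
		}

		if (!dup)
			strcpy(names[used++], name);
	}

	return compares;
}
```

For 1000 entries this model performs 499500 compares. Spanning several sections with one directory reduces n itself, which is why the patch set helps even with the check left in place.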

^ permalink raw reply related	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2010-10-21 12:02 UTC | newest]

Thread overview: 35+ messages
2010-10-01 18:22 [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nathan Fontenot
2010-10-01 18:28 ` [PATCH 1/9] v3 Move find_memory_block routine Nathan Fontenot
2010-10-01 18:40   ` Robin Holt
2010-10-05  5:01   ` KAMEZAWA Hiroyuki
2010-10-01 18:29 ` [PATCH 2/9] v3 Add mutex for adding/removing memory blocks Nathan Fontenot
2010-10-01 18:45   ` Robin Holt
2010-10-05  5:06   ` KAMEZAWA Hiroyuki
2010-10-01 18:30 ` [PATCH 3/9] v3 Add section count to memory_block struct Nathan Fontenot
2010-10-01 18:46   ` Robin Holt
2010-10-05  5:08   ` KAMEZAWA Hiroyuki
2010-10-01 18:31 ` [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections Nathan Fontenot
2010-10-01 18:52   ` Robin Holt
2010-10-01 18:56     ` Nathan Fontenot
2010-10-01 19:00   ` Nathan Fontenot
2010-10-01 19:20     ` Robin Holt
2010-10-05  5:13     ` KAMEZAWA Hiroyuki
2010-10-01 18:33 ` [PATCH 5/9] v3 rename phys_index properties of memory block struct Nathan Fontenot
2010-10-01 18:54   ` Robin Holt
2010-10-05  5:14   ` KAMEZAWA Hiroyuki
2010-10-01 18:34 ` [PATCH 6/9] v3 Update node sysfs code Nathan Fontenot
2010-10-01 18:55   ` Robin Holt
2010-10-05  5:15   ` KAMEZAWA Hiroyuki
2010-10-01 18:35 ` [PATCH 7/9] v3 Define memory_block_size_bytes for powerpc/pseries Nathan Fontenot
2010-10-01 18:56   ` Robin Holt
2010-10-03 17:55   ` Balbir Singh
2010-10-03 18:07     ` Robin Holt
2010-10-03 18:11       ` Dave Hansen
2010-10-03 18:27         ` Balbir Singh
2010-10-04 14:45           ` Nathan Fontenot
2010-10-01 18:37 ` [PATCH 8/9] v3 Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV set Nathan Fontenot
2010-10-01 18:57   ` Robin Holt
2010-10-01 18:37 ` [PATCH 9/9] v3 Update memory hotplug documentation Nathan Fontenot
2010-10-01 18:58   ` Robin Holt
2010-10-05  5:18   ` KAMEZAWA Hiroyuki
2010-10-21 12:05 ` [PATCH 0/9] v3 De-couple sysfs memory directories from memory sections Nikanth Karthikesan
