linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size
@ 2010-07-15 18:30 Nathan Fontenot
  2010-07-15 18:37 ` [PATCH 1/5] v2 Split the memory_block structure Nathan Fontenot
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-15 18:30 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: KAMEZAWA Hiroyuki

This set of patches de-couples the idea that there is a single
directory in sysfs for each memory section.  The intent of the
patches is to reduce the number of sysfs directories created to
resolve a boot-time performance issue.  On very large systems
boot time are getting very long (as seen on powerpc hardware)
due to the enormous number of sysfs directories being created.
On a system with 1 TB of memory we create ~63,000 directories.
For even larger systems boot times are being measured in hours.

This set of patches allows for each directory created in sysfs
to cover more than one memory section.  The default behavior for
sysfs directory creation is the same, in that each directory
represents a single memory section.  A new file 'end_phys_index'
in each directory contains the physical_id of the last memory
section covered by the directory so that users can easily
determine the memory section range of a directory.

For version 2 of this patchset the capability to split a
directory has been removed.

Thanks,

Nathan Fontenot

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] v2 Split the memory_block structure
  2010-07-15 18:30 [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size Nathan Fontenot
@ 2010-07-15 18:37 ` Nathan Fontenot
  2010-07-16  0:06   ` KAMEZAWA Hiroyuki
  2010-07-16 17:15   ` Dave Hansen
  2010-07-15 18:38 ` [PATCH 2/5] v2 Create new 'end_phys_index' file Nathan Fontenot
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-15 18:37 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: KAMEZAWA Hiroyuki

Split the memory_block struct into a memory_block
struct to cover each sysfs directory and a new memory_block_section
struct for each memory section covered by the sysfs directory.
This change allows for creation of memory sysfs directories that
can span multiple memory sections.

This can be beneficial in that it can reduce the number of memory
sysfs directories created at boot.  This also allows different
architectures to define how many memory sections are covered by
a sysfs directory.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c  |  222 ++++++++++++++++++++++++++++++++++---------------
 include/linux/memory.h |   11 +-
 2 files changed, 167 insertions(+), 66 deletions(-)

Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-15 08:48:41.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-15 09:55:54.000000000 -0500
@@ -28,6 +28,14 @@
 #include <asm/uaccess.h>
 
 #define MEMORY_CLASS_NAME	"memory"
+#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
+
+static int sections_per_block;
+
+static inline int base_memory_block_id(int section_nr)
+{
+	return (section_nr / sections_per_block) * sections_per_block;
+}
 
 static struct sysdev_class memory_sysdev_class = {
 	.name = MEMORY_CLASS_NAME,
@@ -94,10 +102,9 @@
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section)
+unregister_memory(struct memory_block *memory)
 {
 	BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
-	BUG_ON(memory->sysdev.id != __section_nr(section));
 
 	/* drop the ref. we got in remove_memory_block() */
 	kobject_put(&memory->sysdev.kobj);
@@ -123,13 +130,20 @@
 static ssize_t show_mem_removable(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
 {
+	struct memory_block *mem;
+	struct memory_block_section *mbs;
 	unsigned long start_pfn;
-	int ret;
-	struct memory_block *mem =
-		container_of(dev, struct memory_block, sysdev);
+	int ret = 1;
+
+	mem = container_of(dev, struct memory_block, sysdev);
+	mutex_lock(&mem->state_mutex);
 
-	start_pfn = section_nr_to_pfn(mem->phys_index);
-	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
+	list_for_each_entry(mbs, &mem->sections, next) {
+		start_pfn = section_nr_to_pfn(mbs->phys_index);
+		ret &= is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
+	}
+
+	mutex_unlock(&mem->state_mutex);
 	return sprintf(buf, "%d\n", ret);
 }
 
@@ -182,16 +196,16 @@
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(struct memory_block *mem, unsigned long action)
+memory_block_action(struct memory_block_section *mbs, unsigned long action)
 {
 	int i;
 	unsigned long psection;
 	unsigned long start_pfn, start_paddr;
 	struct page *first_page;
 	int ret;
-	int old_state = mem->state;
+	int old_state = mbs->state;
 
-	psection = mem->phys_index;
+	psection = mbs->phys_index;
 	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
 
 	/*
@@ -217,18 +231,18 @@
 			ret = online_pages(start_pfn, PAGES_PER_SECTION);
 			break;
 		case MEM_OFFLINE:
-			mem->state = MEM_GOING_OFFLINE;
+			mbs->state = MEM_GOING_OFFLINE;
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
 			ret = remove_memory(start_paddr,
 					    PAGES_PER_SECTION << PAGE_SHIFT);
 			if (ret) {
-				mem->state = old_state;
+				mbs->state = old_state;
 				break;
 			}
 			break;
 		default:
 			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
-					__func__, mem, action, action);
+					__func__, mbs, action, action);
 			ret = -EINVAL;
 	}
 
@@ -238,19 +252,34 @@
 static int memory_block_change_state(struct memory_block *mem,
 		unsigned long to_state, unsigned long from_state_req)
 {
+	struct memory_block_section *mbs;
 	int ret = 0;
+
 	mutex_lock(&mem->state_mutex);
 
-	if (mem->state != from_state_req) {
-		ret = -EINVAL;
-		goto out;
+	list_for_each_entry(mbs, &mem->sections, next) {
+		if (mbs->state != from_state_req)
+			continue;
+
+		ret = memory_block_action(mbs, to_state);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		list_for_each_entry(mbs, &mem->sections, next) {
+			if (mbs->state == from_state_req)
+				continue;
+
+			if (memory_block_action(mbs, to_state))
+				printk(KERN_ERR "Could not re-enable memory "
+				       "section %lx\n", mbs->phys_index);
+		}
 	}
 
-	ret = memory_block_action(mem, to_state);
 	if (!ret)
 		mem->state = to_state;
 
-out:
 	mutex_unlock(&mem->state_mutex);
 	return ret;
 }
@@ -260,20 +289,15 @@
 		struct sysdev_attribute *attr, const char *buf, size_t count)
 {
 	struct memory_block *mem;
-	unsigned int phys_section_nr;
 	int ret = -EINVAL;
 
 	mem = container_of(dev, struct memory_block, sysdev);
-	phys_section_nr = mem->phys_index;
-
-	if (!present_section_nr(phys_section_nr))
-		goto out;
 
 	if (!strncmp(buf, "online", min((int)count, 6)))
 		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
 		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
-out:
+
 	if (ret)
 		return ret;
 	return count;
@@ -435,39 +459,6 @@
 	return 0;
 }
 
-static int add_memory_block(int nid, struct mem_section *section,
-			unsigned long state, enum mem_add_context context)
-{
-	struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
-	unsigned long start_pfn;
-	int ret = 0;
-
-	if (!mem)
-		return -ENOMEM;
-
-	mem->phys_index = __section_nr(section);
-	mem->state = state;
-	mutex_init(&mem->state_mutex);
-	start_pfn = section_nr_to_pfn(mem->phys_index);
-	mem->phys_device = arch_get_memory_phys_device(start_pfn);
-
-	ret = register_memory(mem, section);
-	if (!ret)
-		ret = mem_create_simple_file(mem, phys_index);
-	if (!ret)
-		ret = mem_create_simple_file(mem, state);
-	if (!ret)
-		ret = mem_create_simple_file(mem, phys_device);
-	if (!ret)
-		ret = mem_create_simple_file(mem, removable);
-	if (!ret) {
-		if (context == HOTPLUG)
-			ret = register_mem_sect_under_node(mem, nid);
-	}
-
-	return ret;
-}
-
 /*
  * For now, we have a linear search to go find the appropriate
  * memory_block corresponding to a particular phys_index. If
@@ -482,12 +473,13 @@
 	struct sys_device *sysdev;
 	struct memory_block *mem;
 	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+	int block_id = base_memory_block_id(__section_nr(section));
 
 	/*
 	 * This only works because we know that section == sysdev->id
 	 * slightly redundant with sysdev_register()
 	 */
-	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
 
 	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
 	if (!kobj)
@@ -499,18 +491,98 @@
 	return mem;
 }
 
+static int add_mem_block_section(struct memory_block *mem,
+				 int section_nr, unsigned long state)
+{
+	struct memory_block_section *mbs;
+
+	mbs = kzalloc(sizeof(*mbs), GFP_KERNEL);
+	if (!mbs)
+		return -ENOMEM;
+
+	mbs->phys_index = section_nr;
+	mbs->state = state;
+
+	list_add(&mbs->next, &mem->sections);
+	return 0;
+}
+
+static int add_memory_block(int nid, struct mem_section *section,
+			unsigned long state, enum mem_add_context context)
+{
+	struct memory_block *mem;
+	int ret = 0;
+
+	mem = find_memory_block(section);
+	if (!mem) {
+		unsigned long start_pfn;
+
+		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+		if (!mem)
+			return -ENOMEM;
+
+		mem->state = state;
+		mutex_init(&mem->state_mutex);
+		start_pfn = section_nr_to_pfn(__section_nr(section));
+		mem->phys_device = arch_get_memory_phys_device(start_pfn);
+		INIT_LIST_HEAD(&mem->sections);
+
+		mutex_lock(&mem->state_mutex);
+
+		ret = register_memory(mem, section);
+		if (!ret)
+			ret = mem_create_simple_file(mem, phys_index);
+		if (!ret)
+			ret = mem_create_simple_file(mem, state);
+		if (!ret)
+			ret = mem_create_simple_file(mem, phys_device);
+		if (!ret)
+			ret = mem_create_simple_file(mem, removable);
+		if (!ret) {
+			if (context == HOTPLUG)
+				ret = register_mem_sect_under_node(mem, nid);
+		}
+	} else {
+		kobject_put(&mem->sysdev.kobj);
+		mutex_lock(&mem->state_mutex);
+	}
+
+	if (!ret)
+		ret = add_mem_block_section(mem, __section_nr(section), state);
+
+	mutex_unlock(&mem->state_mutex);
+	return ret;
+}
+
 int remove_memory_block(unsigned long node_id, struct mem_section *section,
 		int phys_device)
 {
 	struct memory_block *mem;
+	struct memory_block_section *mbs, *tmp;
+	int section_nr = __section_nr(section);
 
 	mem = find_memory_block(section);
-	unregister_mem_sect_under_nodes(mem);
-	mem_remove_simple_file(mem, phys_index);
-	mem_remove_simple_file(mem, state);
-	mem_remove_simple_file(mem, phys_device);
-	mem_remove_simple_file(mem, removable);
-	unregister_memory(mem, section);
+	mutex_lock(&mem->state_mutex);
+
+	/* remove the specified section */
+	list_for_each_entry_safe(mbs, tmp, &mem->sections, next) {
+		if (mbs->phys_index == section_nr) {
+			list_del(&mbs->next);
+			kfree(mbs);
+		}
+	}
+
+	mutex_unlock(&mem->state_mutex);
+
+	if (list_empty(&mem->sections)) {
+		unregister_mem_sect_under_nodes(mem);
+		mem_remove_simple_file(mem, phys_index);
+		mem_remove_simple_file(mem, state);
+		mem_remove_simple_file(mem, phys_device);
+		mem_remove_simple_file(mem, removable);
+		unregister_memory(mem);
+		kfree(mem);
+	}
 
 	return 0;
 }
@@ -532,6 +604,24 @@
 	return remove_memory_block(0, section, 0);
 }
 
+u32 __weak memory_block_size(void)
+{
+	return MIN_MEMORY_BLOCK_SIZE;
+}
+
+static u32 get_memory_block_size(void)
+{
+	u32 blk_sz;
+
+	blk_sz = memory_block_size();
+
+	/* Validate blk_sz is a power of 2 and not less than section size */
+	if ((blk_sz & (blk_sz - 1)) || (blk_sz < MIN_MEMORY_BLOCK_SIZE))
+		blk_sz = MIN_MEMORY_BLOCK_SIZE;
+
+	return blk_sz;
+}
+
 /*
  * Initialize the sysfs support for memory devices...
  */
@@ -540,12 +630,16 @@
 	unsigned int i;
 	int ret;
 	int err;
+	int block_sz;
 
 	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
 	ret = sysdev_class_register(&memory_sysdev_class);
 	if (ret)
 		goto out;
 
+	block_sz = get_memory_block_size();
+	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
 	/*
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h	2010-07-15 08:48:41.000000000 -0500
+++ linux-2.6/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
@@ -19,9 +19,15 @@
 #include <linux/node.h>
 #include <linux/compiler.h>
 #include <linux/mutex.h>
+#include <linux/list.h>
 
-struct memory_block {
+struct memory_block_section {
+	unsigned long state;
 	unsigned long phys_index;
+	struct list_head next;
+};
+
+struct memory_block {
 	unsigned long state;
 	/*
 	 * This serializes all state change requests.  It isn't
@@ -34,6 +40,7 @@
 	void *hw;			/* optional pointer to fw/hw data */
 	int (*phys_callback)(struct memory_block *);
 	struct sys_device sysdev;
+	struct list_head sections;
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
@@ -113,7 +120,7 @@
 extern int remove_memory_block(unsigned long, struct mem_section *, int);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
-extern struct memory_block *find_memory_block(unsigned long);
+extern struct memory_block *find_memory_block(struct mem_section *);
 extern int memory_is_hidden(struct mem_section *);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/5] v2 Create new 'end_phys_index' file
  2010-07-15 18:30 [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size Nathan Fontenot
  2010-07-15 18:37 ` [PATCH 1/5] v2 Split the memory_block structure Nathan Fontenot
@ 2010-07-15 18:38 ` Nathan Fontenot
  2010-07-16  0:08   ` KAMEZAWA Hiroyuki
  2010-07-15 18:39 ` [PATCH 3/5] v2 Change the mutex name in the memory_block struct Nathan Fontenot
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-15 18:38 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: KAMEZAWA Hiroyuki

Add a new 'end_phys_index' file to each memory sysfs directory to
report the physical index of the last memory section
covered by the sysfs directory.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c  |   14 +++++++++++++-
 include/linux/memory.h |    3 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-15 09:55:54.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-15 09:56:05.000000000 -0500
@@ -121,7 +121,15 @@
 {
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
-	return sprintf(buf, "%08lx\n", mem->phys_index);
+	return sprintf(buf, "%08lx\n", mem->start_phys_index);
+}
+
+static ssize_t show_mem_end_phys_index(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct memory_block *mem =
+		container_of(dev, struct memory_block, sysdev);
+	return sprintf(buf, "%08lx\n", mem->end_phys_index);
 }
 
 /*
@@ -321,6 +329,7 @@
 }
 
 static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
+static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
 static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
 static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
 static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
@@ -533,6 +542,8 @@
 		if (!ret)
 			ret = mem_create_simple_file(mem, phys_index);
 		if (!ret)
+			ret = mem_create_simple_file(mem, end_phys_index);
+		if (!ret)
 			ret = mem_create_simple_file(mem, state);
 		if (!ret)
 			ret = mem_create_simple_file(mem, phys_device);
@@ -577,6 +588,7 @@
 	if (list_empty(&mem->sections)) {
 		unregister_mem_sect_under_nodes(mem);
 		mem_remove_simple_file(mem, phys_index);
+		mem_remove_simple_file(mem, end_phys_index);
 		mem_remove_simple_file(mem, state);
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
+++ linux-2.6/include/linux/memory.h	2010-07-15 09:56:05.000000000 -0500
@@ -29,6 +29,9 @@
 
 struct memory_block {
 	unsigned long state;
+	unsigned long start_phys_index;
+	unsigned long end_phys_index;
+
 	/*
 	 * This serializes all state change requests.  It isn't
 	 * held during creation because the control files are

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/5] v2 Change the mutex name in the memory_block struct
  2010-07-15 18:30 [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size Nathan Fontenot
  2010-07-15 18:37 ` [PATCH 1/5] v2 Split the memory_block structure Nathan Fontenot
  2010-07-15 18:38 ` [PATCH 2/5] v2 Create new 'end_phys_index' file Nathan Fontenot
@ 2010-07-15 18:39 ` Nathan Fontenot
  2010-07-16 17:16   ` Dave Hansen
  2010-07-15 18:40 ` [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories Nathan Fontenot
  2010-07-15 18:41 ` [PATCH 5/5] v2 Enable multiple sections per directory for ppc Nathan Fontenot
  4 siblings, 1 reply; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-15 18:39 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: KAMEZAWA Hiroyuki

Change the name of the memory_block mutex since it is now used for
more than just gating changes to the status of the memory sections
covered by the memory sysfs directory.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c  |   20 ++++++++++----------
 include/linux/memory.h |    9 +--------
 2 files changed, 11 insertions(+), 18 deletions(-)

Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-15 09:56:05.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-15 09:56:10.000000000 -0500
@@ -144,14 +144,14 @@
 	int ret = 1;
 
 	mem = container_of(dev, struct memory_block, sysdev);
-	mutex_lock(&mem->state_mutex);
+	mutex_lock(&mem->mutex);
 
 	list_for_each_entry(mbs, &mem->sections, next) {
 		start_pfn = section_nr_to_pfn(mbs->phys_index);
 		ret &= is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
 	}
 
-	mutex_unlock(&mem->state_mutex);
+	mutex_unlock(&mem->mutex);
 	return sprintf(buf, "%d\n", ret);
 }
 
@@ -263,7 +263,7 @@
 	struct memory_block_section *mbs;
 	int ret = 0;
 
-	mutex_lock(&mem->state_mutex);
+	mutex_lock(&mem->mutex);
 
 	list_for_each_entry(mbs, &mem->sections, next) {
 		if (mbs->state != from_state_req)
@@ -288,7 +288,7 @@
 	if (!ret)
 		mem->state = to_state;
 
-	mutex_unlock(&mem->state_mutex);
+	mutex_unlock(&mem->mutex);
 	return ret;
 }
 
@@ -531,12 +531,12 @@
 			return -ENOMEM;
 
 		mem->state = state;
-		mutex_init(&mem->state_mutex);
+		mutex_init(&mem->mutex);
 		start_pfn = section_nr_to_pfn(__section_nr(section));
 		mem->phys_device = arch_get_memory_phys_device(start_pfn);
 		INIT_LIST_HEAD(&mem->sections);
 
-		mutex_lock(&mem->state_mutex);
+		mutex_lock(&mem->mutex);
 
 		ret = register_memory(mem, section);
 		if (!ret)
@@ -555,13 +555,13 @@
 		}
 	} else {
 		kobject_put(&mem->sysdev.kobj);
-		mutex_lock(&mem->state_mutex);
+		mutex_lock(&mem->mutex);
 	}
 
 	if (!ret)
 		ret = add_mem_block_section(mem, __section_nr(section), state);
 
-	mutex_unlock(&mem->state_mutex);
+	mutex_unlock(&mem->mutex);
 	return ret;
 }
 
@@ -573,7 +573,7 @@
 	int section_nr = __section_nr(section);
 
 	mem = find_memory_block(section);
-	mutex_lock(&mem->state_mutex);
+	mutex_lock(&mem->mutex);
 
 	/* remove the specified section */
 	list_for_each_entry_safe(mbs, tmp, &mem->sections, next) {
@@ -583,7 +583,7 @@
 		}
 	}
 
-	mutex_unlock(&mem->state_mutex);
+	mutex_unlock(&mem->mutex);
 
 	if (list_empty(&mem->sections)) {
 		unregister_mem_sect_under_nodes(mem);
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h	2010-07-15 09:56:05.000000000 -0500
+++ linux-2.6/include/linux/memory.h	2010-07-15 09:56:10.000000000 -0500
@@ -31,14 +31,7 @@
 	unsigned long state;
 	unsigned long start_phys_index;
 	unsigned long end_phys_index;
-
-	/*
-	 * This serializes all state change requests.  It isn't
-	 * held during creation because the control files are
-	 * created long after the critical areas during
-	 * initialization.
-	 */
-	struct mutex state_mutex;
+	struct mutex mutex;
 	int phys_device;		/* to which fru does this belong? */
 	void *hw;			/* optional pointer to fw/hw data */
 	int (*phys_callback)(struct memory_block *);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories
  2010-07-15 18:30 [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size Nathan Fontenot
                   ` (2 preceding siblings ...)
  2010-07-15 18:39 ` [PATCH 3/5] v2 Change the mutex name in the memory_block struct Nathan Fontenot
@ 2010-07-15 18:40 ` Nathan Fontenot
  2010-07-16  0:12   ` KAMEZAWA Hiroyuki
  2010-07-15 18:41 ` [PATCH 5/5] v2 Enable multiple sections per directory for ppc Nathan Fontenot
  4 siblings, 1 reply; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-15 18:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: KAMEZAWA Hiroyuki

Update the node sysfs directory routines that create
links to the memory sysfs directories under each node.
This update makes the node code aware that a memory sysfs
directory can cover multiple memory sections.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/node.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/base/node.c
===================================================================
--- linux-2.6.orig/drivers/base/node.c	2010-07-15 09:54:06.000000000 -0500
+++ linux-2.6/drivers/base/node.c	2010-07-15 09:56:16.000000000 -0500
@@ -346,8 +346,10 @@
 		return -EFAULT;
 	if (!node_online(nid))
 		return 0;
-	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
+	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int page_nid;
 
@@ -383,8 +385,10 @@
 	if (!unlinked_nodes)
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
-	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
+	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 5/5] v2 Enable multiple sections per directory for ppc
  2010-07-15 18:30 [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size Nathan Fontenot
                   ` (3 preceding siblings ...)
  2010-07-15 18:40 ` [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories Nathan Fontenot
@ 2010-07-15 18:41 ` Nathan Fontenot
  2010-07-16 17:18   ` Dave Hansen
  4 siblings, 1 reply; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-15 18:41 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: KAMEZAWA Hiroyuki

Update the powerpc/pseries code to initialize
the memory sysfs directory block size to be the
same size as a LMB.

Signed-off-by; Nathan Fontenot <nfont@austin.ibm.ocm>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   66 +++++++++++++++++++-----
 1 file changed, 53 insertions(+), 13 deletions(-)

Index: linux-2.6/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-07-15 09:54:06.000000000 -0500
+++ linux-2.6/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-07-15 09:56:19.000000000 -0500
@@ -17,6 +17,54 @@
 #include <asm/pSeries_reconfig.h>
 #include <asm/sparsemem.h>
 
+static u32 get_memblock_size(void)
+{
+	struct device_node *np;
+	unsigned int memblock_size = 0;
+
+	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (np) {
+		const unsigned int *size;
+
+		size = of_get_property(np, "ibm,lmb-size", NULL);
+		memblock_size = size ? *size : 0;
+
+		of_node_put(np);
+	} else {
+		unsigned int memzero_size = 0;
+		const unsigned int *regs;
+
+		np = of_find_node_by_path("/memory@0");
+		if (np) {
+			regs = of_get_property(np, "reg", NULL);
+			memzero_size = regs ? regs[3] : 0;
+			of_node_put(np);
+		}
+
+		if (memzero_size) {
+			/* We now know the size of memory@0, use this to find
+			 * the first memoryblock and get its size.
+			 */
+			char buf[64];
+
+			sprintf(buf, "/memory@%x", memzero_size);
+			np = of_find_node_by_path(buf);
+			if (np) {
+				regs = of_get_property(np, "reg", NULL);
+				memblock_size = regs ? regs[3] : 0;
+				of_node_put(np);
+			}
+		}
+	}
+
+	return memblock_size;
+}
+
+u32 memory_block_size(void)
+{
+	return get_memblock_size();
+}
+
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
@@ -127,30 +175,22 @@
 
 static int pseries_drconf_memory(unsigned long *base, unsigned int action)
 {
-	struct device_node *np;
-	const unsigned long *memblock_size;
+	unsigned long memblock_size;
 	int rc;
 
-	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
-	if (!np)
+	memblock_size = get_memblock_size();
+	if (!memblock_size)
 		return -EINVAL;
 
-	memblock_size = of_get_property(np, "ibm,memblock-size", NULL);
-	if (!memblock_size) {
-		of_node_put(np);
-		return -EINVAL;
-	}
-
 	if (action == PSERIES_DRCONF_MEM_ADD) {
-		rc = memblock_add(*base, *memblock_size);
+		rc = memblock_add(*base, memblock_size);
 		rc = (rc < 0) ? -EINVAL : 0;
 	} else if (action == PSERIES_DRCONF_MEM_REMOVE) {
-		rc = pseries_remove_memblock(*base, *memblock_size);
+		rc = pseries_remove_memblock(*base, memblock_size);
 	} else {
 		rc = -EINVAL;
 	}
 
-	of_node_put(np);
 	return rc;
 }
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] v2 Split the memory_block structure
  2010-07-15 18:37 ` [PATCH 1/5] v2 Split the memory_block structure Nathan Fontenot
@ 2010-07-16  0:06   ` KAMEZAWA Hiroyuki
  2010-07-16 15:29     ` Nathan Fontenot
  2010-07-16 17:15   ` Dave Hansen
  1 sibling, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-16  0:06 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, linuxppc-dev

On Thu, 15 Jul 2010 13:37:51 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Split the memory_block struct into a memory_block
> struct to cover each sysfs directory and a new memory_block_section
> struct for each memory section covered by the sysfs directory.
> This change allows for creation of memory sysfs directories that
> can span multiple memory sections.
> 
> This can be beneficial in that it can reduce the number of memory
> sysfs directories created at boot.  This also allows different
> architectures to define how many memory sections are covered by
> a sysfs directory.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> ---
>  drivers/base/memory.c  |  222 ++++++++++++++++++++++++++++++++++---------------
>  include/linux/memory.h |   11 +-
>  2 files changed, 167 insertions(+), 66 deletions(-)
> 
> Index: linux-2.6/drivers/base/memory.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/memory.c	2010-07-15 08:48:41.000000000 -0500
> +++ linux-2.6/drivers/base/memory.c	2010-07-15 09:55:54.000000000 -0500
> @@ -28,6 +28,14 @@
>  #include <asm/uaccess.h>
>  
>  #define MEMORY_CLASS_NAME	"memory"
> +#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
> +
> +static int sections_per_block;
> +
> +static inline int base_memory_block_id(int section_nr)
> +{
> +	return (section_nr / sections_per_block) * sections_per_block;
> +}
>  
>  static struct sysdev_class memory_sysdev_class = {
>  	.name = MEMORY_CLASS_NAME,
> @@ -94,10 +102,9 @@
>  }
>  
>  static void
> -unregister_memory(struct memory_block *memory, struct mem_section *section)
> +unregister_memory(struct memory_block *memory)
>  {
>  	BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
> -	BUG_ON(memory->sysdev.id != __section_nr(section));
>  
>  	/* drop the ref. we got in remove_memory_block() */
>  	kobject_put(&memory->sysdev.kobj);
> @@ -123,13 +130,20 @@
>  static ssize_t show_mem_removable(struct sys_device *dev,
>  			struct sysdev_attribute *attr, char *buf)
>  {
> +	struct memory_block *mem;
> +	struct memory_block_section *mbs;
>  	unsigned long start_pfn;
> -	int ret;
> -	struct memory_block *mem =
> -		container_of(dev, struct memory_block, sysdev);
> +	int ret = 1;
> +
> +	mem = container_of(dev, struct memory_block, sysdev);
> +	mutex_lock(&mem->state_mutex);
>  
> -	start_pfn = section_nr_to_pfn(mem->phys_index);
> -	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
> +	list_for_each_entry(mbs, &mem->sections, next) {
> +		start_pfn = section_nr_to_pfn(mbs->phys_index);
> +		ret &= is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
> +	}
> +
> +	mutex_unlock(&mem->state_mutex);

Hmm, this means memory cab be offlined the while memory block section. Right ?
Please write this fact in patch description...
And Documentaion/memory_hotplug.txt as "From user's perspective, memory section
is not a unit of memory hotplug anymore".
And descirbe about a new rule.


>  	return sprintf(buf, "%d\n", ret);
>  }
>  
> @@ -182,16 +196,16 @@
>   * OK to have direct references to sparsemem variables in here.
>   */
>  static int
> -memory_block_action(struct memory_block *mem, unsigned long action)
> +memory_block_action(struct memory_block_section *mbs, unsigned long action)
>  {
>  	int i;
>  	unsigned long psection;
>  	unsigned long start_pfn, start_paddr;
>  	struct page *first_page;
>  	int ret;
> -	int old_state = mem->state;
> +	int old_state = mbs->state;
>  
> -	psection = mem->phys_index;
> +	psection = mbs->phys_index;
>  	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
>  
>  	/*
> @@ -217,18 +231,18 @@
>  			ret = online_pages(start_pfn, PAGES_PER_SECTION);
>  			break;
>  		case MEM_OFFLINE:
> -			mem->state = MEM_GOING_OFFLINE;
> +			mbs->state = MEM_GOING_OFFLINE;
>  			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>  			ret = remove_memory(start_paddr,
>  					    PAGES_PER_SECTION << PAGE_SHIFT);
>  			if (ret) {
> -				mem->state = old_state;
> +				mbs->state = old_state;
>  				break;
>  			}
>  			break;
>  		default:
>  			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
> -					__func__, mem, action, action);
> +					__func__, mbs, action, action);
>  			ret = -EINVAL;
>  	}
>  
> @@ -238,19 +252,34 @@

And please check quilt's diff option.
Usual patche in ML shows a function name in any changes, as
@@ -241,6 +293,8 @@ static int memory_block_change_state(str

Maybe "-p" option is lacked..


>  static int memory_block_change_state(struct memory_block *mem,
>  		unsigned long to_state, unsigned long from_state_req)
>  {
> +	struct memory_block_section *mbs;
>  	int ret = 0;
> +
>  	mutex_lock(&mem->state_mutex);
>  
> -	if (mem->state != from_state_req) {
> -		ret = -EINVAL;
> -		goto out;
> +	list_for_each_entry(mbs, &mem->sections, next) {
> +		if (mbs->state != from_state_req)
> +			continue;
> +
> +		ret = memory_block_action(mbs, to_state);
> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		list_for_each_entry(mbs, &mem->sections, next) {
> +			if (mbs->state == from_state_req)
> +				continue;
> +
> +			if (memory_block_action(mbs, to_state))
> +				printk(KERN_ERR "Could not re-enable memory "
> +				       "section %lx\n", mbs->phys_index);

Why re-enable only ? online->fail->offline never happens ?
If so, please add comment at least.
BTW, is it guaranteed that all sections under a block has same state after
boot ?

> +		}
>  	}
>  
> -	ret = memory_block_action(mem, to_state);
>  	if (!ret)
>  		mem->state = to_state;
>  
> -out:
>  	mutex_unlock(&mem->state_mutex);
>  	return ret;
>  }
> @@ -260,20 +289,15 @@
>  		struct sysdev_attribute *attr, const char *buf, size_t count)
>  {
>  	struct memory_block *mem;
> -	unsigned int phys_section_nr;
>  	int ret = -EINVAL;
>  
>  	mem = container_of(dev, struct memory_block, sysdev);
> -	phys_section_nr = mem->phys_index;
> -
> -	if (!present_section_nr(phys_section_nr))
> -		goto out;
> 
I'm sorry but I couldn't remember why this check was necessary...


 
>  	if (!strncmp(buf, "online", min((int)count, 6)))
>  		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
>  	else if(!strncmp(buf, "offline", min((int)count, 7)))
>  		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
> -out:
> +
>  	if (ret)
>  		return ret;
>  	return count;
> @@ -435,39 +459,6 @@
>  	return 0;
>  }
>  
> -static int add_memory_block(int nid, struct mem_section *section,
> -			unsigned long state, enum mem_add_context context)
> -{
> -	struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> -	unsigned long start_pfn;
> -	int ret = 0;
> -
> -	if (!mem)
> -		return -ENOMEM;
> -
> -	mem->phys_index = __section_nr(section);
> -	mem->state = state;
> -	mutex_init(&mem->state_mutex);
> -	start_pfn = section_nr_to_pfn(mem->phys_index);
> -	mem->phys_device = arch_get_memory_phys_device(start_pfn);
> -
> -	ret = register_memory(mem, section);
> -	if (!ret)
> -		ret = mem_create_simple_file(mem, phys_index);
> -	if (!ret)
> -		ret = mem_create_simple_file(mem, state);
> -	if (!ret)
> -		ret = mem_create_simple_file(mem, phys_device);
> -	if (!ret)
> -		ret = mem_create_simple_file(mem, removable);
> -	if (!ret) {
> -		if (context == HOTPLUG)
> -			ret = register_mem_sect_under_node(mem, nid);
> -	}
> -
> -	return ret;
> -}
> -

I don't say strongly but this kind of move-code should be done in another patch.


>  /*
>   * For now, we have a linear search to go find the appropriate
>   * memory_block corresponding to a particular phys_index. If
> @@ -482,12 +473,13 @@
>  	struct sys_device *sysdev;
>  	struct memory_block *mem;
>  	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
> +	int block_id = base_memory_block_id(__section_nr(section));
>  
>  	/*
>  	 * This only works because we know that section == sysdev->id
>  	 * slightly redundant with sysdev_register()
>  	 */
> -	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
> +	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
>  
>  	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
>  	if (!kobj)
> @@ -499,18 +491,98 @@
>  	return mem;
>  }
>  
> +static int add_mem_block_section(struct memory_block *mem,
> +				 int section_nr, unsigned long state)
> +{
> +	struct memory_block_section *mbs;
> +
> +	mbs = kzalloc(sizeof(*mbs), GFP_KERNEL);
> +	if (!mbs)
> +		return -ENOMEM;
> +
> +	mbs->phys_index = section_nr;
> +	mbs->state = state;
> +
> +	list_add(&mbs->next, &mem->sections);
> +	return 0;
> +}

Doesn't this "sections" need to be sorted ? Hmm.


> +
> +static int add_memory_block(int nid, struct mem_section *section,
> +			unsigned long state, enum mem_add_context context)
> +{
> +	struct memory_block *mem;
> +	int ret = 0;
> +
> +	mem = find_memory_block(section);
> +	if (!mem) {
> +		unsigned long start_pfn;
> +
> +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> +		if (!mem)
> +			return -ENOMEM;
> +
> +		mem->state = state;
> +		mutex_init(&mem->state_mutex);
> +		start_pfn = section_nr_to_pfn(__section_nr(section));
> +		mem->phys_device = arch_get_memory_phys_device(start_pfn);
> +		INIT_LIST_HEAD(&mem->sections);
> +
> +		mutex_lock(&mem->state_mutex);
> +
> +		ret = register_memory(mem, section);
> +		if (!ret)
> +			ret = mem_create_simple_file(mem, phys_index);
> +		if (!ret)
> +			ret = mem_create_simple_file(mem, state);
> +		if (!ret)
> +			ret = mem_create_simple_file(mem, phys_device);
> +		if (!ret)
> +			ret = mem_create_simple_file(mem, removable);
> +		if (!ret) {
> +			if (context == HOTPLUG)
> +				ret = register_mem_sect_under_node(mem, nid);
> +		}
> +	} else {
> +		kobject_put(&mem->sysdev.kobj);
> +		mutex_lock(&mem->state_mutex);
> +	}
> +
> +	if (!ret)
> +		ret = add_mem_block_section(mem, __section_nr(section), state);
> +
> +	mutex_unlock(&mem->state_mutex);
> +	return ret;
> +}
> +
>  int remove_memory_block(unsigned long node_id, struct mem_section *section,
>  		int phys_device)
>  {
>  	struct memory_block *mem;
> +	struct memory_block_section *mbs, *tmp;
> +	int section_nr = __section_nr(section);
>  
>  	mem = find_memory_block(section);
> -	unregister_mem_sect_under_nodes(mem);
> -	mem_remove_simple_file(mem, phys_index);
> -	mem_remove_simple_file(mem, state);
> -	mem_remove_simple_file(mem, phys_device);
> -	mem_remove_simple_file(mem, removable);
> -	unregister_memory(mem, section);
> +	mutex_lock(&mem->state_mutex);
> +
> +	/* remove the specified section */
> +	list_for_each_entry_safe(mbs, tmp, &mem->sections, next) {
> +		if (mbs->phys_index == section_nr) {
> +			list_del(&mbs->next);
> +			kfree(mbs);
> +		}
> +	}
> +
> +	mutex_unlock(&mem->state_mutex);
> +
> +	if (list_empty(&mem->sections)) {
> +		unregister_mem_sect_under_nodes(mem);
> +		mem_remove_simple_file(mem, phys_index);
> +		mem_remove_simple_file(mem, state);
> +		mem_remove_simple_file(mem, phys_device);
> +		mem_remove_simple_file(mem, removable);
> +		unregister_memory(mem);
> +		kfree(mem);
> +	}
>  
>  	return 0;
>  }
> @@ -532,6 +604,24 @@
>  	return remove_memory_block(0, section, 0);
>  }
>  
> +u32 __weak memory_block_size(void)
> +{
> +	return MIN_MEMORY_BLOCK_SIZE;
> +}
> +
> +static u32 get_memory_block_size(void)
> +{
> +	u32 blk_sz;
> +
> +	blk_sz = memory_block_size();
> +
> +	/* Validate blk_sz is a power of 2 and not less than section size */
> +	if ((blk_sz & (blk_sz - 1)) || (blk_sz < MIN_MEMORY_BLOCK_SIZE))
> +		blk_sz = MIN_MEMORY_BLOCK_SIZE;
> +
> +	return blk_sz;
> +}
> +
>  /*
>   * Initialize the sysfs support for memory devices...
>   */
> @@ -540,12 +630,16 @@
>  	unsigned int i;
>  	int ret;
>  	int err;
> +	int block_sz;
>  
>  	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
>  	ret = sysdev_class_register(&memory_sysdev_class);
>  	if (ret)
>  		goto out;
>  
> +	block_sz = get_memory_block_size();
> +	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +
>  	/*
>  	 * Create entries for memory sections that were found
>  	 * during boot and have been initialized
> Index: linux-2.6/include/linux/memory.h
> ===================================================================
> --- linux-2.6.orig/include/linux/memory.h	2010-07-15 08:48:41.000000000 -0500
> +++ linux-2.6/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
> @@ -19,9 +19,15 @@
>  #include <linux/node.h>
>  #include <linux/compiler.h>
>  #include <linux/mutex.h>
> +#include <linux/list.h>
>  
> -struct memory_block {
> +struct memory_block_section {
> +	unsigned long state;
>  	unsigned long phys_index;
> +	struct list_head next;
> +};
> +
> +struct memory_block {
>  	unsigned long state;
>  	/*
>  	 * This serializes all state change requests.  It isn't
> @@ -34,6 +40,7 @@
>  	void *hw;			/* optional pointer to fw/hw data */
>  	int (*phys_callback)(struct memory_block *);
>  	struct sys_device sysdev;
> +	struct list_head sections;
>  };
>  
>  int arch_get_memory_phys_device(unsigned long start_pfn);
> @@ -113,7 +120,7 @@
>  extern int remove_memory_block(unsigned long, struct mem_section *, int);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> -extern struct memory_block *find_memory_block(unsigned long);
> +extern struct memory_block *find_memory_block(struct mem_section *);
>  extern int memory_is_hidden(struct mem_section *);
>  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
>  enum mem_add_context { BOOT, HOTPLUG };
> 

Okay, please go ahead. But my 1st impression is that IBM should increase ppc's
SECTION_SIZE ;)

Thanks,
-Kame


 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] v2 Create new 'end_phys_index' file
  2010-07-15 18:38 ` [PATCH 2/5] v2 Create new 'end_phys_index' file Nathan Fontenot
@ 2010-07-16  0:08   ` KAMEZAWA Hiroyuki
  2010-07-16 15:36     ` Nathan Fontenot
  0 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-16  0:08 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, linuxppc-dev

On Thu, 15 Jul 2010 13:38:52 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Add a new 'end_phys_index' file to each memory sysfs directory to
> report the physical index of the last memory section
> covered by the sysfs directory.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Does memory_block have to be contiguous between [phys_index, end_phys_index] ?
Should we provide "# of sections" or "amount of memory under a block" ?

No objections to end_phys_index...buf plz fix diff style.

Thanks,
-Kame


> ---
>  drivers/base/memory.c  |   14 +++++++++++++-
>  include/linux/memory.h |    3 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/drivers/base/memory.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/memory.c	2010-07-15 09:55:54.000000000 -0500
> +++ linux-2.6/drivers/base/memory.c	2010-07-15 09:56:05.000000000 -0500
> @@ -121,7 +121,15 @@
>  {
>  	struct memory_block *mem =
>  		container_of(dev, struct memory_block, sysdev);
> -	return sprintf(buf, "%08lx\n", mem->phys_index);
> +	return sprintf(buf, "%08lx\n", mem->start_phys_index);
> +}
> +
> +static ssize_t show_mem_end_phys_index(struct sys_device *dev,
> +			struct sysdev_attribute *attr, char *buf)
> +{
> +	struct memory_block *mem =
> +		container_of(dev, struct memory_block, sysdev);
> +	return sprintf(buf, "%08lx\n", mem->end_phys_index);
>  }
>  
>  /*
> @@ -321,6 +329,7 @@
>  }
>  
>  static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
> +static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
>  static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
>  static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
>  static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
> @@ -533,6 +542,8 @@
>  		if (!ret)
>  			ret = mem_create_simple_file(mem, phys_index);
>  		if (!ret)
> +			ret = mem_create_simple_file(mem, end_phys_index);
> +		if (!ret)
>  			ret = mem_create_simple_file(mem, state);
>  		if (!ret)
>  			ret = mem_create_simple_file(mem, phys_device);
> @@ -577,6 +588,7 @@
>  	if (list_empty(&mem->sections)) {
>  		unregister_mem_sect_under_nodes(mem);
>  		mem_remove_simple_file(mem, phys_index);
> +		mem_remove_simple_file(mem, end_phys_index);
>  		mem_remove_simple_file(mem, state);
>  		mem_remove_simple_file(mem, phys_device);
>  		mem_remove_simple_file(mem, removable);
> Index: linux-2.6/include/linux/memory.h
> ===================================================================
> --- linux-2.6.orig/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
> +++ linux-2.6/include/linux/memory.h	2010-07-15 09:56:05.000000000 -0500
> @@ -29,6 +29,9 @@
>  
>  struct memory_block {
>  	unsigned long state;
> +	unsigned long start_phys_index;
> +	unsigned long end_phys_index;
> +
>  	/*
>  	 * This serializes all state change requests.  It isn't
>  	 * held during creation because the control files are
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories
  2010-07-15 18:40 ` [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories Nathan Fontenot
@ 2010-07-16  0:12   ` KAMEZAWA Hiroyuki
  2010-07-16 15:40     ` Nathan Fontenot
  0 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-16  0:12 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, linuxppc-dev

On Thu, 15 Jul 2010 13:40:40 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the node sysfs directory routines that create
> links to the memory sysfs directories under each node.
> This update makes the node code aware that a memory sysfs
> directory can cover multiple memory sections.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Shouldn't "static int link_mem_sections(int nid)" be update ?
It does
 for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
        register..

Thanks,
-Kame


> ---
>  drivers/base/node.c |   12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/drivers/base/node.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/node.c	2010-07-15 09:54:06.000000000 -0500
> +++ linux-2.6/drivers/base/node.c	2010-07-15 09:56:16.000000000 -0500
> @@ -346,8 +346,10 @@
>  		return -EFAULT;
>  	if (!node_online(nid))
>  		return 0;
> -	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
> -	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +
> +	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
> +	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
> +	sect_end_pfn += PAGES_PER_SECTION - 1;
>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>  		int page_nid;
>  
> @@ -383,8 +385,10 @@
>  	if (!unlinked_nodes)
>  		return -ENOMEM;
>  	nodes_clear(*unlinked_nodes);
> -	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
> -	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +
> +	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
> +	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
> +	sect_end_pfn += PAGES_PER_SECTION - 1;
>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>  		int nid;
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] v2 Split the memory_block structure
  2010-07-16  0:06   ` KAMEZAWA Hiroyuki
@ 2010-07-16 15:29     ` Nathan Fontenot
  0 siblings, 0 replies; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-16 15:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, linuxppc-dev

Thanks for taking a look a this Kame, answers below...

-Nathan

On 07/15/2010 07:06 PM, KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Jul 2010 13:37:51 -0500
> Nathan Fontenot <nfont@austin.ibm.com> wrote:
> 
>> Split the memory_block struct into a memory_block
>> struct to cover each sysfs directory and a new memory_block_section
>> struct for each memory section covered by the sysfs directory.
>> This change allows for creation of memory sysfs directories that
>> can span multiple memory sections.
>>
>> This can be beneficial in that it can reduce the number of memory
>> sysfs directories created at boot.  This also allows different
>> architectures to define how many memory sections are covered by
>> a sysfs directory.
>>
>> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
>> ---
>>  drivers/base/memory.c  |  222 ++++++++++++++++++++++++++++++++++---------------
>>  include/linux/memory.h |   11 +-
>>  2 files changed, 167 insertions(+), 66 deletions(-)
>>
>> Index: linux-2.6/drivers/base/memory.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/base/memory.c	2010-07-15 08:48:41.000000000 -0500
>> +++ linux-2.6/drivers/base/memory.c	2010-07-15 09:55:54.000000000 -0500
>> @@ -28,6 +28,14 @@
>>  #include <asm/uaccess.h>
>>  
>>  #define MEMORY_CLASS_NAME	"memory"
>> +#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
>> +
>> +static int sections_per_block;
>> +
>> +static inline int base_memory_block_id(int section_nr)
>> +{
>> +	return (section_nr / sections_per_block) * sections_per_block;
>> +}
>>  
>>  static struct sysdev_class memory_sysdev_class = {
>>  	.name = MEMORY_CLASS_NAME,
>> @@ -94,10 +102,9 @@
>>  }
>>  
>>  static void
>> -unregister_memory(struct memory_block *memory, struct mem_section *section)
>> +unregister_memory(struct memory_block *memory)
>>  {
>>  	BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
>> -	BUG_ON(memory->sysdev.id != __section_nr(section));
>>  
>>  	/* drop the ref. we got in remove_memory_block() */
>>  	kobject_put(&memory->sysdev.kobj);
>> @@ -123,13 +130,20 @@
>>  static ssize_t show_mem_removable(struct sys_device *dev,
>>  			struct sysdev_attribute *attr, char *buf)
>>  {
>> +	struct memory_block *mem;
>> +	struct memory_block_section *mbs;
>>  	unsigned long start_pfn;
>> -	int ret;
>> -	struct memory_block *mem =
>> -		container_of(dev, struct memory_block, sysdev);
>> +	int ret = 1;
>> +
>> +	mem = container_of(dev, struct memory_block, sysdev);
>> +	mutex_lock(&mem->state_mutex);
>>  
>> -	start_pfn = section_nr_to_pfn(mem->phys_index);
>> -	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
>> +	list_for_each_entry(mbs, &mem->sections, next) {
>> +		start_pfn = section_nr_to_pfn(mbs->phys_index);
>> +		ret &= is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
>> +	}
>> +
>> +	mutex_unlock(&mem->state_mutex);
> 
> Hmm, this means memory cab be offlined the while memory block section. Right ?
> Please write this fact in patch description...
> And Documentaion/memory_hotplug.txt as "From user's perspective, memory section
> is not a unit of memory hotplug anymore".
> And descirbe about a new rule.

You are correct.  A memory block is removable only if all of the memory
sections contained within the memory block are removable.

I will include a documentation patch with v3 of the patches to explain this
and to explain that memory add/remove operations are done on a per memory
block basis.

> 
> 
>>  	return sprintf(buf, "%d\n", ret);
>>  }
>>  
>> @@ -182,16 +196,16 @@
>>   * OK to have direct references to sparsemem variables in here.
>>   */
>>  static int
>> -memory_block_action(struct memory_block *mem, unsigned long action)
>> +memory_block_action(struct memory_block_section *mbs, unsigned long action)
>>  {
>>  	int i;
>>  	unsigned long psection;
>>  	unsigned long start_pfn, start_paddr;
>>  	struct page *first_page;
>>  	int ret;
>> -	int old_state = mem->state;
>> +	int old_state = mbs->state;
>>  
>> -	psection = mem->phys_index;
>> +	psection = mbs->phys_index;
>>  	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
>>  
>>  	/*
>> @@ -217,18 +231,18 @@
>>  			ret = online_pages(start_pfn, PAGES_PER_SECTION);
>>  			break;
>>  		case MEM_OFFLINE:
>> -			mem->state = MEM_GOING_OFFLINE;
>> +			mbs->state = MEM_GOING_OFFLINE;
>>  			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>>  			ret = remove_memory(start_paddr,
>>  					    PAGES_PER_SECTION << PAGE_SHIFT);
>>  			if (ret) {
>> -				mem->state = old_state;
>> +				mbs->state = old_state;
>>  				break;
>>  			}
>>  			break;
>>  		default:
>>  			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
>> -					__func__, mem, action, action);
>> +					__func__, mbs, action, action);
>>  			ret = -EINVAL;
>>  	}
>>  
>> @@ -238,19 +252,34 @@
> 
> And please check quilt's diff option.
> Usual patche in ML shows a function name in any changes, as
> @@ -241,6 +293,8 @@ static int memory_block_change_state(str
> 
> Maybe "-p" option is lacked..

sorry about that.  I'm just using the default options with quilt.  I'll
play around with it to why this is happening.

> 
> 
>>  static int memory_block_change_state(struct memory_block *mem,
>>  		unsigned long to_state, unsigned long from_state_req)
>>  {
>> +	struct memory_block_section *mbs;
>>  	int ret = 0;
>> +
>>  	mutex_lock(&mem->state_mutex);
>>  
>> -	if (mem->state != from_state_req) {
>> -		ret = -EINVAL;
>> -		goto out;
>> +	list_for_each_entry(mbs, &mem->sections, next) {
>> +		if (mbs->state != from_state_req)
>> +			continue;
>> +
>> +		ret = memory_block_action(mbs, to_state);
>> +		if (ret)
>> +			break;
>> +	}
>> +
>> +	if (ret) {
>> +		list_for_each_entry(mbs, &mem->sections, next) {
>> +			if (mbs->state == from_state_req)
>> +				continue;
>> +
>> +			if (memory_block_action(mbs, to_state))
>> +				printk(KERN_ERR "Could not re-enable memory "
>> +				       "section %lx\n", mbs->phys_index);
> 
> Why re-enable only ? online->fail->offline never happens ?
> If so, please add comment at least.

This should handle both conditions.  If we fail to move all of the memory
sections to the 'to_state', it puts all of the memory sections back to the
'from_state_req'.

> BTW, is it guaranteed that all sections under a block has same state after
> boot ?

Yes, during boot all memory sections are marked online.

> 
>> +		}
>>  	}
>>  
>> -	ret = memory_block_action(mem, to_state);
>>  	if (!ret)
>>  		mem->state = to_state;
>>  
>> -out:
>>  	mutex_unlock(&mem->state_mutex);
>>  	return ret;
>>  }
>> @@ -260,20 +289,15 @@
>>  		struct sysdev_attribute *attr, const char *buf, size_t count)
>>  {
>>  	struct memory_block *mem;
>> -	unsigned int phys_section_nr;
>>  	int ret = -EINVAL;
>>  
>>  	mem = container_of(dev, struct memory_block, sysdev);
>> -	phys_section_nr = mem->phys_index;
>> -
>> -	if (!present_section_nr(phys_section_nr))
>> -		goto out;
>>
> I'm sorry but I couldn't remember why this check was necessary...

Not sure either, it appears that it is there to ensure that the memory
section we are trying to act on is actually present.

> 
> 
>  
>>  	if (!strncmp(buf, "online", min((int)count, 6)))
>>  		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
>>  	else if(!strncmp(buf, "offline", min((int)count, 7)))
>>  		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
>> -out:
>> +
>>  	if (ret)
>>  		return ret;
>>  	return count;
>> @@ -435,39 +459,6 @@
>>  	return 0;
>>  }
>>  
>> -static int add_memory_block(int nid, struct mem_section *section,
>> -			unsigned long state, enum mem_add_context context)
>> -{
>> -	struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>> -	unsigned long start_pfn;
>> -	int ret = 0;
>> -
>> -	if (!mem)
>> -		return -ENOMEM;
>> -
>> -	mem->phys_index = __section_nr(section);
>> -	mem->state = state;
>> -	mutex_init(&mem->state_mutex);
>> -	start_pfn = section_nr_to_pfn(mem->phys_index);
>> -	mem->phys_device = arch_get_memory_phys_device(start_pfn);
>> -
>> -	ret = register_memory(mem, section);
>> -	if (!ret)
>> -		ret = mem_create_simple_file(mem, phys_index);
>> -	if (!ret)
>> -		ret = mem_create_simple_file(mem, state);
>> -	if (!ret)
>> -		ret = mem_create_simple_file(mem, phys_device);
>> -	if (!ret)
>> -		ret = mem_create_simple_file(mem, removable);
>> -	if (!ret) {
>> -		if (context == HOTPLUG)
>> -			ret = register_mem_sect_under_node(mem, nid);
>> -	}
>> -
>> -	return ret;
>> -}
>> -
> 
> I don't say strongly but this kind of move-code should be done in another patch.

ok,  I will move the code move piece to a differnet patch.

> 
> 
>>  /*
>>   * For now, we have a linear search to go find the appropriate
>>   * memory_block corresponding to a particular phys_index. If
>> @@ -482,12 +473,13 @@
>>  	struct sys_device *sysdev;
>>  	struct memory_block *mem;
>>  	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
>> +	int block_id = base_memory_block_id(__section_nr(section));
>>  
>>  	/*
>>  	 * This only works because we know that section == sysdev->id
>>  	 * slightly redundant with sysdev_register()
>>  	 */
>> -	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
>> +	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
>>  
>>  	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
>>  	if (!kobj)
>> @@ -499,18 +491,98 @@
>>  	return mem;
>>  }
>>  
>> +static int add_mem_block_section(struct memory_block *mem,
>> +				 int section_nr, unsigned long state)
>> +{
>> +	struct memory_block_section *mbs;
>> +
>> +	mbs = kzalloc(sizeof(*mbs), GFP_KERNEL);
>> +	if (!mbs)
>> +		return -ENOMEM;
>> +
>> +	mbs->phys_index = section_nr;
>> +	mbs->state = state;
>> +
>> +	list_add(&mbs->next, &mem->sections);
>> +	return 0;
>> +}
> 
> Doesn't this "sections" need to be sorted ? Hmm.

We could, but I cannot think of anything we gain by sorting it.

> 
> 
>> +
>> +static int add_memory_block(int nid, struct mem_section *section,
>> +			unsigned long state, enum mem_add_context context)
>> +{
>> +	struct memory_block *mem;
>> +	int ret = 0;
>> +
>> +	mem = find_memory_block(section);
>> +	if (!mem) {
>> +		unsigned long start_pfn;
>> +
>> +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>> +		if (!mem)
>> +			return -ENOMEM;
>> +
>> +		mem->state = state;
>> +		mutex_init(&mem->state_mutex);
>> +		start_pfn = section_nr_to_pfn(__section_nr(section));
>> +		mem->phys_device = arch_get_memory_phys_device(start_pfn);
>> +		INIT_LIST_HEAD(&mem->sections);
>> +
>> +		mutex_lock(&mem->state_mutex);
>> +
>> +		ret = register_memory(mem, section);
>> +		if (!ret)
>> +			ret = mem_create_simple_file(mem, phys_index);
>> +		if (!ret)
>> +			ret = mem_create_simple_file(mem, state);
>> +		if (!ret)
>> +			ret = mem_create_simple_file(mem, phys_device);
>> +		if (!ret)
>> +			ret = mem_create_simple_file(mem, removable);
>> +		if (!ret) {
>> +			if (context == HOTPLUG)
>> +				ret = register_mem_sect_under_node(mem, nid);
>> +		}
>> +	} else {
>> +		kobject_put(&mem->sysdev.kobj);
>> +		mutex_lock(&mem->state_mutex);
>> +	}
>> +
>> +	if (!ret)
>> +		ret = add_mem_block_section(mem, __section_nr(section), state);
>> +
>> +	mutex_unlock(&mem->state_mutex);
>> +	return ret;
>> +}
>> +
>>  int remove_memory_block(unsigned long node_id, struct mem_section *section,
>>  		int phys_device)
>>  {
>>  	struct memory_block *mem;
>> +	struct memory_block_section *mbs, *tmp;
>> +	int section_nr = __section_nr(section);
>>  
>>  	mem = find_memory_block(section);
>> -	unregister_mem_sect_under_nodes(mem);
>> -	mem_remove_simple_file(mem, phys_index);
>> -	mem_remove_simple_file(mem, state);
>> -	mem_remove_simple_file(mem, phys_device);
>> -	mem_remove_simple_file(mem, removable);
>> -	unregister_memory(mem, section);
>> +	mutex_lock(&mem->state_mutex);
>> +
>> +	/* remove the specified section */
>> +	list_for_each_entry_safe(mbs, tmp, &mem->sections, next) {
>> +		if (mbs->phys_index == section_nr) {
>> +			list_del(&mbs->next);
>> +			kfree(mbs);
>> +		}
>> +	}
>> +
>> +	mutex_unlock(&mem->state_mutex);
>> +
>> +	if (list_empty(&mem->sections)) {
>> +		unregister_mem_sect_under_nodes(mem);
>> +		mem_remove_simple_file(mem, phys_index);
>> +		mem_remove_simple_file(mem, state);
>> +		mem_remove_simple_file(mem, phys_device);
>> +		mem_remove_simple_file(mem, removable);
>> +		unregister_memory(mem);
>> +		kfree(mem);
>> +	}
>>  
>>  	return 0;
>>  }
>> @@ -532,6 +604,24 @@
>>  	return remove_memory_block(0, section, 0);
>>  }
>>  
>> +u32 __weak memory_block_size(void)
>> +{
>> +	return MIN_MEMORY_BLOCK_SIZE;
>> +}
>> +
>> +static u32 get_memory_block_size(void)
>> +{
>> +	u32 blk_sz;
>> +
>> +	blk_sz = memory_block_size();
>> +
>> +	/* Validate blk_sz is a power of 2 and not less than section size */
>> +	if ((blk_sz & (blk_sz - 1)) || (blk_sz < MIN_MEMORY_BLOCK_SIZE))
>> +		blk_sz = MIN_MEMORY_BLOCK_SIZE;
>> +
>> +	return blk_sz;
>> +}
>> +
>>  /*
>>   * Initialize the sysfs support for memory devices...
>>   */
>> @@ -540,12 +630,16 @@
>>  	unsigned int i;
>>  	int ret;
>>  	int err;
>> +	int block_sz;
>>  
>>  	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
>>  	ret = sysdev_class_register(&memory_sysdev_class);
>>  	if (ret)
>>  		goto out;
>>  
>> +	block_sz = get_memory_block_size();
>> +	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> +
>>  	/*
>>  	 * Create entries for memory sections that were found
>>  	 * during boot and have been initialized
>> Index: linux-2.6/include/linux/memory.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/memory.h	2010-07-15 08:48:41.000000000 -0500
>> +++ linux-2.6/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
>> @@ -19,9 +19,15 @@
>>  #include <linux/node.h>
>>  #include <linux/compiler.h>
>>  #include <linux/mutex.h>
>> +#include <linux/list.h>
>>  
>> -struct memory_block {
>> +struct memory_block_section {
>> +	unsigned long state;
>>  	unsigned long phys_index;
>> +	struct list_head next;
>> +};
>> +
>> +struct memory_block {
>>  	unsigned long state;
>>  	/*
>>  	 * This serializes all state change requests.  It isn't
>> @@ -34,6 +40,7 @@
>>  	void *hw;			/* optional pointer to fw/hw data */
>>  	int (*phys_callback)(struct memory_block *);
>>  	struct sys_device sysdev;
>> +	struct list_head sections;
>>  };
>>  
>>  int arch_get_memory_phys_device(unsigned long start_pfn);
>> @@ -113,7 +120,7 @@
>>  extern int remove_memory_block(unsigned long, struct mem_section *, int);
>>  extern int memory_notify(unsigned long val, void *v);
>>  extern int memory_isolate_notify(unsigned long val, void *v);
>> -extern struct memory_block *find_memory_block(unsigned long);
>> +extern struct memory_block *find_memory_block(struct mem_section *);
>>  extern int memory_is_hidden(struct mem_section *);
>>  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
>>  enum mem_add_context { BOOT, HOTPLUG };
>>
> 
> Okay, please go ahead. But my 1st impression is that IBM should increase ppc's
> SECTION_SIZE ;)
> 
> Thanks,
> -Kame
> 
> 
>  
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] v2 Create new 'end_phys_index' file
  2010-07-16  0:08   ` KAMEZAWA Hiroyuki
@ 2010-07-16 15:36     ` Nathan Fontenot
  0 siblings, 0 replies; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-16 15:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, linuxppc-dev

On 07/15/2010 07:08 PM, KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Jul 2010 13:38:52 -0500
> Nathan Fontenot <nfont@austin.ibm.com> wrote:
> 
>> Add a new 'end_phys_index' file to each memory sysfs directory to
>> report the physical index of the last memory section
>> covered by the sysfs directory.
>>
>> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
> Does memory_block have to be contiguous between [phys_index, end_phys_index] ?
> Should we provide "# of sections" or "amount of memory under a block" ?

Good point.  There is nothing that guarantees that a memory block contains
the contiguous memory sections [phys_index, end_phys_index].  Should there be
a 'memory_sections' file that list the memory sections present in a memory block?
Something along the lines of;

#> cat memory0/memory_sections
0,1,2,3

This could be done instead of the end_phys_index file.

-Nathan
 
> 
> No objections to end_phys_index...buf plz fix diff style.
> 
> Thanks,
> -Kame
> 
> 
>> ---
>>  drivers/base/memory.c  |   14 +++++++++++++-
>>  include/linux/memory.h |    3 +++
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6/drivers/base/memory.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/base/memory.c	2010-07-15 09:55:54.000000000 -0500
>> +++ linux-2.6/drivers/base/memory.c	2010-07-15 09:56:05.000000000 -0500
>> @@ -121,7 +121,15 @@
>>  {
>>  	struct memory_block *mem =
>>  		container_of(dev, struct memory_block, sysdev);
>> -	return sprintf(buf, "%08lx\n", mem->phys_index);
>> +	return sprintf(buf, "%08lx\n", mem->start_phys_index);
>> +}
>> +
>> +static ssize_t show_mem_end_phys_index(struct sys_device *dev,
>> +			struct sysdev_attribute *attr, char *buf)
>> +{
>> +	struct memory_block *mem =
>> +		container_of(dev, struct memory_block, sysdev);
>> +	return sprintf(buf, "%08lx\n", mem->end_phys_index);
>>  }
>>  
>>  /*
>> @@ -321,6 +329,7 @@
>>  }
>>  
>>  static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
>> +static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
>>  static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
>>  static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
>>  static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
>> @@ -533,6 +542,8 @@
>>  		if (!ret)
>>  			ret = mem_create_simple_file(mem, phys_index);
>>  		if (!ret)
>> +			ret = mem_create_simple_file(mem, end_phys_index);
>> +		if (!ret)
>>  			ret = mem_create_simple_file(mem, state);
>>  		if (!ret)
>>  			ret = mem_create_simple_file(mem, phys_device);
>> @@ -577,6 +588,7 @@
>>  	if (list_empty(&mem->sections)) {
>>  		unregister_mem_sect_under_nodes(mem);
>>  		mem_remove_simple_file(mem, phys_index);
>> +		mem_remove_simple_file(mem, end_phys_index);
>>  		mem_remove_simple_file(mem, state);
>>  		mem_remove_simple_file(mem, phys_device);
>>  		mem_remove_simple_file(mem, removable);
>> Index: linux-2.6/include/linux/memory.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
>> +++ linux-2.6/include/linux/memory.h	2010-07-15 09:56:05.000000000 -0500
>> @@ -29,6 +29,9 @@
>>  
>>  struct memory_block {
>>  	unsigned long state;
>> +	unsigned long start_phys_index;
>> +	unsigned long end_phys_index;
>> +
>>  	/*
>>  	 * This serializes all state change requests.  It isn't
>>  	 * held during creation because the control files are
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories
  2010-07-16  0:12   ` KAMEZAWA Hiroyuki
@ 2010-07-16 15:40     ` Nathan Fontenot
  0 siblings, 0 replies; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-16 15:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, linuxppc-dev

On 07/15/2010 07:12 PM, KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Jul 2010 13:40:40 -0500
> Nathan Fontenot <nfont@austin.ibm.com> wrote:
> 
>> Update the node sysfs directory routines that create
>> links to the memory sysfs directories under each node.
>> This update makes the node code aware that a memory sysfs
>> directory can cover multiple memory sections.
>>
>> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> 
> Shouldn't "static int link_mem_sections(int nid)" be update ?
> It does
>  for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>         register..
> 

No, although the name 'link_mem_sections' does imply that it should.  The
range of start_pfn..end_pfn examined in this routine is the range of pfn's
covered by the entire node, not a memory_block.

-Nathan

> Thanks,
> -Kame
> 
> 
>> ---
>>  drivers/base/node.c |   12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> Index: linux-2.6/drivers/base/node.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/base/node.c	2010-07-15 09:54:06.000000000 -0500
>> +++ linux-2.6/drivers/base/node.c	2010-07-15 09:56:16.000000000 -0500
>> @@ -346,8 +346,10 @@
>>  		return -EFAULT;
>>  	if (!node_online(nid))
>>  		return 0;
>> -	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
>> -	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
>> +
>> +	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
>> +	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
>> +	sect_end_pfn += PAGES_PER_SECTION - 1;
>>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>>  		int page_nid;
>>  
>> @@ -383,8 +385,10 @@
>>  	if (!unlinked_nodes)
>>  		return -ENOMEM;
>>  	nodes_clear(*unlinked_nodes);
>> -	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
>> -	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
>> +
>> +	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
>> +	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
>> +	sect_end_pfn += PAGES_PER_SECTION - 1;
>>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>>  		int nid;
>>  
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] v2 Split the memory_block structure
  2010-07-15 18:37 ` [PATCH 1/5] v2 Split the memory_block structure Nathan Fontenot
  2010-07-16  0:06   ` KAMEZAWA Hiroyuki
@ 2010-07-16 17:15   ` Dave Hansen
  2010-07-16 18:23     ` Nathan Fontenot
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2010-07-16 17:15 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev

On Thu, 2010-07-15 at 13:37 -0500, Nathan Fontenot wrote:
> @@ -123,13 +130,20 @@
>  static ssize_t show_mem_removable(struct sys_device *dev,
>  			struct sysdev_attribute *attr, char *buf)
>  {
> +	struct memory_block *mem;
> +	struct memory_block_section *mbs;
>  	unsigned long start_pfn;
> -	int ret;
> -	struct memory_block *mem =
> -		container_of(dev, struct memory_block, sysdev);
> +	int ret = 1;
> +
> +	mem = container_of(dev, struct memory_block, sysdev);
> +	mutex_lock(&mem->state_mutex);
> 
> -	start_pfn = section_nr_to_pfn(mem->phys_index);
> -	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
> +	list_for_each_entry(mbs, &mem->sections, next) {
> +		start_pfn = section_nr_to_pfn(mbs->phys_index);
> +		ret &= is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
> +	}
> +
> +	mutex_unlock(&mem->state_mutex);
>  	return sprintf(buf, "%d\n", ret);
>  }

Now that the "state_mutex" is getting used for other stuff, should we
just make it "mutex"?

> @@ -182,16 +196,16 @@
>   * OK to have direct references to sparsemem variables in here.
>   */
>  static int
> -memory_block_action(struct memory_block *mem, unsigned long action)
> +memory_block_action(struct memory_block_section *mbs, unsigned long action)
>  {
>  	int i;
>  	unsigned long psection;
>  	unsigned long start_pfn, start_paddr;
>  	struct page *first_page;
>  	int ret;
> -	int old_state = mem->state;
> +	int old_state = mbs->state;
> 
> -	psection = mem->phys_index;
> +	psection = mbs->phys_index;
>  	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
> 
>  	/*
> @@ -217,18 +231,18 @@
>  			ret = online_pages(start_pfn, PAGES_PER_SECTION);
>  			break;
>  		case MEM_OFFLINE:
> -			mem->state = MEM_GOING_OFFLINE;
> +			mbs->state = MEM_GOING_OFFLINE;
>  			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>  			ret = remove_memory(start_paddr,
>  					    PAGES_PER_SECTION << PAGE_SHIFT);
>  			if (ret) {
> -				mem->state = old_state;
> +				mbs->state = old_state;
>  				break;
>  			}
>  			break;
>  		default:
>  			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
> -					__func__, mem, action, action);
> +					__func__, mbs, action, action);
>  			ret = -EINVAL;
>  	}
> 
> @@ -238,19 +252,34 @@
>  static int memory_block_change_state(struct memory_block *mem,
>  		unsigned long to_state, unsigned long from_state_req)
>  {
> +	struct memory_block_section *mbs;
>  	int ret = 0;
> +
>  	mutex_lock(&mem->state_mutex);
> 
> -	if (mem->state != from_state_req) {
> -		ret = -EINVAL;
> -		goto out;
> +	list_for_each_entry(mbs, &mem->sections, next) {
> +		if (mbs->state != from_state_req)
> +			continue;
> +
> +		ret = memory_block_action(mbs, to_state);
> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		list_for_each_entry(mbs, &mem->sections, next) {
> +			if (mbs->state == from_state_req)
> +				continue;
> +
> +			if (memory_block_action(mbs, to_state))
> +				printk(KERN_ERR "Could not re-enable memory "
> +				       "section %lx\n", mbs->phys_index);
> +		}
>  	}

Please just use a goto here.  It's nicer looking, and much more in line
with what's there already.

...
> ===================================================================
> --- linux-2.6.orig/include/linux/memory.h	2010-07-15 08:48:41.000000000 -0500
> +++ linux-2.6/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
> @@ -19,9 +19,15 @@
>  #include <linux/node.h>
>  #include <linux/compiler.h>
>  #include <linux/mutex.h>
> +#include <linux/list.h>
> 
> -struct memory_block {
> +struct memory_block_section {
> +	unsigned long state;
>  	unsigned long phys_index;
> +	struct list_head next;
> +};
> +
> +struct memory_block {
>  	unsigned long state;
>  	/*
>  	 * This serializes all state change requests.  It isn't
> @@ -34,6 +40,7 @@
>  	void *hw;			/* optional pointer to fw/hw data */
>  	int (*phys_callback)(struct memory_block *);
>  	struct sys_device sysdev;
> +	struct list_head sections;
>  };

It looks like we have state in both the memory_block and
memory_block_section.  That seems a bit confusing to me.  This also
looks like it would permit non-contiguous memory_block_sections in a
memory_block.  Is that what you intended?

If the memory_block's state was inferred to be the same as each
memory_block_section, couldn't we just keep a start and end phys_index
in the memory_block, and get away from having memory_block_sections at
all?

-- Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] v2 Change the mutex name in the memory_block struct
  2010-07-15 18:39 ` [PATCH 3/5] v2 Change the mutex name in the memory_block struct Nathan Fontenot
@ 2010-07-16 17:16   ` Dave Hansen
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2010-07-16 17:16 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev

On Thu, 2010-07-15 at 13:39 -0500, Nathan Fontenot wrote:
> 
> Change the name of the memory_block mutex since it is now used for
> more than just gating changes to the status of the memory sections
> covered by the memory sysfs directory.

Heh, sorry about the previous comments. :)

You should move this up to be the first in the series.

-- Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] v2 Enable multiple sections per directory for ppc
  2010-07-15 18:41 ` [PATCH 5/5] v2 Enable multiple sections per directory for ppc Nathan Fontenot
@ 2010-07-16 17:18   ` Dave Hansen
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2010-07-16 17:18 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev

On Thu, 2010-07-15 at 13:41 -0500, Nathan Fontenot wrote:
>  linux-2.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c      2010-07-15 09:54:06.000000000 -0500
> +++ linux-2.6/arch/powerpc/platforms/pseries/hotplug-memory.c   2010-07-15 09:56:19.000000000 -0500
> @@ -17,6 +17,54 @@
>  #include <asm/pSeries_reconfig.h>
>  #include <asm/sparsemem.h>
> 
> +static u32 get_memblock_size(void)
> +{
> +       struct device_node *np;
> +       unsigned int memblock_size = 0;
> + 

Please give this sucker some units.  get_memblock_size_bytes()?
get_memblock_size_in_g0ats()?


-- Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] v2 Split the memory_block structure
  2010-07-16 17:15   ` Dave Hansen
@ 2010-07-16 18:23     ` Nathan Fontenot
  2010-07-16 18:33       ` Dave Hansen
  2010-07-16 18:45       ` Dave Hansen
  0 siblings, 2 replies; 18+ messages in thread
From: Nathan Fontenot @ 2010-07-16 18:23 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev

On 07/16/2010 12:15 PM, Dave Hansen wrote:
> On Thu, 2010-07-15 at 13:37 -0500, Nathan Fontenot wrote:
>> @@ -123,13 +130,20 @@
>>  static ssize_t show_mem_removable(struct sys_device *dev,
>>  			struct sysdev_attribute *attr, char *buf)
>>  {
>> +	struct memory_block *mem;
>> +	struct memory_block_section *mbs;
>>  	unsigned long start_pfn;
>> -	int ret;
>> -	struct memory_block *mem =
>> -		container_of(dev, struct memory_block, sysdev);
>> +	int ret = 1;
>> +
>> +	mem = container_of(dev, struct memory_block, sysdev);
>> +	mutex_lock(&mem->state_mutex);
>>
>> -	start_pfn = section_nr_to_pfn(mem->phys_index);
>> -	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
>> +	list_for_each_entry(mbs, &mem->sections, next) {
>> +		start_pfn = section_nr_to_pfn(mbs->phys_index);
>> +		ret &= is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
>> +	}
>> +
>> +	mutex_unlock(&mem->state_mutex);
>>  	return sprintf(buf, "%d\n", ret);
>>  }
> 
> Now that the "state_mutex" is getting used for other stuff, should we
> just make it "mutex"?
> 
>> @@ -182,16 +196,16 @@
>>   * OK to have direct references to sparsemem variables in here.
>>   */
>>  static int
>> -memory_block_action(struct memory_block *mem, unsigned long action)
>> +memory_block_action(struct memory_block_section *mbs, unsigned long action)
>>  {
>>  	int i;
>>  	unsigned long psection;
>>  	unsigned long start_pfn, start_paddr;
>>  	struct page *first_page;
>>  	int ret;
>> -	int old_state = mem->state;
>> +	int old_state = mbs->state;
>>
>> -	psection = mem->phys_index;
>> +	psection = mbs->phys_index;
>>  	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
>>
>>  	/*
>> @@ -217,18 +231,18 @@
>>  			ret = online_pages(start_pfn, PAGES_PER_SECTION);
>>  			break;
>>  		case MEM_OFFLINE:
>> -			mem->state = MEM_GOING_OFFLINE;
>> +			mbs->state = MEM_GOING_OFFLINE;
>>  			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>>  			ret = remove_memory(start_paddr,
>>  					    PAGES_PER_SECTION << PAGE_SHIFT);
>>  			if (ret) {
>> -				mem->state = old_state;
>> +				mbs->state = old_state;
>>  				break;
>>  			}
>>  			break;
>>  		default:
>>  			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
>> -					__func__, mem, action, action);
>> +					__func__, mbs, action, action);
>>  			ret = -EINVAL;
>>  	}
>>
>> @@ -238,19 +252,34 @@
>>  static int memory_block_change_state(struct memory_block *mem,
>>  		unsigned long to_state, unsigned long from_state_req)
>>  {
>> +	struct memory_block_section *mbs;
>>  	int ret = 0;
>> +
>>  	mutex_lock(&mem->state_mutex);
>>
>> -	if (mem->state != from_state_req) {
>> -		ret = -EINVAL;
>> -		goto out;
>> +	list_for_each_entry(mbs, &mem->sections, next) {
>> +		if (mbs->state != from_state_req)
>> +			continue;
>> +
>> +		ret = memory_block_action(mbs, to_state);
>> +		if (ret)
>> +			break;
>> +	}
>> +
>> +	if (ret) {
>> +		list_for_each_entry(mbs, &mem->sections, next) {
>> +			if (mbs->state == from_state_req)
>> +				continue;
>> +
>> +			if (memory_block_action(mbs, to_state))
>> +				printk(KERN_ERR "Could not re-enable memory "
>> +				       "section %lx\n", mbs->phys_index);
>> +		}
>>  	}
> 
> Please just use a goto here.  It's nicer looking, and much more in line
> with what's there already.

Not sure if I follow on where you want the goto.  If you mean after the
if (memory_block_action())...  I purposely did not have a goto here.
Since this is in the recovery path I wanted to make sure we tried to return
every memory section to the original state.

> 
> ...
>> ===================================================================
>> --- linux-2.6.orig/include/linux/memory.h	2010-07-15 08:48:41.000000000 -0500
>> +++ linux-2.6/include/linux/memory.h	2010-07-15 09:54:06.000000000 -0500
>> @@ -19,9 +19,15 @@
>>  #include <linux/node.h>
>>  #include <linux/compiler.h>
>>  #include <linux/mutex.h>
>> +#include <linux/list.h>
>>
>> -struct memory_block {
>> +struct memory_block_section {
>> +	unsigned long state;
>>  	unsigned long phys_index;
>> +	struct list_head next;
>> +};
>> +
>> +struct memory_block {
>>  	unsigned long state;
>>  	/*
>>  	 * This serializes all state change requests.  It isn't
>> @@ -34,6 +40,7 @@
>>  	void *hw;			/* optional pointer to fw/hw data */
>>  	int (*phys_callback)(struct memory_block *);
>>  	struct sys_device sysdev;
>> +	struct list_head sections;
>>  };
> 
> It looks like we have state in both the memory_block and
> memory_block_section.  That seems a bit confusing to me.  This also
> looks like it would permit non-contiguous memory_block_sections in a
> memory_block.  Is that what you intended?

The two state fields are a bit confusing, perhaps slightly different
names, block_state and section_state.  The reason for two state fileds
is that memory online/offline is done on a memory block and an entire
memory block is considered online or offline.  The state field in the
memory_block_section is used to track the state of each of the memory
sections as you work through onlining or offlining the entire block
so that if an error occurs we can return each memory section to its 
original state.

You're correct that there is nothing that prevents non-contiguous memory
sections in a memory block.  It was not my intention to have this, but
looking at the patches there is nothing to prevent it.

> 
> If the memory_block's state was inferred to be the same as each
> memory_block_section, couldn't we just keep a start and end phys_index
> in the memory_block, and get away from having memory_block_sections at
> all?

Oooohhh... I like.  Looking at the code it appears this is possible.  I'll
try this out and include it in the next version of the patch.

Do you think we need to add an additional file to each memory block directory
to indicate the number of memory sections in the memory block that are actually
present?

-Nathan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] v2 Split the memory_block structure
  2010-07-16 18:23     ` Nathan Fontenot
@ 2010-07-16 18:33       ` Dave Hansen
  2010-07-16 18:45       ` Dave Hansen
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2010-07-16 18:33 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev

On Fri, 2010-07-16 at 13:23 -0500, Nathan Fontenot wrote:
> > If the memory_block's state was inferred to be the same as each
> > memory_block_section, couldn't we just keep a start and end phys_index
> > in the memory_block, and get away from having memory_block_sections at
> > all?
> 
> Oooohhh... I like.  Looking at the code it appears this is possible.  I'll
> try this out and include it in the next version of the patch.
> 
> Do you think we need to add an additional file to each memory block directory
> to indicate the number of memory sections in the memory block that are actually
> present? 

I think it's easiest to just say that each 'memory_block' can only hold
contiguous 'memory_block_sections', and we give either the start/end or
start/length pairs.  It gets a lot more complicated if we have to deal
with lots of holes.

I can just see the hardware designers reading this thread, with their
Dr. Evil laughs trying to come up with a reason to give us a couple of
terabytes of RAM with only every-other 16MB area populated. :)  

-- Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] v2 Split the memory_block structure
  2010-07-16 18:23     ` Nathan Fontenot
  2010-07-16 18:33       ` Dave Hansen
@ 2010-07-16 18:45       ` Dave Hansen
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2010-07-16 18:45 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev

On Fri, 2010-07-16 at 13:23 -0500, Nathan Fontenot wrote:
> >> -    if (mem->state != from_state_req) {
> >> -            ret = -EINVAL;
> >> -            goto out;
> >> +    list_for_each_entry(mbs, &mem->sections, next) {
> >> +            if (mbs->state != from_state_req)
> >> +                    continue;
> >> +
> >> +            ret = memory_block_action(mbs, to_state);
> >> +            if (ret)
> >> +                    break;
> >> +    }
> >> +
> >> +    if (ret) {
> >> +            list_for_each_entry(mbs, &mem->sections, next) {
> >> +                    if (mbs->state == from_state_req)
> >> +                            continue;
> >> +
> >> +                    if (memory_block_action(mbs, to_state))
> >> +                            printk(KERN_ERR "Could not re-enable memory "
> >> +                                   "section %lx\n", mbs->phys_index);
> >> +            }
> >>      }
> > 
> > Please just use a goto here.  It's nicer looking, and much more in line
> > with what's there already.
> 
> Not sure if I follow on where you want the goto.  If you mean after the
> if (memory_block_action())...  I purposely did not have a goto here.
> Since this is in the recovery path I wanted to make sure we tried to return
> every memory section to the original state. 

Looking at it a little closer, I see what you're doing now.

First of all, should memory_block_action() get a new name since it isn
not taking 'memory_block_section's?

The thing I would have liked to see is to have that error handling block
out of the way a bit.  But, the function is small, and there's not _too_
much code in there, so what you have is probably the best way to do it.

Minor nit: Please pull the memory_block_action() out of the if() and do
the:

> >> +            ret = memory_block_action(mbs, to_state);
> >> +            if (ret)
> >> +                    break;

thing like above.  It makes it much more obvious that the loop is
related to the top one.  I was thinking if it made sense to have a
helper function to go through and do that list walk, so you could do:

	ret = set_all_states(mem->sections, to_state);
	if (ret)
		set_all_states(mem->sections, old_state);

But I think you'd need to pass in a bit more information, so it probably
isn't worth doing that, either.

-- Dave

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-07-16 18:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-15 18:30 [PATCH 0/5] v2 De-couple sysfs memory directories from memory section size Nathan Fontenot
2010-07-15 18:37 ` [PATCH 1/5] v2 Split the memory_block structure Nathan Fontenot
2010-07-16  0:06   ` KAMEZAWA Hiroyuki
2010-07-16 15:29     ` Nathan Fontenot
2010-07-16 17:15   ` Dave Hansen
2010-07-16 18:23     ` Nathan Fontenot
2010-07-16 18:33       ` Dave Hansen
2010-07-16 18:45       ` Dave Hansen
2010-07-15 18:38 ` [PATCH 2/5] v2 Create new 'end_phys_index' file Nathan Fontenot
2010-07-16  0:08   ` KAMEZAWA Hiroyuki
2010-07-16 15:36     ` Nathan Fontenot
2010-07-15 18:39 ` [PATCH 3/5] v2 Change the mutex name in the memory_block struct Nathan Fontenot
2010-07-16 17:16   ` Dave Hansen
2010-07-15 18:40 ` [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories Nathan Fontenot
2010-07-16  0:12   ` KAMEZAWA Hiroyuki
2010-07-16 15:40     ` Nathan Fontenot
2010-07-15 18:41 ` [PATCH 5/5] v2 Enable multiple sections per directory for ppc Nathan Fontenot
2010-07-16 17:18   ` Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).