linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: show node to memory section relationship with symlinks in sysfs
@ 2008-09-29 20:05 Gary Hade
  2008-09-30  8:06 ` Yasunori Goto
  0 siblings, 1 reply; 7+ messages in thread
From: Gary Hade @ 2008-09-29 20:05 UTC
  To: linux-mm
  Cc: Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman,
	Chris McDermott, Gary Hade, linux-kernel, Ingo Molnar, Greg KH,
	Dave Hansen, Nish Aravamudan


Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Successfully tested with 2.6.27-rc7 source on 2-node x86_64,
2-node ppc64, and 2-node ia64 systems.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

Supersedes the "mm: show memory section to node relationship in sysfs"
patch posted on 05 Sept 2008, which created 'node' files containing
the node ID in /sys/devices/system/memory/memoryX instead of symlinks.
Changed from files to symlinks due to feedback that symlinks are
more consistent with the sysfs way.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

---
 Documentation/ABI/testing/sysfs-devices-memory |   51 ++++++++++++
 Documentation/memory-hotplug.txt               |   16 +++
 drivers/base/memory.c                          |   10 ++
 drivers/base/node.c                            |   61 +++++++++++++++
 include/linux/memory.h                         |    4 
 include/linux/node.h                           |   11 ++
 6 files changed, 146 insertions(+), 7 deletions(-)
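
For readers who want to consume the new links from user space, here is
a minimal sketch (not part of the patch) of walking a node directory
and picking out the memoryY entries.  The helper names
parse_memory_link() and count_node_sections() are hypothetical; only
the /sys/devices/system/node/nodeX/memoryY layout is taken from the
description above.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 and store the section number if name has the form
 * "memory<N>" with N in decimal, otherwise return 0.
 * Hypothetical helper, for illustration only.
 */
static int parse_memory_link(const char *name, unsigned long *section)
{
	static const char prefix[] = "memory";
	const size_t plen = sizeof(prefix) - 1;
	char *end;

	assert(name != NULL && section != NULL);
	if (strncmp(name, prefix, plen) != 0)
		return 0;
	if (name[plen] < '0' || name[plen] > '9')
		return 0;
	errno = 0;
	*section = strtoul(name + plen, &end, 10);
	return errno == 0 && *end == '\0';
}

/*
 * Count the memoryY entries under node_dir, for example
 * "/sys/devices/system/node/node1".  Returns the number of matching
 * entries, or -1 if the directory cannot be opened.
 */
static int count_node_sections(const char *node_dir)
{
	DIR *dir = opendir(node_dir);
	struct dirent *de;
	unsigned long section;
	int count = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL)
		if (parse_memory_link(de->d_name, &section))
			count++;
	closedir(dir);
	return count;
}
```

Calling count_node_sections() on a node directory of a kernel without
this patch should simply return 0, since other entries such as
"meminfo" do not match the "memory<N>" pattern.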

Index: linux-2.6.27-rc5/Documentation/ABI/testing/sysfs-devices-memory
===================================================================
--- linux-2.6.27-rc5.orig/Documentation/ABI/testing/sysfs-devices-memory	2008-09-24 13:19:23.000000000 -0700
+++ linux-2.6.27-rc5/Documentation/ABI/testing/sysfs-devices-memory	2008-09-25 13:36:41.000000000 -0700
@@ -6,7 +6,6 @@
 		internal state of the kernel memory blocks. Files could be
 		added or removed dynamically to represent hot-add/remove
 		operations.
-
 Users:		hotplug memory add/remove tools
 		https://w3.opensource.ibm.com/projects/powerpc-utils/
 
@@ -19,6 +18,56 @@
 		This is useful for a user-level agent to determine
 		identify removable sections of the memory before attempting
 		potentially expensive hot-remove memory operation
+Users:		hotplug memory remove tools
+		https://w3.opensource.ibm.com/projects/powerpc-utils/
+
+What:		/sys/devices/system/memory/memoryX/phys_device
+Date:		September 2008
+Contact:	Badari Pulavarty <pbadari@us.ibm.com>
+Description:
+		The file /sys/devices/system/memory/memoryX/phys_device
+		is read-only and is designed to show the name of physical
+		memory device.  Implementation is currently incomplete.
 
+What:		/sys/devices/system/memory/memoryX/phys_index
+Date:		September 2008
+Contact:	Badari Pulavarty <pbadari@us.ibm.com>
+Description:
+		The file /sys/devices/system/memory/memoryX/phys_index
+		is read-only and contains the section ID in hexadecimal
+		which is equivalent to decimal X contained in the
+		memory section directory name.
+
+What:		/sys/devices/system/memory/memoryX/state
+Date:		September 2008
+Contact:	Badari Pulavarty <pbadari@us.ibm.com>
+Description:
+		The file /sys/devices/system/memory/memoryX/state
+		is read-write.  When read, its contents show the
+		online/offline state of the memory section.  When written,
+		root can toggle the online/offline state of a removable
+		memory section (see removable file description above)
+		using the following commands.
+		# echo online > /sys/devices/system/memory/memoryX/state
+		# echo offline > /sys/devices/system/memory/memoryX/state
+
+		For example, if /sys/devices/system/memory/memory22/removable
+		contains a value of 1 and
+		/sys/devices/system/memory/memory22/state contains the
+		string "online" the following command can be executed by
+		root to offline that section.
+		# echo offline > /sys/devices/system/memory/memory22/state
 Users:		hotplug memory remove tools
 		https://w3.opensource.ibm.com/projects/powerpc-utils/
+
+What:		/sys/devices/system/node/nodeX/memoryY
+Date:		September 2008
+Contact:	Gary Hade <garyhade@us.ibm.com>
+Description:
+		When CONFIG_NUMA is enabled
+		/sys/devices/system/node/nodeX/memoryY is a symbolic link that
+		points to the corresponding /sys/devices/system/memory/memoryY
+		memory section directory.  For example, the following symbolic
+		link is created for memory section 9 on node0.
+		/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
Index: linux-2.6.27-rc5/Documentation/memory-hotplug.txt
===================================================================
--- linux-2.6.27-rc5.orig/Documentation/memory-hotplug.txt	2008-09-24 13:19:23.000000000 -0700
+++ linux-2.6.27-rc5/Documentation/memory-hotplug.txt	2008-09-25 13:36:58.000000000 -0700
@@ -124,7 +124,7 @@
     This option can be kernel module too.
 
 --------------------------------
-3 sysfs files for memory hotplug
+4 sysfs files for memory hotplug
 --------------------------------
 All sections have their device information under /sys/devices/system/memory as
 
@@ -138,11 +138,12 @@
 (0x100000000 / 1Gib = 4)
 This device covers address range [0x100000000 ... 0x140000000)
 
-Under each section, you can see 3 files.
+Under each section, you can see 4 files.
 
 /sys/devices/system/memory/memoryXXX/phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
+/sys/devices/system/memory/memoryXXX/removable
 
 'phys_index' : read-only and contains section id, same as XXX.
 'state'      : read-write
@@ -150,10 +151,20 @@
                at write: user can specify "online", "offline" command
 'phys_device': read-only: designed to show the name of physical memory device.
                This is not well implemented now.
+'removable'  : read-only: contains an integer value indicating
+               whether the memory section is removable or not
+               removable.  A value of 1 indicates that the memory
+               section is removable and a value of 0 indicates that
+               it is not removable.
 
 NOTE:
   These directories/files appear after physical memory hotplug phase.
 
+If CONFIG_NUMA is specified the
+/sys/devices/system/memory/memoryXXX memory section
+directories can also be accessed via symbolic links located in
+the /sys/devices/system/node/node* directories.  For example:
+/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
 
 --------------------------------
 4. Physical memory hot-add phase
@@ -365,7 +376,6 @@
   - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
     sysctl or new control file.
   - showing memory section and physical device relationship.
-  - showing memory section and node relationship (maybe good for NUMA)
   - showing memory section is under ZONE_MOVABLE or not
   - test and make it better memory offlining.
   - support HugeTLB page migration and offlining.
Index: linux-2.6.27-rc5/drivers/base/memory.c
===================================================================
--- linux-2.6.27-rc5.orig/drivers/base/memory.c	2008-09-24 13:19:23.000000000 -0700
+++ linux-2.6.27-rc5/drivers/base/memory.c	2008-09-24 13:19:29.000000000 -0700
@@ -368,6 +368,13 @@
 		ret = mem_create_simple_file(mem, phys_device);
 	if (!ret)
 		ret = mem_create_simple_file(mem, removable);
+	if (!ret) {
+		ret = register_mem_sect_under_node(mem);
+		if (ret == -EFAULT) {
+			/* expected during boot if node not registered yet */
+			ret = 0;
+		}
+	}
 
 	return ret;
 }
@@ -380,7 +387,7 @@
  *
  * This could be made generic for all sysdev classes.
  */
-static struct memory_block *find_memory_block(struct mem_section *section)
+struct memory_block *find_memory_block(struct mem_section *section)
 {
 	struct kobject *kobj;
 	struct sys_device *sysdev;
@@ -409,6 +416,7 @@
 	struct memory_block *mem;
 
 	mem = find_memory_block(section);
+	unregister_mem_sect_under_node(mem);
 	mem_remove_simple_file(mem, phys_index);
 	mem_remove_simple_file(mem, state);
 	mem_remove_simple_file(mem, phys_device);
Index: linux-2.6.27-rc5/drivers/base/node.c
===================================================================
--- linux-2.6.27-rc5.orig/drivers/base/node.c	2008-09-24 13:19:23.000000000 -0700
+++ linux-2.6.27-rc5/drivers/base/node.c	2008-09-25 13:36:00.000000000 -0700
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/memory.h>
 #include <linux/node.h>
 #include <linux/hugetlb.h>
 #include <linux/cpumask.h>
@@ -225,6 +226,63 @@
 	return 0;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+int register_mem_sect_under_node(struct memory_block *mem_blk)
+{
+	unsigned int nid;
+
+	if (!mem_blk)
+		return -EFAULT;
+	nid = section_nr_to_nid(mem_blk->phys_index);
+	if (!node_online(nid))
+		return 0;
+	return sysfs_create_link_nowarn(&node_devices[nid].sysdev.kobj,
+		&mem_blk->sysdev.kobj, kobject_name(&mem_blk->sysdev.kobj));
+}
+
+int unregister_mem_sect_under_node(struct memory_block *mem_blk)
+{
+	unsigned int nid;
+
+	if (!mem_blk)
+		return -EFAULT;
+	nid = section_nr_to_nid(mem_blk->phys_index);
+	if (!node_online(nid))
+		return 0;
+	sysfs_remove_link(&node_devices[nid].sysdev.kobj,
+			 kobject_name(&mem_blk->sysdev.kobj));
+	return 0;
+}
+
+static int link_mem_sections(int nid)
+{
+	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+	unsigned long pfn;
+	int err = 0;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *mem_sect;
+		struct memory_block *mem_blk;
+		int ret;
+
+		if (!present_section_nr(section_nr))
+			continue;
+		if (pfn_to_nid(pfn) != nid)
+			continue;
+		mem_sect = __nr_to_section(section_nr);
+		mem_blk = find_memory_block(mem_sect);
+		ret = register_mem_sect_under_node(mem_blk);
+		if (!err)
+			err = ret;
+	}
+	return err;
+}
+#else
+static int link_mem_sections(int nid) { return 0; }
+#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
+
 int register_one_node(int nid)
 {
 	int error = 0;
@@ -244,6 +302,9 @@
 			if (cpu_to_node(cpu) == nid)
 				register_cpu_under_node(cpu, nid);
 		}
+
+		/* link memory sections under this node */
+		error = link_mem_sections(nid);
 	}
 
 	return error;
Index: linux-2.6.27-rc5/include/linux/memory.h
===================================================================
--- linux-2.6.27-rc5.orig/include/linux/memory.h	2008-09-24 13:19:23.000000000 -0700
+++ linux-2.6.27-rc5/include/linux/memory.h	2008-09-24 13:19:29.000000000 -0700
@@ -84,9 +84,9 @@
 extern int memory_dev_init(void);
 extern int remove_memory_block(unsigned long, struct mem_section *, int);
 extern int memory_notify(unsigned long val, void *v);
+extern struct memory_block *find_memory_block(struct mem_section *);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
-
-
+#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
Index: linux-2.6.27-rc5/include/linux/node.h
===================================================================
--- linux-2.6.27-rc5.orig/include/linux/node.h	2008-09-24 13:19:23.000000000 -0700
+++ linux-2.6.27-rc5/include/linux/node.h	2008-09-24 13:19:29.000000000 -0700
@@ -26,6 +26,7 @@
 	struct sys_device	sysdev;
 };
 
+struct memory_block;
 extern struct node node_devices[];
 
 extern int register_node(struct node *, int, struct node *);
@@ -35,6 +36,8 @@
 extern void unregister_one_node(int nid);
 extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
+extern int register_mem_sect_under_node(struct memory_block *mem_blk);
+extern int unregister_mem_sect_under_node(struct memory_block *mem_blk);
 #else
 static inline int register_one_node(int nid)
 {
@@ -52,6 +55,14 @@
 {
 	return 0;
 }
+static inline int register_mem_sect_under_node(struct memory_block *mem_blk)
+{
+	return 0;
+}
+static inline int unregister_mem_sect_under_node(struct memory_block *mem_blk)
+{
+	return 0;
+}
 #endif
 
 #define to_node(sys_device) container_of(sys_device, struct node, sysdev)


* Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs
  2008-09-29 20:05 [PATCH] mm: show node to memory section relationship with symlinks in sysfs Gary Hade
@ 2008-09-30  8:06 ` Yasunori Goto
  2008-09-30 15:50   ` Dave Hansen
  2008-09-30 23:29   ` Gary Hade
  0 siblings, 2 replies; 7+ messages in thread
From: Yasunori Goto @ 2008-09-30  8:06 UTC
  To: Gary Hade
  Cc: linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman,
	Chris McDermott, linux-kernel, Ingo Molnar, Greg KH, Dave Hansen,
	Nish Aravamudan


> +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> +int register_mem_sect_under_node(struct memory_block *mem_blk)
        :

I think this patch is convenient even when memory hotplug is disabled.
CONFIG_SPARSEMEM seems better than CONFIG_MEMORY_HOTPLUG_SPARSE.


> +int register_mem_sect_under_node(struct memory_block *mem_blk)
> +{
> +	unsigned int nid;
> +
> +	if (!mem_blk)
> +		return -EFAULT;
> +	nid = section_nr_to_nid(mem_blk->phys_index);

(snip)

> +#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
>  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */

If the first page of the section is not valid, then this section_nr_to_nid()
doesn't return correct value.

I tested this patch. In my box, the start_pfn of node 1 is 1200400, but 
section_nr_to_pfn(mem_blk->phys_index) returns 1200000. As a result,
the section is linked to node 0.
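
The arithmetic behind this report can be sketched in user space as
follows.  This is only an illustration: the sketch_* names are
hypothetical stand-ins for the kernel macros of similar name, the
quoted PFNs are read as hexadecimal, and the shift of 16 (0x10000
pages per section) is an assumption, since the real value depends on
architecture and configuration.

```c
#include <assert.h>

/*
 * Assumption: 0x10000 pages per section.  The real value is
 * architecture and configuration dependent.
 */
#define SKETCH_PFN_SECTION_SHIFT 16

/* Section number that contains pfn. */
static unsigned long sketch_pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> SKETCH_PFN_SECTION_SHIFT;
}

/* First pfn of section nr. */
static unsigned long sketch_section_nr_to_pfn(unsigned long nr)
{
	return nr << SKETCH_PFN_SECTION_SHIFT;
}

/* First pfn of the section that contains pfn. */
static unsigned long sketch_section_start_pfn(unsigned long pfn)
{
	return sketch_section_nr_to_pfn(sketch_pfn_to_section_nr(pfn));
}
```

Under these assumptions sketch_section_start_pfn(0x1200400) yields
0x1200000, which lies below the start of node 1, so a node lookup on
the first page of the section does not see node 1's memory.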

Bye.
-- 
Yasunori Goto 




* Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs
  2008-09-30  8:06 ` Yasunori Goto
@ 2008-09-30 15:50   ` Dave Hansen
  2008-09-30 19:41     ` Gary Hade
  2008-09-30 23:29   ` Gary Hade
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2008-09-30 15:50 UTC
  To: Yasunori Goto
  Cc: Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman,
	Chris McDermott, linux-kernel, Ingo Molnar, Greg KH,
	Nish Aravamudan

On Tue, 2008-09-30 at 17:06 +0900, Yasunori Goto wrote:
> > +#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
> >  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> 
> If the first page of the section is not valid, then this section_nr_to_nid()
> doesn't return correct value.
> 
> I tested this patch. In my box, the start_pfn of node 1 is 1200400, but 
> section_nr_to_pfn(mem_blk->phys_index) returns 1200000. As a result,
> the section is linked to node 0.

Crap, I was worried about that.

Gary, this means that we have a N:1 relationship between NUMA nodes and
sections.  This normally isn't a problem because sections don't really
care about nodes and they layer underneath them.

We'll probably need multiple symlinks in each section directory.

-- Dave



* Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs
  2008-09-30 15:50   ` Dave Hansen
@ 2008-09-30 19:41     ` Gary Hade
  2008-10-01  2:48       ` Yasunori Goto
  0 siblings, 1 reply; 7+ messages in thread
From: Gary Hade @ 2008-09-30 19:41 UTC
  To: Dave Hansen
  Cc: Yasunori Goto, Gary Hade, linux-mm, Andrew Morton,
	Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel,
	Ingo Molnar, Greg KH, Nish Aravamudan

On Tue, Sep 30, 2008 at 08:50:37AM -0700, Dave Hansen wrote:
> On Tue, 2008-09-30 at 17:06 +0900, Yasunori Goto wrote:
> > > +#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
> > >  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> > 
> > If the first page of the section is not valid, then this section_nr_to_nid()
> > doesn't return correct value.
> > 
> > I tested this patch. In my box, the start_pfn of node 1 is 1200400, but 
> > section_nr_to_pfn(mem_blk->phys_index) returns 1200000. As a result,
> > the section is linked to node 0.
> 
> Crap, I was worried about that.
> 
> Gary, this means that we have a N:1 relationship between NUMA nodes and
> sections.  This normally isn't a problem because sections don't really
> care about nodes and they layer underneath them.

So, using Yasunori-san's example the memory section starting at
pfn 1200000 actually resides on both node 0 and node 1.

> 
> We'll probably need multiple symlinks in each section directory.

or perhaps symlinks to the same section directory from >1 node directory.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc


* Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs
  2008-09-30  8:06 ` Yasunori Goto
  2008-09-30 15:50   ` Dave Hansen
@ 2008-09-30 23:29   ` Gary Hade
  1 sibling, 0 replies; 7+ messages in thread
From: Gary Hade @ 2008-09-30 23:29 UTC
  To: Yasunori Goto
  Cc: Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman,
	Chris McDermott, linux-kernel, Ingo Molnar, Greg KH, Dave Hansen,
	Nish Aravamudan

On Tue, Sep 30, 2008 at 05:06:08PM +0900, Yasunori Goto wrote:
> 
> > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> > +int register_mem_sect_under_node(struct memory_block *mem_blk)
>         :
> 
> I think this patch is convenient even when memory hotplug is disabled.
> CONFIG_SPARSEMEM seems better than CONFIG_MEMORY_HOTPLUG_SPARSE.

Yes, this would be nice, but unfortunately the presence of the
memory section directories that are referenced by the symlinks
also depends on CONFIG_MEMORY_HOTPLUG_SPARSE being enabled.  Removal
of the memory hotplug dependency for the code in drivers/base/memory.c
will require more than a simple CONFIG_MEMORY_HOTPLUG_SPARSE to
CONFIG_SPARSEMEM dependency change.  I am still looking at this.

Thanks for the review and testing.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc


* Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs
  2008-09-30 19:41     ` Gary Hade
@ 2008-10-01  2:48       ` Yasunori Goto
  2008-10-01 16:51         ` Gary Hade
  0 siblings, 1 reply; 7+ messages in thread
From: Yasunori Goto @ 2008-10-01  2:48 UTC
  To: Gary Hade
  Cc: Dave Hansen, linux-mm, Andrew Morton, Badari Pulavarty,
	Mel Gorman, Chris McDermott, linux-kernel, Ingo Molnar, Greg KH,
	Nish Aravamudan

> On Tue, Sep 30, 2008 at 08:50:37AM -0700, Dave Hansen wrote:
> > On Tue, 2008-09-30 at 17:06 +0900, Yasunori Goto wrote:
> > > > +#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
> > > >  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> > > 
> > > If the first page of the section is not valid, then this section_nr_to_nid()
> > > doesn't return correct value.
> > > 
> > > I tested this patch. In my box, the start_pfn of node 1 is 1200400, but 
> > > section_nr_to_pfn(mem_blk->phys_index) returns 1200000. As a result,
> > > the section is linked to node 0.
> > 
> > Crap, I was worried about that.
> > 
> > Gary, this means that we have a N:1 relationship between NUMA nodes and
> > sections.  This normally isn't a problem because sections don't really
> > care about nodes and they layer underneath them.
> 
> So, using Yasunori-san's example the memory section starting at
> pfn 1200000 actually resides on both node 0 and node 1.


In theory, one section may be split across different nodes.
(I don't know whether such a system really exists...)

But the cause of my trouble is different.
There is a memory hole that is occupied by firmware.
Here is the memory map of my box.

----
early_node_map[3] active PFN ranges
    0: 0x00000100 -> 0x00006d00
    0: 0x00408000 -> 0x00410000
    1: 0x01200400 -> 0x01210000
----

memmap_init() initializes from start_pfn (to end_pfn).
So, the memmaps for this first hole (0x1200000 - 0x12003ff) are not initialized,
and the node id is not set for them. This is the true cause.
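
To make the hole concrete, here is a small user-space sketch that
looks up a pfn in the three active ranges quoted above.  It is not
kernel code: struct sketch_pfn_range and sketch_pfn_to_nid() are
hypothetical names, the end of each range is treated as exclusive
(an assumption), and the function returns -1 for a hole purely to
make the gap visible, whereas the kernel's own lookup on an
uninitialized memmap does not report it that way.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_pfn_range {
	int nid;
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive (assumption) */
};

/* The three active PFN ranges from the memory map above. */
static const struct sketch_pfn_range sketch_map[] = {
	{ 0, 0x00000100UL, 0x00006d00UL },
	{ 0, 0x00408000UL, 0x00410000UL },
	{ 1, 0x01200400UL, 0x01210000UL },
};

/* Return the node whose active range covers pfn, or -1 for a hole. */
static int sketch_pfn_to_nid(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++)
		if (pfn >= sketch_map[i].start_pfn &&
		    pfn < sketch_map[i].end_pfn)
			return sketch_map[i].nid;
	return -1;
}
```

With this table, every pfn from 0x1200000 through 0x12003ff falls in
no active range, while 0x1200400 is the first pfn that belongs to
node 1.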


Bye.

-- 
Yasunori Goto 




* Re: [PATCH] mm: show node to memory section relationship with symlinks in sysfs
  2008-10-01  2:48       ` Yasunori Goto
@ 2008-10-01 16:51         ` Gary Hade
  0 siblings, 0 replies; 7+ messages in thread
From: Gary Hade @ 2008-10-01 16:51 UTC
  To: Yasunori Goto
  Cc: Gary Hade, Dave Hansen, linux-mm, Andrew Morton,
	Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel,
	Ingo Molnar, Greg KH, Nish Aravamudan

On Wed, Oct 01, 2008 at 11:48:29AM +0900, Yasunori Goto wrote:
> > On Tue, Sep 30, 2008 at 08:50:37AM -0700, Dave Hansen wrote:
> > > On Tue, 2008-09-30 at 17:06 +0900, Yasunori Goto wrote:
> > > > > +#define section_nr_to_nid(section_nr) pfn_to_nid(section_nr_to_pfn(section_nr))
> > > > >  #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> > > > 
> > > > If the first page of the section is not valid, then this section_nr_to_nid()
> > > > doesn't return correct value.
> > > > 
> > > > I tested this patch. In my box, the start_pfn of node 1 is 1200400, but 
> > > > section_nr_to_pfn(mem_blk->phys_index) returns 1200000. As a result,
> > > > the section is linked to node 0.
> > > 
> > > Crap, I was worried about that.
> > > 
> > > Gary, this means that we have a N:1 relationship between NUMA nodes and
> > > sections.  This normally isn't a problem because sections don't really
> > > care about nodes and they layer underneath them.
> > 
> > So, using Yasunori-san's example the memory section starting at
> > pfn 1200000 actually resides on both node 0 and node 1.
> 
> 
> In theory, one section may be split across different nodes.
> (I don't know whether such a system really exists...)
> 
> But the cause of my trouble is different.
> There is a memory hole that is occupied by firmware.
> Here is the memory map of my box.
> 
> ----
> early_node_map[3] active PFN ranges
>     0: 0x00000100 -> 0x00006d00
>     0: 0x00408000 -> 0x00410000
>     1: 0x01200400 -> 0x01210000
> ----
> 
> memmap_init() initializes from start_pfn (to end_pfn).
> So, the memmaps for this first hole (0x1200000 - 0x12003ff) are not initialized,
> and the node id is not set for them. This is the true cause.

Thanks for the clarification.  I think we need to cover both the
theoretical single memory section spanning multiple nodes case
and your memory hole/memory section intersection case.
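
One way to picture covering both cases is to ask which nodes have any
active memory inside a section, rather than looking only at the
section's first page.  The following user-space sketch does that over
a table of active ranges.  It is only an illustration under stated
assumptions, not the fix this thread settled on: the sketch_* names
are hypothetical, 0x10000 pages per section is assumed, range ends
are treated as exclusive, and node ids are assumed small enough to
fit in one unsigned long bitmask.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGES_PER_SECTION 0x10000UL	/* assumption */

struct sketch_pfn_range {
	int nid;
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive (assumption) */
};

/* The active PFN ranges from the memory map quoted above. */
static const struct sketch_pfn_range sketch_map[] = {
	{ 0, 0x00000100UL, 0x00006d00UL },
	{ 0, 0x00408000UL, 0x00410000UL },
	{ 1, 0x01200400UL, 0x01210000UL },
};

/*
 * Return a bitmask (bit n set means node n) of the nodes that have
 * at least one active pfn inside section section_nr.
 */
static unsigned long
sketch_section_node_mask(const struct sketch_pfn_range *map, size_t n,
			 unsigned long section_nr)
{
	unsigned long sec_start = section_nr * SKETCH_PAGES_PER_SECTION;
	unsigned long sec_end = sec_start + SKETCH_PAGES_PER_SECTION;
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].start_pfn < sec_end && map[i].end_pfn > sec_start)
			mask |= 1UL << map[i].nid;
	return mask;
}
```

For the quoted map, section 0x120 yields only the bit for node 1 even
though its first 0x400 pages are a hole, and a section whose pages are
shared by two nodes would yield both bits, which corresponds to links
from more than one node directory.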

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc
