* [PATCH] mm: add node physical memory range to sysfs
@ 2012-12-07 22:34 ` Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2012-12-07 22:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-kernel, linux-mm

This patch adds a new 'memrange' file that shows the starting and
ending physical addresses that are associated with a node. This is
useful for identifying specific DIMMs within the system.
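
For example, on a hypothetical machine whose node 1 spans 8G starting
at the 4G boundary, reading the file would show something like:

  $ cat /sys/devices/system/node/node1/memrange
  0x100000000-0x2ffffffff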

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 drivers/base/node.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..f165a0a 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -211,6 +211,19 @@ static ssize_t node_read_distance(struct device *dev,
 }
 static DEVICE_ATTR(distance, S_IRUGO, node_read_distance, NULL);
 
+static ssize_t node_read_memrange(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+
+	return sprintf(buf, "%#010Lx-%#010Lx\n",
+		       (unsigned long long) start_pfn << PAGE_SHIFT,
+		       ((unsigned long long) end_pfn << PAGE_SHIFT) - 1);
+}
+static DEVICE_ATTR(memrange, S_IRUGO, node_read_memrange, NULL);
+
 #ifdef CONFIG_HUGETLBFS
 /*
  * hugetlbfs per node attributes registration interface:
@@ -274,6 +287,7 @@ int register_node(struct node *node, int num, struct node *parent)
 		device_create_file(&node->dev, &dev_attr_numastat);
 		device_create_file(&node->dev, &dev_attr_distance);
 		device_create_file(&node->dev, &dev_attr_vmstat);
+		device_create_file(&node->dev, &dev_attr_memrange);
 
 		scan_unevictable_register_node(node);
 
@@ -299,6 +313,7 @@ void unregister_node(struct node *node)
 	device_remove_file(&node->dev, &dev_attr_numastat);
 	device_remove_file(&node->dev, &dev_attr_distance);
 	device_remove_file(&node->dev, &dev_attr_vmstat);
+	device_remove_file(&node->dev, &dev_attr_memrange);
 
 	scan_unevictable_unregister_node(node);
 	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
-- 
1.7.11.7

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-07 22:34 ` Davidlohr Bueso
@ 2012-12-07 23:51   ` Andrew Morton
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2012-12-07 23:51 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Greg Kroah-Hartman, linux-kernel, linux-mm

On Fri, 07 Dec 2012 14:34:56 -0800
Davidlohr Bueso <davidlohr.bueso@hp.com> wrote:

> This patch adds a new 'memrange' file that shows the starting and
> ending physical addresses that are associated to a node. This is
> useful for identifying specific DIMMs within the system.

I was going to bug you about documentation, but apparently we didn't
document /sys/devices/system/node/node*/.  A great labor-saving device,
that!

> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -211,6 +211,19 @@ static ssize_t node_read_distance(struct device *dev,
>  }
>  static DEVICE_ATTR(distance, S_IRUGO, node_read_distance, NULL);
>  
> +static ssize_t node_read_memrange(struct device *dev,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	int nid = dev->id;
> +	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
> +	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;

hm.  Is this correct for all of
FLATMEM/SPARSEMEM/SPARSEMEM_VMEMMAP/DISCONTIGMEM/etc?

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-07 23:51   ` Andrew Morton
@ 2012-12-08  0:17     ` Dave Hansen
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Hansen @ 2012-12-08  0:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Davidlohr Bueso, Greg Kroah-Hartman, linux-kernel, linux-mm

On 12/07/2012 03:51 PM, Andrew Morton wrote:
>> > +static ssize_t node_read_memrange(struct device *dev,
>> > +				  struct device_attribute *attr, char *buf)
>> > +{
>> > +	int nid = dev->id;
>> > +	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
>> > +	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
> hm.  Is this correct for all for
> FLATMEM/SPARSEMEM/SPARSEMEM_VMEMMAP/DISCONTIGME/etc?

It's not _wrong_ per se, but it's not super precise, either.

The problem is, it's quite valid to have these node_start/spanned ranges
overlap between two or more nodes on some hardware.  So, if the desired
purpose is to map nodes to DIMMs, then this can only be accomplished on
_some_ hardware, not all.  It would be completely useless for that
purpose on some configurations.

Seems like the better way to do this would be to expose the DIMMs
themselves in some way, and then map _those_ back to a node.

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-07 22:34 ` Davidlohr Bueso
@ 2012-12-08 19:45   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2012-12-08 19:45 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: linux-kernel, linux-mm

On Fri, Dec 07, 2012 at 02:34:56PM -0800, Davidlohr Bueso wrote:
> This patch adds a new 'memrange' file that shows the starting and
> ending physical addresses that are associated to a node. This is
> useful for identifying specific DIMMs within the system.
> 
> Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
> ---
>  drivers/base/node.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index af1a177..f165a0a 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -211,6 +211,19 @@ static ssize_t node_read_distance(struct device *dev,
>  }
>  static DEVICE_ATTR(distance, S_IRUGO, node_read_distance, NULL);
>  
> +static ssize_t node_read_memrange(struct device *dev,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	int nid = dev->id;
> +	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
> +	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
> +
> +	return sprintf(buf, "%#010Lx-%#010Lx\n",
> +		       (unsigned long long) start_pfn << PAGE_SHIFT,
> +		       ((unsigned long long) end_pfn << PAGE_SHIFT) - 1);
> +}
> +static DEVICE_ATTR(memrange, S_IRUGO, node_read_memrange, NULL);

As you're adding a new sysfs file, we need a Documentation/ABI/ entry as
well.  Yes, the existing ones were never documented, as Andrew points
out, sorry, but that means you get to document them all :)

thanks,

greg k-h

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-08  0:17     ` Dave Hansen
@ 2012-12-13  1:18       ` Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2012-12-13  1:18 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, linux-mm

On Fri, 2012-12-07 at 16:17 -0800, Dave Hansen wrote:
> On 12/07/2012 03:51 PM, Andrew Morton wrote:
> >> > +static ssize_t node_read_memrange(struct device *dev,
> >> > +				  struct device_attribute *attr, char *buf)
> >> > +{
> >> > +	int nid = dev->id;
> >> > +	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
> >> > +	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
> > hm.  Is this correct for all for
> > FLATMEM/SPARSEMEM/SPARSEMEM_VMEMMAP/DISCONTIGME/etc?
> 
> It's not _wrong_ per se, but it's not super precise, either.
> 
> The problem is, it's quite valid to have these node_start/spanned ranges
> overlap between two or more nodes on some hardware.  So, if the desired
> purpose is to map nodes to DIMMs, then this can only accomplish this on
> _some_ hardware, not all.  It would be completely useless for that
> purpose for some configurations.
> 
> Seems like the better way to do this would be to expose the DIMMs
> themselves in some way, and then map _those_ back to a node.
> 

Good point, and from a DIMM perspective I agree, and will look into
this. However, IMHO, having the range of physical addresses for every
node still provides valuable information from a NUMA point of view,
for example when dealing with node-related e820 mappings.

Andrew, with the documentation patch, would you be willing to pick up a
v2 of this?

Thanks,
Davidlohr

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-13  1:18       ` Davidlohr Bueso
@ 2012-12-13  1:48         ` Dave Hansen
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Hansen @ 2012-12-13  1:48 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, linux-mm

On 12/12/2012 05:18 PM, Davidlohr Bueso wrote:
> On Fri, 2012-12-07 at 16:17 -0800, Dave Hansen wrote:
>> Seems like the better way to do this would be to expose the DIMMs
>> themselves in some way, and then map _those_ back to a node.
> 
> Good point, and from a DIMM perspective, I agree, and will look into
> this. However, IMHO, having the range of physical addresses for every
> node still provides valuable information, from a NUMA point of view. For
> example, dealing with node related e820 mappings.

But if we went and did it per-DIMM (showing which physical addresses and
NUMA nodes a DIMM maps to), wouldn't that be redundant with this
proposed interface?

How do you plan to use this in practice, btw?

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-13  1:48         ` Dave Hansen
@ 2012-12-13  2:03           ` Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2012-12-13  2:03 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, linux-mm

On Wed, 2012-12-12 at 17:48 -0800, Dave Hansen wrote:
> On 12/12/2012 05:18 PM, Davidlohr Bueso wrote:
> > On Fri, 2012-12-07 at 16:17 -0800, Dave Hansen wrote:
> >> Seems like the better way to do this would be to expose the DIMMs
> >> themselves in some way, and then map _those_ back to a node.
> > 
> > Good point, and from a DIMM perspective, I agree, and will look into
> > this. However, IMHO, having the range of physical addresses for every
> > node still provides valuable information, from a NUMA point of view. For
> > example, dealing with node related e820 mappings.
> 
> But if we went and did it per-DIMM (showing which physical addresses and
> NUMA nodes a DIMM maps to), wouldn't that be redundant with this
> proposed interface?
> 

If DIMMs overlap between nodes, then we wouldn't have an exact range for
the node in question. The two approaches would complement each other.

> How do you plan to use this in practice, btw?
> 

It started because I needed to determine the address range of a node in
order to remove it from the e820 mappings and have the system "ignore"
the node's memory.
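
For instance, assuming the node in question spanned 4G starting at
0x100000000, something like:

	memmap=4G$0x100000000

on the kernel command line would mark that range as reserved and keep
the kernel from using it.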

Thanks,
Davidlohr

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-13  2:03           ` Davidlohr Bueso
@ 2012-12-13  4:49             ` Dave Hansen
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Hansen @ 2012-12-13  4:49 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, linux-mm

On 12/12/2012 06:03 PM, Davidlohr Bueso wrote:
> On Wed, 2012-12-12 at 17:48 -0800, Dave Hansen wrote:
>> But if we went and did it per-DIMM (showing which physical addresses and
>> NUMA nodes a DIMM maps to), wouldn't that be redundant with this
>> proposed interface?
> 
> If DIMMs overlap between nodes, then we wouldn't have an exact range for
> a node in question. Having both approaches would complement each other.

How is that possible?  If NUMA nodes are defined by distances from CPUs
to memory, how could a DIMM have more than a single distance to any
given CPU?

>> How do you plan to use this in practice, btw?
> 
> It started because I needed to recognize the address of a node to remove
> it from the e820 mappings and have the system "ignore" the node's
> memory.

Actually, now that I think about it, can you check in the
/sys/devices/system/ directories for memory and nodes?  We have linkages
there from each memory section to its NUMA node, and you can also
derive the physical address from the phys_index in each section.  That
should allow you to work out the physical addresses for a given node.
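
Something like this (an untested sketch, assuming the per-node memoryX
symlinks and the hex 'phys_index' and 'block_size_bytes' files are
present) would print the range covered by each memory block on node 0:

	bs=$((0x$(cat /sys/devices/system/memory/block_size_bytes)))
	for m in /sys/devices/system/node/node0/memory*; do
		idx=$((0x$(cat "$m/phys_index")))
		# each block spans [idx * bs, (idx + 1) * bs)
		printf '%s: %#x-%#x\n' "${m##*/}" \
			$((idx * bs)) $(((idx + 1) * bs - 1))
	done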


* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-13  4:49             ` Dave Hansen
@ 2012-12-13 15:17               ` KOSAKI Motohiro
  0 siblings, 0 replies; 22+ messages in thread
From: KOSAKI Motohiro @ 2012-12-13 15:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Davidlohr Bueso, Andrew Morton, Greg Kroah-Hartman, linux-kernel,
	linux-mm, kosaki.motohiro

(12/12/12 11:49 PM), Dave Hansen wrote:
> On 12/12/2012 06:03 PM, Davidlohr Bueso wrote:
>> On Wed, 2012-12-12 at 17:48 -0800, Dave Hansen wrote:
>>> But if we went and did it per-DIMM (showing which physical addresses and
>>> NUMA nodes a DIMM maps to), wouldn't that be redundant with this
>>> proposed interface?
>>
>> If DIMMs overlap between nodes, then we wouldn't have an exact range for
>> a node in question. Having both approaches would complement each other.
> 
> How is that possible?  If NUMA nodes are defined by distances from CPUs
> to memory, how could a DIMM have more than a single distance to any
> given CPU?

numa_emulation? Just a guess.

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-13  4:49             ` Dave Hansen
@ 2012-12-13 23:15               ` Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2012-12-13 23:15 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, linux-mm

On Wed, 2012-12-12 at 20:49 -0800, Dave Hansen wrote:
> On 12/12/2012 06:03 PM, Davidlohr Bueso wrote:
> > On Wed, 2012-12-12 at 17:48 -0800, Dave Hansen wrote:
> >> But if we went and did it per-DIMM (showing which physical addresses and
> >> NUMA nodes a DIMM maps to), wouldn't that be redundant with this
> >> proposed interface?
> > 
> > If DIMMs overlap between nodes, then we wouldn't have an exact range for
> > a node in question. Having both approaches would complement each other.
> 
> How is that possible?  If NUMA nodes are defined by distances from CPUs
> to memory, how could a DIMM have more than a single distance to any
> given CPU?

Can't this occur when interleaving emulated nodes with physical ones?

> 
> >> How do you plan to use this in practice, btw?
> > 
> > It started because I needed to recognize the address of a node to remove
> > it from the e820 mappings and have the system "ignore" the node's
> > memory.
> 
> Actually, now that I think about it, can you check in the
> /sys/devices/system/ directories for memory and nodes?  We have linkages
> there for each memory section to every NUMA node, and you can also
> derive the physical address from the phys_index in each section.  That
> should allow you to work out physical addresses for a given node.
> 

I had looked at the memory-hotplug interface but found that this
'phys_index' doesn't include holes, while ->node_spanned_pages does.

Thanks,
Davidlohr

* Re: [PATCH] mm: add node physical memory range to sysfs
  2012-12-13 23:15               ` Davidlohr Bueso
@ 2012-12-14  0:18                 ` Dave Hansen
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Hansen @ 2012-12-14  0:18 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, linux-mm

On 12/13/2012 03:15 PM, Davidlohr Bueso wrote:
> On Wed, 2012-12-12 at 20:49 -0800, Dave Hansen wrote:
>> How is that possible?  If NUMA nodes are defined by distances from CPUs
>> to memory, how could a DIMM have more than a single distance to any
>> given CPU?
> 
> Can't this occur when interleaving emulated nodes with physical ones?

I'm glad you mentioned numa=fake. Its interleaving node configuration
would also make the patch you've proposed completely useless.  Let's say
you've got a two-node system with 16GB of RAM:

|        0        |      1      |

If you use numa=fake=1G, you'll get the nodes interleaved like this:

|0|1|0|1|0|1|0|1|0|1|0|1|0|1|0|1|

The information that is exported from the interface you're proposing
would be:

node0: start_pfn=0  and spanned_pages = 15G
node1: start_pfn=1G and spanned_pages = 15G

In that situation, there is no way to figure out which DIMM backs a
given node, since the node ranges overlap.

>>>> How do you plan to use this in practice, btw?
>>>
>>> It started because I needed to recognize the address of a node to remove
>>> it from the e820 mappings and have the system "ignore" the node's
>>> memory.
>>
>> Actually, now that I think about it, can you check in the
>> /sys/devices/system/ directories for memory and nodes?  We have linkages
>> there for each memory section to every NUMA node, and you can also
>> derive the physical address from the phys_index in each section.  That
>> should allow you to work out physical addresses for a given node.
>> 
> I had looked at the memory-hotplug interface but found that this
> 'phys_index' doesn't include holes, while ->node_spanned_pages does.

I'm not sure what you mean.  Each memory section in sysfs accounts for
SECTION_SIZE where sections are 128MB by default on x86_64.
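
So, with the default 128MB (0x8000000) sections, a section whose
phys_index reads 0x20 covers:

	0x20 * 0x8000000 = 0x100000000 ... 0x107ffffff

i.e. the 128MB starting right at the 4G boundary.  Sections that lie
entirely within a hole never show up in sysfs in the first place.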

