* [PATCH V2 1/2] mm: Export definition of 'zone_names' array through mmzone.h
@ 2016-09-06  5:34 ` Anshuman Khandual
  0 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-06  5:34 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm

zone_names[] is used to identify any zone given its index, which
can be useful in many other places. So export the definition
through the include/linux/mmzone.h header for broader access.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V2:
- Removed the static and declared in mmzone.h per Andrew

 include/linux/mmzone.h | 1 +
 mm/page_alloc.c        | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7f2ae99..9943204 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -341,6 +341,7 @@ enum zone_type {
 
 };
 
+extern char * const zone_names[];
 #ifndef __GENERATING_BOUNDS_H
 
 struct zone {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c6..cb46bf8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -207,7 +207,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
 
 EXPORT_SYMBOL(totalram_pages);
 
-static char * const zone_names[MAX_NR_ZONES] = {
+char * const zone_names[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
 	 "DMA",
 #endif
-- 
2.1.0
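
For readers unfamiliar with the pattern, the change above can be sketched in
userspace C roughly as follows. This is only an illustration under stated
assumptions: demo_zone_names, DEMO_MAX_NR_ZONES and demo_zone_name() are
hypothetical stand-ins, not kernel symbols, and the three names listed are
just sample entries. The header-side declaration leaves the array size out,
as in the patch, and the definition drops 'static' so that it gets external
linkage.

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_NR_ZONES; the real value depends on config. */
#define DEMO_MAX_NR_ZONES 3

/* What a shared header would carry: declaration only, size left incomplete. */
extern char * const demo_zone_names[];

/* What the .c file would carry: the definition, no longer static. */
char * const demo_zone_names[DEMO_MAX_NR_ZONES] = {
	"DMA",
	"Normal",
	"Movable",
};

/* Bounds-checked lookup; returns NULL for an out-of-range index. */
const char *demo_zone_name(size_t idx)
{
	if (idx >= DEMO_MAX_NR_ZONES)
		return NULL;
	return demo_zone_names[idx];
}
```

Because the declaration carries no size, users outside the defining file
cannot apply sizeof to the array, so an index check against the zone count
(as in the hypothetical demo_zone_name() helper) is left to the caller.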

* [PATCH V2 2/2] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  5:34 ` Anshuman Khandual
@ 2016-09-06  5:34   ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-06  5:34 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm

Each individual node in the system has a ZONELIST_FALLBACK zonelist
and a ZONELIST_NOFALLBACK zonelist. These zonelists decide the fallback
order of zones during memory allocations. Sometimes it helps to dump
these zonelists to see the priority order of the various zones in them.

Particularly on platforms which support memory hotplug into zones that
did not exist at boot, this interface helps in visualizing which
zonelists of the system the newly hot added memory ends up in, and at
what priority level. POWER is such a platform: all the memory detected
at boot time remains in ZONE_DMA for good, but the hotplug process can
then bring new memory into ZONE_MOVABLE. So having a way to get a
snapshot of the zonelists on the system after memory or node
hot[un]plug is desirable. This change adds one new sysfs interface
(/sys/devices/system/memory/system_zone_details) which will fetch and
dump this information.

Example zonelist information from a KVM guest.

[NODE (0)]
        ZONELIST_FALLBACK
        (0) (node 0) (zone DMA c00000000140c000)
        (1) (node 1) (zone DMA c000000100000000)
        (2) (node 2) (zone DMA c000000200000000)
        (3) (node 3) (zone DMA c000000300000000)
        ZONELIST_NOFALLBACK
        (0) (node 0) (zone DMA c00000000140c000)
[NODE (1)]
        ZONELIST_FALLBACK
        (0) (node 1) (zone DMA c000000100000000)
        (1) (node 2) (zone DMA c000000200000000)
        (2) (node 3) (zone DMA c000000300000000)
        (3) (node 0) (zone DMA c00000000140c000)
        ZONELIST_NOFALLBACK
        (0) (node 1) (zone DMA c000000100000000)
[NODE (2)]
        ZONELIST_FALLBACK
        (0) (node 2) (zone DMA c000000200000000)
        (1) (node 3) (zone DMA c000000300000000)
        (2) (node 0) (zone DMA c00000000140c000)
        (3) (node 1) (zone DMA c000000100000000)
        ZONELIST_NOFALLBACK
        (0) (node 2) (zone DMA c000000200000000)
[NODE (3)]
        ZONELIST_FALLBACK
        (0) (node 3) (zone DMA c000000300000000)
        (1) (node 0) (zone DMA c00000000140c000)
        (2) (node 1) (zone DMA c000000100000000)
        (3) (node 2) (zone DMA c000000200000000)
        ZONELIST_NOFALLBACK
        (0) (node 3) (zone DMA c000000300000000)

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V2:
- Added more details into the commit message
- Added sysfs interface file details into the commit message
- Added ../ABI/testing/sysfs-system-zone-details file

 .../ABI/testing/sysfs-system-zone-details          |  9 +++++
 drivers/base/memory.c                              | 46 ++++++++++++++++++++++
 2 files changed, 55 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details

diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
new file mode 100644
index 0000000..9c13b2e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-system-zone-details
@@ -0,0 +1,9 @@
+What:		/sys/devices/system/memory/system_zone_details
+Date:		Sep 2016
+KernelVersion:	4.8
+Contact:	khandual@linux.vnet.ibm.com
+Description:
+		This read-only file dumps the zonelist and its constituent
+		zones information for both the ZONELIST_FALLBACK and
+		ZONELIST_NOFALLBACK zonelists for each online node of the
+		system at any given point in time.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index dc75de9..8c9330a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -442,7 +442,52 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
+static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
+{
+	unsigned int i;
+	ssize_t count = 0;
+
+	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
+		count += sprintf(buf + count,
+			"\t\t(%d) (node %d) (%-10s %lx)\n", i,
+			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
+			zone_names[zonelist->_zonerefs[i].zone_idx],
+			(unsigned long) zonelist->_zonerefs[i].zone);
+	}
+	return count;
+}
+
+static ssize_t dump_zonelists(char *buf)
+{
+	struct zonelist *zonelist;
+	unsigned int node;
+	ssize_t count = 0;
+
+	for_each_online_node(node) {
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_FALLBACK]);
+		count += sprintf(buf + count, "[NODE (%d)]\n", node);
+		count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_NOFALLBACK]);
+		count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+	}
+	return count;
+}
+
+static ssize_t
+print_system_zone_details(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	return dump_zonelists(buf);
+}
+
+
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
+static DEVICE_ATTR(system_zone_details, 0444, print_system_zone_details, NULL);
 
 /*
  * Memory auto online policy.
@@ -783,6 +828,7 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
 	&dev_attr_block_size_bytes.attr,
+	&dev_attr_system_zone_details.attr,
 	&dev_attr_auto_online_blocks.attr,
 	NULL
 };
-- 
2.1.0
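
A note on buffer handling: a sysfs show() callback is handed a single
PAGE_SIZE buffer, and the loops above append to it with unbounded sprintf().
The userspace sketch below shows one way the writes could be bounded. All
names here are hypothetical stand-ins (struct demo_zoneref,
demo_dump_zonelist(), the caller-supplied size), and the printed layout is
simplified; in-kernel code would more likely reach for scnprintf() than for
snprintf().

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zoneref: a NULL name ends the list. */
struct demo_zoneref {
	const char *zone_name;
	int node_id;
};

/*
 * Append one line per entry to buf without writing past size bytes.
 * Returns the number of characters stored, excluding the trailing NUL.
 */
size_t demo_dump_zonelist(char *buf, size_t size,
			  const struct demo_zoneref *refs)
{
	size_t count = 0;
	unsigned int i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; refs[i].zone_name; i++) {
		int n = snprintf(buf + count, size - count,
				 "\t\t(%u) (node %d) (zone %s)\n",
				 i, refs[i].node_id, refs[i].zone_name);
		if (n < 0)
			break;
		if ((size_t)n >= size - count) {
			/* Output was truncated: the buffer is full. */
			count = size - 1;
			break;
		}
		count += (size_t)n;
	}
	return count;
}
```

With this shape the return value never exceeds size - 1, so a caller that
chains several dumps can keep passing buf + count and size - count without
risking an overrun.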

* Re: [PATCH V2 2/2] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  5:34   ` Anshuman Khandual
  (?)
@ 2016-09-06  6:11   ` kbuild test robot
  2016-09-06  6:49       ` Anshuman Khandual
  -1 siblings, 1 reply; 37+ messages in thread
From: kbuild test robot @ 2016-09-06  6:11 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: kbuild-all, linux-mm, linux-kernel, akpm

[-- Attachment #1: Type: text/plain, Size: 1920 bytes --]

Hi Anshuman,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.8-rc5 next-20160905]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Export-definition-of-zone_names-array-through-mmzone-h/20160906-133749
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x013-201636 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/base/memory.c: In function 'dump_zonelists':
>> drivers/base/memory.c:474:20: error: 'ZONELIST_NOFALLBACK' undeclared (first use in this function)
        node_zonelists[ZONELIST_NOFALLBACK]);
                       ^~~~~~~~~~~~~~~~~~~
   drivers/base/memory.c:474:20: note: each undeclared identifier is reported only once for each function it appears in

vim +/ZONELIST_NOFALLBACK +474 drivers/base/memory.c

   468					node_zonelists[ZONELIST_FALLBACK]);
   469			count += sprintf(buf + count, "[NODE (%d)]\n", node);
   470			count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
   471			count += dump_zonelist(buf + count, zonelist);
   472	
   473			zonelist = &(NODE_DATA(node)->
 > 474					node_zonelists[ZONELIST_NOFALLBACK]);
   475			count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");
   476			count += dump_zonelist(buf + count, zonelist);
   477		}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 25727 bytes --]

* Re: [PATCH V2 2/2] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  6:11   ` kbuild test robot
@ 2016-09-06  6:49       ` Anshuman Khandual
  0 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-06  6:49 UTC (permalink / raw)
  To: kbuild test robot, Anshuman Khandual
  Cc: kbuild-all, linux-mm, linux-kernel, akpm

On 09/06/2016 11:41 AM, kbuild test robot wrote:
> Hi Anshuman,
> 
> [auto build test ERROR on mmotm/master]
> [also build test ERROR on v4.8-rc5 next-20160905]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
> 
> url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Export-definition-of-zone_names-array-through-mmzone-h/20160906-133749
> base:   git://git.cmpxchg.org/linux-mmotm.git master
> config: x86_64-randconfig-x013-201636 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/base/memory.c: In function 'dump_zonelists':
> >> drivers/base/memory.c:474:20: error: 'ZONELIST_NOFALLBACK' undeclared (first use in this function)
>         node_zonelists[ZONELIST_NOFALLBACK]);
>                        ^~~~~~~~~~~~~~~~~~~
>    drivers/base/memory.c:474:20: note: each undeclared identifier is reported only once for each function it appears in
> 
> vim +/ZONELIST_NOFALLBACK +474 drivers/base/memory.c
> 
>    468					node_zonelists[ZONELIST_FALLBACK]);
>    469			count += sprintf(buf + count, "[NODE (%d)]\n", node);
>    470			count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
>    471			count += dump_zonelist(buf + count, zonelist);
>    472	
>    473			zonelist = &(NODE_DATA(node)->
>  > 474					node_zonelists[ZONELIST_NOFALLBACK]);
>    475			count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");

I missed the fact that ZONELIST_NOFALLBACK is defined only on CONFIG_NUMA
systems. Will fix and resend the patch.
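
For context, the zonelist indices are enum constants, and the no-fallback
entry is only present when NUMA is configured, which is why the randconfig
build above fails. A simplified userspace sketch of that layout follows; the
DEMO_ names and the DEMO_CONFIG_NUMA macro are hypothetical stand-ins, and
include/linux/mmzone.h remains the authoritative definition.

```c
/* Define DEMO_CONFIG_NUMA at build time to mimic a NUMA kernel config. */
enum demo_zonelist_idx {
	DEMO_ZONELIST_FALLBACK,		/* zonelist with fallback */
#ifdef DEMO_CONFIG_NUMA
	DEMO_ZONELIST_NOFALLBACK,	/* single-node zonelist */
#endif
	DEMO_MAX_ZONELISTS
};

/* Number of zonelists per node in this configuration. */
int demo_nr_zonelists(void)
{
	return DEMO_MAX_ZONELISTS;
}

/* Returns 1 if the no-fallback zonelist exists in this build, else 0. */
int demo_has_nofallback(void)
{
#ifdef DEMO_CONFIG_NUMA
	return 1;
#else
	return 0;
#endif
}
```

Any code that names the no-fallback constant therefore has to sit under the
same guard, or the identifier is simply undeclared in non-NUMA builds.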

* [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  5:34   ` Anshuman Khandual
@ 2016-09-06  8:31     ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-06  8:31 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm

Each individual node in the system has a ZONELIST_FALLBACK zonelist
and a ZONELIST_NOFALLBACK zonelist. These zonelists decide the fallback
order of zones during memory allocations. Sometimes it helps to dump
these zonelists to see the priority order of the various zones in them.

Particularly on platforms which support memory hotplug into zones that
did not exist at boot, this interface helps in visualizing which
zonelists of the system the newly hot added memory ends up in, and at
what priority level. POWER is such a platform: all the memory detected
at boot time remains in ZONE_DMA for good, but the hotplug process can
then bring new memory into ZONE_MOVABLE. So having a way to get a
snapshot of the zonelists on the system after memory or node
hot[un]plug is desirable. This change adds one new sysfs interface
(/sys/devices/system/memory/system_zone_details) which will fetch and
dump this information.

Example zonelist information from a KVM guest.

[NODE (0)]
        ZONELIST_FALLBACK
        (0) (node 0) (zone DMA c00000000140c000)
        (1) (node 1) (zone DMA c000000100000000)
        (2) (node 2) (zone DMA c000000200000000)
        (3) (node 3) (zone DMA c000000300000000)
        ZONELIST_NOFALLBACK
        (0) (node 0) (zone DMA c00000000140c000)
[NODE (1)]
        ZONELIST_FALLBACK
        (0) (node 1) (zone DMA c000000100000000)
        (1) (node 2) (zone DMA c000000200000000)
        (2) (node 3) (zone DMA c000000300000000)
        (3) (node 0) (zone DMA c00000000140c000)
        ZONELIST_NOFALLBACK
        (0) (node 1) (zone DMA c000000100000000)
[NODE (2)]
        ZONELIST_FALLBACK
        (0) (node 2) (zone DMA c000000200000000)
        (1) (node 3) (zone DMA c000000300000000)
        (2) (node 0) (zone DMA c00000000140c000)
        (3) (node 1) (zone DMA c000000100000000)
        ZONELIST_NOFALLBACK
        (0) (node 2) (zone DMA c000000200000000)
[NODE (3)]
        ZONELIST_FALLBACK
        (0) (node 3) (zone DMA c000000300000000)
        (1) (node 0) (zone DMA c00000000140c000)
        (2) (node 1) (zone DMA c000000100000000)
        (3) (node 2) (zone DMA c000000200000000)
        ZONELIST_NOFALLBACK
        (0) (node 3) (zone DMA c000000300000000)

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V3:
- Moved all of the new sysfs code inside CONFIG_NUMA

Changes in V2:
- Added more details into the commit message
- Added sysfs interface file details into the commit message
- Added ../ABI/testing/sysfs-system-zone-details file

 .../ABI/testing/sysfs-system-zone-details          |  9 ++++
 drivers/base/memory.c                              | 52 ++++++++++++++++++++++
 2 files changed, 61 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details

diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
new file mode 100644
index 0000000..9c13b2e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-system-zone-details
@@ -0,0 +1,9 @@
+What:		/sys/devices/system/memory/system_zone_details
+Date:		Sep 2016
+KernelVersion:	4.8
+Contact:	khandual@linux.vnet.ibm.com
+Description:
+		This read-only file dumps the zonelist and its constituent
+		zones information for both the ZONELIST_FALLBACK and
+		ZONELIST_NOFALLBACK zonelists for each online node of the
+		system at any given point in time.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index dc75de9..65fd30e 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -442,7 +442,56 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
+#ifdef CONFIG_NUMA
+static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
+{
+	unsigned int i;
+	ssize_t count = 0;
+
+	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
+		count += sprintf(buf + count,
+			"\t\t(%d) (node %d) (%-10s %lx)\n", i,
+			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
+			zone_names[zonelist->_zonerefs[i].zone_idx],
+			(unsigned long) zonelist->_zonerefs[i].zone);
+	}
+	return count;
+}
+
+static ssize_t dump_zonelists(char *buf)
+{
+	struct zonelist *zonelist;
+	unsigned int node;
+	ssize_t count = 0;
+
+	for_each_online_node(node) {
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_FALLBACK]);
+		count += sprintf(buf + count, "[NODE (%d)]\n", node);
+		count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_NOFALLBACK]);
+		count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+	}
+	return count;
+}
+
+static ssize_t
+print_system_zone_details(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	return dump_zonelists(buf);
+}
+#endif
+
+
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
+#ifdef CONFIG_NUMA
+static DEVICE_ATTR(system_zone_details, 0444, print_system_zone_details, NULL);
+#endif
 
 /*
  * Memory auto online policy.
@@ -783,6 +832,9 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
 	&dev_attr_block_size_bytes.attr,
+#ifdef CONFIG_NUMA
+	&dev_attr_system_zone_details.attr,
+#endif
 	&dev_attr_auto_online_blocks.attr,
 	NULL
 };
-- 
2.1.0
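
As a rough idea of how a userspace tool might consume this file, the sketch
below parses one entry line. It assumes the line layout shown in the example
output of the commit message, which is not a stable ABI; struct demo_entry
and demo_parse_entry() are hypothetical names introduced only for this
illustration.

```c
#include <stdio.h>

/* Hypothetical record for one parsed zonelist entry. */
struct demo_entry {
	unsigned int prio;		/* position in the zonelist */
	int node;			/* node owning the zone */
	char zone[16];			/* zone name, e.g. "DMA" */
	unsigned long long addr;	/* struct zone address as printed */
};

/*
 * Parse a line such as "(0) (node 0) (zone DMA c00000000140c000)".
 * Leading whitespace is skipped. Returns 1 on success, 0 if the line
 * does not match (for example a "[NODE (0)]" or "ZONELIST_..." header).
 */
int demo_parse_entry(const char *line, struct demo_entry *e)
{
	return sscanf(line, " (%u) (node %d) (zone %15s %llx)",
		      &e->prio, &e->node, e->zone, &e->addr) == 4;
}
```

A reader would call demo_parse_entry() on each line read from
/sys/devices/system/memory/system_zone_details and treat non-matching lines
as node or zonelist headers.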

* Re: [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  8:31     ` Anshuman Khandual
  (?)
@ 2016-09-06  9:05     ` kbuild test robot
  2016-09-07 12:32         ` Anshuman Khandual
  -1 siblings, 1 reply; 37+ messages in thread
From: kbuild test robot @ 2016-09-06  9:05 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: kbuild-all, linux-mm, linux-kernel, akpm

[-- Attachment #1: Type: text/plain, Size: 1773 bytes --]

Hi Anshuman,

[auto build test ERROR on driver-core/driver-core-testing]
[also build test ERROR on v4.8-rc5 next-20160906]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Add-sysfs-interface-to-dump-each-node-s-zonelist-information/20160906-163752
config: x86_64-randconfig-x019-201636 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/base/memory.c: In function 'dump_zonelist':
>> drivers/base/memory.c:455:4: error: 'zone_names' undeclared (first use in this function)
       zone_names[zonelist->_zonerefs[i].zone_idx],
       ^~~~~~~~~~
   drivers/base/memory.c:455:4: note: each undeclared identifier is reported only once for each function it appears in

vim +/zone_names +455 drivers/base/memory.c

   449		ssize_t count = 0;
   450	
   451		for (i = 0; zonelist->_zonerefs[i].zone; i++) {
   452			count += sprintf(buf + count,
   453				"\t\t(%d) (node %d) (%-10s %lx)\n", i,
   454				zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
 > 455				zone_names[zonelist->_zonerefs[i].zone_idx],
   456				(unsigned long) zonelist->_zonerefs[i].zone);
   457		}
   458		return count;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22411 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread
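
The `'zone_names' undeclared` error above means that no declaration of the array was visible in drivers/base/memory.c. A likely cause, stated here as an assumption rather than something the robot's report confirms, is that this patch was built without patch 1/2 of the series, which drops `static` from the definition and adds the `extern` declaration to mmzone.h. The self-contained C sketch below shows that declaration/definition split using hypothetical `demo_` names; it is not the kernel's code.

```c
#include <assert.h>

/* --- what a shared header would provide (hypothetical demo_ names) --- */
enum demo_zone_type {
	DEMO_ZONE_DMA,
	DEMO_ZONE_NORMAL,
	DEMO_ZONE_MOVABLE,
	DEMO_MAX_NR_ZONES
};

/* Declaration only: tells every includer that the array exists somewhere. */
extern const char * const demo_zone_names[];

/* --- what exactly one .c file would provide --- */

/* Definition: no 'static', so other translation units can link against it. */
const char * const demo_zone_names[DEMO_MAX_NR_ZONES] = {
	[DEMO_ZONE_DMA]     = "DMA",
	[DEMO_ZONE_NORMAL]  = "Normal",
	[DEMO_ZONE_MOVABLE] = "Movable",
};

/* Build-time check that the table has one slot per zone type. */
static_assert(sizeof(demo_zone_names) / sizeof(demo_zone_names[0]) ==
	      DEMO_MAX_NR_ZONES, "one name per zone type");

/* A user of the array; it compiles only because a declaration is in scope. */
const char *demo_zone_name(unsigned int idx)
{
	return idx < (unsigned int)DEMO_MAX_NR_ZONES ?
		demo_zone_names[idx] : "unknown";
}
```

If the `extern` line is removed and the definition lives in another file, a caller such as `demo_zone_name()` fails to compile with the same kind of "undeclared" diagnostic the robot reported.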

* Re: [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  8:31     ` Anshuman Khandual
@ 2016-09-06 20:36       ` Dave Hansen
  -1 siblings, 0 replies; 37+ messages in thread
From: Dave Hansen @ 2016-09-06 20:36 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel; +Cc: akpm, Kees Cook

On 09/06/2016 01:31 AM, Anshuman Khandual wrote:
> [NODE (0)]
>         ZONELIST_FALLBACK
>         (0) (node 0) (zone DMA c00000000140c000)
>         (1) (node 1) (zone DMA c000000100000000)
>         (2) (node 2) (zone DMA c000000200000000)
>         (3) (node 3) (zone DMA c000000300000000)
>         ZONELIST_NOFALLBACK
>         (0) (node 0) (zone DMA c00000000140c000)

Don't we have some prohibition on dumping out kernel addresses like this
so that attackers can't trivially defeat kernel layout randomization?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06 20:36       ` Dave Hansen
@ 2016-09-07  3:08         ` Kees Cook
  -1 siblings, 0 replies; 37+ messages in thread
From: Kees Cook @ 2016-09-07  3:08 UTC (permalink / raw)
  To: Dave Hansen, Anshuman Khandual; +Cc: Linux-MM, LKML, Andrew Morton

On Tue, Sep 6, 2016 at 1:36 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 09/06/2016 01:31 AM, Anshuman Khandual wrote:
>> [NODE (0)]
>>         ZONELIST_FALLBACK
>>         (0) (node 0) (zone DMA c00000000140c000)
>>         (1) (node 1) (zone DMA c000000100000000)
>>         (2) (node 2) (zone DMA c000000200000000)
>>         (3) (node 3) (zone DMA c000000300000000)
>>         ZONELIST_NOFALLBACK
>>         (0) (node 0) (zone DMA c00000000140c000)
>
> Don't we have some prohibition on dumping out kernel addresses like this
> so that attackers can't trivially defeat kernel layout randomization?

Anything printing memory addresses should be using %pK (not %lx as done here).

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 37+ messages in thread
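
For readers unfamiliar with the difference: `%lx` prints the raw value unconditionally, while `%pK` consults the `kptr_restrict` sysctl before deciding whether to reveal it. The userspace C sketch below models that policy in simplified form. `format_kptr()` and its parameters are hypothetical names invented for illustration; the real logic lives in the kernel's vsprintf code and also checks credentials and calling context, which this model reduces to a single `privileged` flag.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified userspace model of the %pK policy (not kernel code).
 * restrict_level mirrors /proc/sys/kernel/kptr_restrict:
 *   0: print the real address
 *   1: print the real address only to privileged readers
 *   2: always print zeros
 * Returns what snprintf() returns: the length of the formatted text.
 */
int format_kptr(char *buf, size_t size, const void *ptr,
		int restrict_level, bool privileged)
{
	uintptr_t value = (uintptr_t)ptr;
	int width = (int)(2 * sizeof(void *));	/* hex digits per pointer */

	assert(buf != NULL && size > 0);

	/* Hide the address whenever the policy does not allow this reader. */
	if (restrict_level >= 2 || (restrict_level == 1 && !privileged))
		value = 0;

	return snprintf(buf, size, "%0*" PRIxPTR, width, value);
}
```

In this model, an unprivileged reader with `restrict_level` set to 1 sees an all-zero field in place of each zone address, while the rest of the line (index, node, zone name) is unaffected.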

* Re: [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-07  3:08         ` Kees Cook
@ 2016-09-07  4:00           ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-07  4:00 UTC (permalink / raw)
  To: Kees Cook, Dave Hansen, Anshuman Khandual; +Cc: Linux-MM, LKML, Andrew Morton

On 09/07/2016 08:38 AM, Kees Cook wrote:
> On Tue, Sep 6, 2016 at 1:36 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>> On 09/06/2016 01:31 AM, Anshuman Khandual wrote:
>>> [NODE (0)]
>>>         ZONELIST_FALLBACK
>>>         (0) (node 0) (zone DMA c00000000140c000)
>>>         (1) (node 1) (zone DMA c000000100000000)
>>>         (2) (node 2) (zone DMA c000000200000000)
>>>         (3) (node 3) (zone DMA c000000300000000)
>>>         ZONELIST_NOFALLBACK
>>>         (0) (node 0) (zone DMA c00000000140c000)
>>
>> Don't we have some prohibition on dumping out kernel addresses like this
>> so that attackers can't trivially defeat kernel layout randomization?
> 
> Anything printing memory addresses should be using %pK (not %lx as done here).

I have learned about the significance of %pK coupled with the
kptr_restrict interface. Will change this. Thanks for pointing it out.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  9:05     ` kbuild test robot
@ 2016-09-07 12:32         ` Anshuman Khandual
  0 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-07 12:32 UTC (permalink / raw)
  To: kbuild test robot, Anshuman Khandual
  Cc: kbuild-all, linux-mm, linux-kernel, akpm

On 09/06/2016 02:35 PM, kbuild test robot wrote:
> Hi Anshuman,
> 
> [auto build test ERROR on driver-core/driver-core-testing]
> [also build test ERROR on v4.8-rc5 next-20160906]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
> 
> url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Add-sysfs-interface-to-dump-each-node-s-zonelist-information/20160906-163752
> config: x86_64-randconfig-x019-201636 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 

I am not able to reproduce this build failure with Fedora 24
and gcc (GCC) 6.1.1 20160621 on an x86 laptop. Maybe including
mmzone.h in page_alloc.c will be enough to take care of any
issues.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-06  8:31     ` Anshuman Khandual
@ 2016-09-08  2:46       ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-08  2:46 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm

Each individual node in the system has a ZONELIST_FALLBACK zonelist
and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
order of zones during memory allocations. Sometimes it helps to dump
these zonelists to see the priority order of various zones in them.

This is particularly useful on platforms that support hotplugging
memory into zones that did not exist at boot: the interface shows
which zonelists the newly added memory ends up in, and at what
priority. POWER is one such platform: all memory detected at boot
stays in ZONE_DMA for good, but hotplug can later place new memory
in ZONE_MOVABLE.
So having a way to get the snapshot of the zonelists on the system
after memory or node hot[un]plug is desirable. This change adds one
new sysfs interface (/sys/devices/system/memory/system_zone_details)
which will fetch and dump this information.

Example zonelist information from a KVM guest.

[NODE (0)]
        ZONELIST_FALLBACK
                (0) (node 0) (DMA     0xc0000000ffff6300)
                (1) (node 1) (DMA     0xc0000001ffff6300)
                (2) (node 2) (DMA     0xc0000002ffff6300)
                (3) (node 3) (DMA     0xc0000003ffdba300)
        ZONELIST_NOFALLBACK
                (0) (node 0) (DMA     0xc0000000ffff6300)
[NODE (1)]
        ZONELIST_FALLBACK
                (0) (node 1) (DMA     0xc0000001ffff6300)
                (1) (node 2) (DMA     0xc0000002ffff6300)
                (2) (node 3) (DMA     0xc0000003ffdba300)
                (3) (node 0) (DMA     0xc0000000ffff6300)
        ZONELIST_NOFALLBACK
                (0) (node 1) (DMA     0xc0000001ffff6300)
[NODE (2)]
        ZONELIST_FALLBACK
                (0) (node 2) (DMA     0xc0000002ffff6300)
                (1) (node 3) (DMA     0xc0000003ffdba300)
                (2) (node 0) (DMA     0xc0000000ffff6300)
                (3) (node 1) (DMA     0xc0000001ffff6300)
        ZONELIST_NOFALLBACK
                (0) (node 2) (DMA     0xc0000002ffff6300)
[NODE (3)]
        ZONELIST_FALLBACK
                (0) (node 3) (DMA     0xc0000003ffdba300)
                (1) (node 0) (DMA     0xc0000000ffff6300)
                (2) (node 1) (DMA     0xc0000001ffff6300)
                (3) (node 2) (DMA     0xc0000002ffff6300)
        ZONELIST_NOFALLBACK
                (0) (node 3) (DMA     0xc0000003ffdba300)

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V4:
- Explicitly included mmzone.h header inside page_alloc.c
- Changed the kernel address printing from %lx to %pK

Changes in V3:
- Moved all of the new sysfs code inside CONFIG_NUMA

Changes in V2:
- Added more details into the commit message
- Added sysfs interface file details into the commit message
- Added ../ABI/testing/sysfs-system-zone-details file

 .../ABI/testing/sysfs-system-zone-details          |  9 ++++
 drivers/base/memory.c                              | 52 ++++++++++++++++++++++
 mm/page_alloc.c                                    |  1 +
 3 files changed, 62 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details

diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
new file mode 100644
index 0000000..9c13b2e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-system-zone-details
@@ -0,0 +1,9 @@
+What:		/sys/devices/system/memory/system_zone_details
+Date:		Sep 2016
+KernelVersion:	4.8
+Contact:	khandual@linux.vnet.ibm.com
+Description:
+		This read-only file dumps the zonelists and their constituent
+		zones, for both ZONELIST_FALLBACK and ZONELIST_NOFALLBACK,
+		of each online node in the system as they stand at the time
+		of the read.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index dc75de9..c7ab991 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -442,7 +442,56 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
+#ifdef CONFIG_NUMA
+static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
+{
+	unsigned int i;
+	ssize_t count = 0;
+
+	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
+		count += sprintf(buf + count,
+			"\t\t(%d) (node %d) (%-7s 0x%pK)\n", i,
+			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
+			zone_names[zonelist->_zonerefs[i].zone_idx],
+			(void *) zonelist->_zonerefs[i].zone);
+	}
+	return count;
+}
+
+static ssize_t dump_zonelists(char *buf)
+{
+	struct zonelist *zonelist;
+	unsigned int node;
+	ssize_t count = 0;
+
+	for_each_online_node(node) {
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_FALLBACK]);
+		count += sprintf(buf + count, "[NODE (%d)]\n", node);
+		count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_NOFALLBACK]);
+		count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+	}
+	return count;
+}
+
+static ssize_t
+print_system_zone_details(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	return dump_zonelists(buf);
+}
+#endif
+
+
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
+#ifdef CONFIG_NUMA
+static DEVICE_ATTR(system_zone_details, 0444, print_system_zone_details, NULL);
+#endif
 
 /*
  * Memory auto online policy.
@@ -783,6 +832,9 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
 	&dev_attr_block_size_bytes.attr,
+#ifdef CONFIG_NUMA
+	&dev_attr_system_zone_details.attr,
+#endif
 	&dev_attr_auto_online_blocks.attr,
 	NULL
 };
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c6..d3da022 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -64,6 +64,7 @@
 #include <linux/page_owner.h>
 #include <linux/kthread.h>
 #include <linux/memcontrol.h>
+#include <linux/mmzone.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-08  2:46       ` Anshuman Khandual
  (?)
@ 2016-09-08  7:44       ` kbuild test robot
  -1 siblings, 0 replies; 37+ messages in thread
From: kbuild test robot @ 2016-09-08  7:44 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: kbuild-all, linux-mm, linux-kernel, akpm

[-- Attachment #1: Type: text/plain, Size: 1766 bytes --]

Hi Anshuman,

[auto build test ERROR on driver-core/driver-core-testing]
[also build test ERROR on v4.8-rc5]
[cannot apply to next-20160908]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Add-sysfs-interface-to-dump-each-node-s-zonelist-information/20160908-104922
config: x86_64-lkp (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/base/memory.c: In function 'dump_zonelist':
>> drivers/base/memory.c:455:4: error: 'zone_names' undeclared (first use in this function)
       zone_names[zonelist->_zonerefs[i].zone_idx],
       ^~~~~~~~~~
   drivers/base/memory.c:455:4: note: each undeclared identifier is reported only once for each function it appears in

vim +/zone_names +455 drivers/base/memory.c

   449		ssize_t count = 0;
   450	
   451		for (i = 0; zonelist->_zonerefs[i].zone; i++) {
   452			count += sprintf(buf + count,
   453				"\t\t(%d) (node %d) (%-7s 0x%pK)\n", i,
   454				zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
 > 455				zone_names[zonelist->_zonerefs[i].zone_idx],
   456				(void *) zonelist->_zonerefs[i].zone);
   457		}
   458		return count;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23633 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-08  2:46       ` Anshuman Khandual
@ 2016-09-08 20:24         ` Dave Hansen
  -1 siblings, 0 replies; 37+ messages in thread
From: Dave Hansen @ 2016-09-08 20:24 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel; +Cc: akpm

On 09/07/2016 07:46 PM, Anshuman Khandual wrote:
> after memory or node hot[un]plug is desirable. This change adds one
> new sysfs interface (/sys/devices/system/memory/system_zone_details)
> which will fetch and dump this information.

Doesn't this violate the "one value per file" sysfs rule?  Does it
belong in debugfs instead?

I also really question the need to dump kernel addresses out, filtered
or not.  What's the point?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-08  2:46       ` Anshuman Khandual
@ 2016-09-09 13:36         ` Michal Hocko
  -1 siblings, 0 replies; 37+ messages in thread
From: Michal Hocko @ 2016-09-09 13:36 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm, Dave Hansen

On Thu 08-09-16 08:16:58, Anshuman Khandual wrote:
> Each individual node in the system has a ZONELIST_FALLBACK zonelist
> and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
> order of zones during memory allocations. Sometimes it helps to dump
> these zonelists to see the priority order of various zones in them.
> 
> Particularly platforms which support memory hotplug into previously
> non existing zones (at boot), this interface helps in visualizing
> which all zonelists of the system at what priority level, the new
> hot added memory ends up in. POWER is such a platform where all the
> memory detected during boot time remains with ZONE_DMA for good but
> then hot plug process can actually get new memory into ZONE_MOVABLE.
> So having a way to get the snapshot of the zonelists on the system
> after memory or node hot[un]plug is desirable. This change adds one
> new sysfs interface (/sys/devices/system/memory/system_zone_details)
> which will fetch and dump this information.

I am still not sure I understand why this is helpful, who the consumer
of this interface is, and how they will benefit from the information.
Dave (who was missing from the CC list; re-added) had another
objection: that this breaks the one-value-per-file rule for sysfs
files.

This all smells like a debugging feature to me and so it should go into
debugfs.

> Example zonelist information from a KVM guest.
> 
> [NODE (0)]
>         ZONELIST_FALLBACK
>                 (0) (node 0) (DMA     0xc0000000ffff6300)
>                 (1) (node 1) (DMA     0xc0000001ffff6300)
>                 (2) (node 2) (DMA     0xc0000002ffff6300)
>                 (3) (node 3) (DMA     0xc0000003ffdba300)
>         ZONELIST_NOFALLBACK
>                 (0) (node 0) (DMA     0xc0000000ffff6300)
> [NODE (1)]
>         ZONELIST_FALLBACK
>                 (0) (node 1) (DMA     0xc0000001ffff6300)
>                 (1) (node 2) (DMA     0xc0000002ffff6300)
>                 (2) (node 3) (DMA     0xc0000003ffdba300)
>                 (3) (node 0) (DMA     0xc0000000ffff6300)
>         ZONELIST_NOFALLBACK
>                 (0) (node 1) (DMA     0xc0000001ffff6300)
> [NODE (2)]
>         ZONELIST_FALLBACK
>                 (0) (node 2) (DMA     0xc0000002ffff6300)
>                 (1) (node 3) (DMA     0xc0000003ffdba300)
>                 (2) (node 0) (DMA     0xc0000000ffff6300)
>                 (3) (node 1) (DMA     0xc0000001ffff6300)
>         ZONELIST_NOFALLBACK
>                 (0) (node 2) (DMA     0xc0000002ffff6300)
> [NODE (3)]
>         ZONELIST_FALLBACK
>                 (0) (node 3) (DMA     0xc0000003ffdba300)
>                 (1) (node 0) (DMA     0xc0000000ffff6300)
>                 (2) (node 1) (DMA     0xc0000001ffff6300)
>                 (3) (node 2) (DMA     0xc0000002ffff6300)
>         ZONELIST_NOFALLBACK
>                 (0) (node 3) (DMA     0xc0000003ffdba300)
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
> Changes in V4:
> - Explicitly included mmzone.h header inside page_alloc.c
> - Changed the kernel address printing from %lx to %pK
> 
> Changes in V3:
> - Moved all these new sysfs code inside CONFIG_NUMA
> 
> Changes in V2:
> - Added more details into the commit message
> - Added sysfs interface file details into the commit message
> - Added ../ABI/testing/sysfs-system-zone-details file
> 
>  .../ABI/testing/sysfs-system-zone-details          |  9 ++++
>  drivers/base/memory.c                              | 52 ++++++++++++++++++++++
>  mm/page_alloc.c                                    |  1 +
>  3 files changed, 62 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details
> 
> diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
> new file mode 100644
> index 0000000..9c13b2e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-system-zone-details
> @@ -0,0 +1,9 @@
> +What:		/sys/devices/system/memory/system_zone_details
> +Date:		Sep 2016
> +KernelVersion:	4.8
> +Contact:	khandual@linux.vnet.ibm.com
> +Description:
> +		This read-only file dumps the zonelist and its constituent
> +		zone information for both the ZONELIST_FALLBACK and
> +		ZONELIST_NOFALLBACK zonelists of each online node of the
> +		system at any given point in time.
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index dc75de9..c7ab991 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -442,7 +442,56 @@ print_block_size(struct device *dev, struct device_attribute *attr,
>  	return sprintf(buf, "%lx\n", get_memory_block_size());
>  }
>  
> +#ifdef CONFIG_NUMA
> +static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
> +{
> +	unsigned int i;
> +	ssize_t count = 0;
> +
> +	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
> +		count += sprintf(buf + count,
> +			"\t\t(%d) (node %d) (%-7s 0x%pK)\n", i,
> +			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
> +			zone_names[zonelist->_zonerefs[i].zone_idx],
> +			(void *) zonelist->_zonerefs[i].zone);
> +	}
> +	return count;
> +}
> +
> +static ssize_t dump_zonelists(char *buf)
> +{
> +	struct zonelist *zonelist;
> +	unsigned int node;
> +	ssize_t count = 0;
> +
> +	for_each_online_node(node) {
> +		zonelist = &(NODE_DATA(node)->
> +				node_zonelists[ZONELIST_FALLBACK]);
> +		count += sprintf(buf + count, "[NODE (%d)]\n", node);
> +		count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
> +		count += dump_zonelist(buf + count, zonelist);
> +
> +		zonelist = &(NODE_DATA(node)->
> +				node_zonelists[ZONELIST_NOFALLBACK]);
> +		count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");
> +		count += dump_zonelist(buf + count, zonelist);
> +	}
> +	return count;
> +}
> +
> +static ssize_t
> +print_system_zone_details(struct device *dev, struct device_attribute *attr,
> +		 char *buf)
> +{
> +	return dump_zonelists(buf);
> +}
> +#endif
> +
> +
>  static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
> +#ifdef CONFIG_NUMA
> +static DEVICE_ATTR(system_zone_details, 0444, print_system_zone_details, NULL);
> +#endif
>  
>  /*
>   * Memory auto online policy.
> @@ -783,6 +832,9 @@ static struct attribute *memory_root_attrs[] = {
>  #endif
>  
>  	&dev_attr_block_size_bytes.attr,
> +#ifdef CONFIG_NUMA
> +	&dev_attr_system_zone_details.attr,
> +#endif
>  	&dev_attr_auto_online_blocks.attr,
>  	NULL
>  };
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a2214c6..d3da022 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -64,6 +64,7 @@
>  #include <linux/page_owner.h>
>  #include <linux/kthread.h>
>  #include <linux/memcontrol.h>
> +#include <linux/mmzone.h>
>  
>  #include <asm/sections.h>
>  #include <asm/tlbflush.h>
> -- 
> 2.1.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-09 13:36         ` Michal Hocko
@ 2016-09-12  5:24           ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-12  5:24 UTC (permalink / raw)
  To: Michal Hocko, Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm, Dave Hansen

On 09/09/2016 07:06 PM, Michal Hocko wrote:
> On Thu 08-09-16 08:16:58, Anshuman Khandual wrote:
>> Each individual node in the system has a ZONELIST_FALLBACK zonelist
>> and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
>> order of zones during memory allocations. Sometimes it helps to dump
>> these zonelists to see the priority order of various zones in them.
>>
>> Particularly platforms which support memory hotplug into previously
>> non existing zones (at boot), this interface helps in visualizing
>> which all zonelists of the system at what priority level, the new
>> hot added memory ends up in. POWER is such a platform where all the
>> memory detected during boot time remains with ZONE_DMA for good but
>> then hot plug process can actually get new memory into ZONE_MOVABLE.
>> So having a way to get the snapshot of the zonelists on the system
>> after memory or node hot[un]plug is desirable. This change adds one
>> new sysfs interface (/sys/devices/system/memory/system_zone_details)
>> which will fetch and dump this information.
> I am still not sure I understand why this is helpful and who is the
> consumer for this interface and how it will benefit from the
> information. Dave (who doesn't seem to be on the CC list re-added) had
> another objection that this breaks one-value-per-file rule for sysfs
> files.

It helps in understanding the relative priority of each memory zone in the
system during various allocation scenarios. It's particularly helpful after
hotplug or unplug of additional memory into a previously nonexistent zone
on a node.

> 
> This all smells like a debugging feature to me and so it should go into
> debugfs.

Sure, will make it a debugfs interface.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-08 20:24         ` Dave Hansen
@ 2016-09-12  5:27           ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-12  5:27 UTC (permalink / raw)
  To: Dave Hansen, Anshuman Khandual, linux-mm, linux-kernel; +Cc: akpm

On 09/09/2016 01:54 AM, Dave Hansen wrote:
> On 09/07/2016 07:46 PM, Anshuman Khandual wrote:
>> after memory or node hot[un]plug is desirable. This change adds one
>> new sysfs interface (/sys/devices/system/memory/system_zone_details)
>> which will fetch and dump this information.
> Doesn't this violate the "one value per file" sysfs rule?  Does it
> belong in debugfs instead?

Yeah sure. Will make it a debugfs interface.

> 
> I also really question the need to dump kernel addresses out, filtered
> or not.  What's the point?

Hmm, I thought of it as additional information. But yes, it is only
additional and can be dropped.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-12  5:27           ` Anshuman Khandual
@ 2016-09-12 18:13             ` David Rientjes
  -1 siblings, 0 replies; 37+ messages in thread
From: David Rientjes @ 2016-09-12 18:13 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: Dave Hansen, linux-mm, linux-kernel, akpm

On Mon, 12 Sep 2016, Anshuman Khandual wrote:

> >> > after memory or node hot[un]plug is desirable. This change adds one
> >> > new sysfs interface (/sys/devices/system/memory/system_zone_details)
> >> > which will fetch and dump this information.
> > Doesn't this violate the "one value per file" sysfs rule?  Does it
> > belong in debugfs instead?
> 
> Yeah sure. Will make it a debugfs interface.
> 

So the intended reader of this file is running as root?

> > I also really question the need to dump kernel addresses out, filtered 
> > or not.  What's the point?
> 
> Hmm, thought it to be an additional information. But yes its additional
> and can be dropped.
> 

I'm questioning if this information can be inferred from information 
already in /proc/zoneinfo and sysfs.  We know the no-fallback zonelist is 
going to include the local node, and we know the other zonelists are 
either node ordered or zone ordered (or do we need to extend 
vm.numa_zonelist_order for default?).  I may have missed what new 
knowledge this interface is imparting on us.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-12 18:13             ` David Rientjes
@ 2016-09-17  4:26               ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-09-17  4:26 UTC (permalink / raw)
  To: David Rientjes, Anshuman Khandual
  Cc: Dave Hansen, linux-mm, linux-kernel, akpm

On 09/12/2016 11:43 PM, David Rientjes wrote:
> On Mon, 12 Sep 2016, Anshuman Khandual wrote:
> 
>>>>> after memory or node hot[un]plug is desirable. This change adds one
>>>>> new sysfs interface (/sys/devices/system/memory/system_zone_details)
>>>>> which will fetch and dump this information.
>>> Doesn't this violate the "one value per file" sysfs rule?  Does it
>>> belong in debugfs instead?
>>
>> Yeah sure. Will make it a debugfs interface.
>>
> 
> So the intended reader of this file is running as root?

Yeah.

> 
>>> I also really question the need to dump kernel addresses out, filtered 
>>> or not.  What's the point?
>>
>> Hmm, thought it to be an additional information. But yes its additional
>> and can be dropped.
>>
> 
> I'm questioning if this information can be inferred from information 
> already in /proc/zoneinfo and sysfs.  We know the no-fallback zonelist is 
> going to include the local node, and we know the other zonelists are 
> either node ordered or zone ordered (or do we need to extend 
> vm.numa_zonelist_order for default?).  I may have missed what new 
> knowledge this interface is imparting on us.

IIUC /proc/zoneinfo lists the internal state and statistics of all zones
on the system at any given point in time. The no-fallback list contains
only the zones of the local node, while the fallback list (which is used
more often than the no-fallback one) contains all zones in either node
order or zone order. On most platforms the default is node order, but
the sequence of nodes within that order is determined by various factors
such as NUMA distance, load, and the presence of CPUs on the node. This
order of nodes in the fallback list is the most important information
this interface provides.
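
To illustrate the node ordering described above, here is a minimal
userspace sketch, not kernel code and not part of the patch. It orders
nodes by increasing distance from the local node only; the load and
CPU-presence factors mentioned above are deliberately left out. The
names build_fallback_order and MAX_NODES are hypothetical, and the
wrap-around tie-break is an assumption of this sketch, chosen because
it reproduces the order seen in the KVM guest example quoted earlier
in the thread when all remote distances are equal.

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical limit for this sketch */

/*
 * Fill order[] with node ids sorted by increasing distance from 'local'.
 * dist[a][b] is the distance from node a to node b. Equal distances are
 * resolved by walking forward from the local node and wrapping around
 * (an assumption of this sketch, not a statement about the kernel).
 * Returns the number of entries written, or -1 on invalid arguments.
 */
static int build_fallback_order(int nr_nodes, int local,
				const int dist[][MAX_NODES], int *order)
{
	int used[MAX_NODES] = { 0 };
	int n = 0;

	if (nr_nodes <= 0 || nr_nodes > MAX_NODES ||
	    local < 0 || local >= nr_nodes)
		return -1;

	/* The local node always comes first. */
	order[n++] = local;
	used[local] = 1;

	while (n < nr_nodes) {
		int best = -1;

		/* Visit local+1, local+2, ... wrapping past the last node. */
		for (int step = 1; step < nr_nodes; step++) {
			int cand = (local + step) % nr_nodes;

			if (used[cand])
				continue;
			/* Strict '<' keeps the first candidate on a tie. */
			if (best < 0 || dist[local][cand] < dist[local][best])
				best = cand;
		}
		assert(best >= 0);	/* an unused node must remain */
		order[n++] = best;
		used[best] = 1;
	}
	return n;
}
```

With four nodes at equal remote distance and local node 1, this yields
1, 2, 3, 0, the same sequence as the ZONELIST_FALLBACK entries shown for
NODE (1) in the example output.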

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-17  4:26               ` Anshuman Khandual
@ 2016-09-20  0:54                 ` David Rientjes
  -1 siblings, 0 replies; 37+ messages in thread
From: David Rientjes @ 2016-09-20  0:54 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: Dave Hansen, linux-mm, linux-kernel, akpm

On Sat, 17 Sep 2016, Anshuman Khandual wrote:

> > I'm questioning if this information can be inferred from information 
> > already in /proc/zoneinfo and sysfs.  We know the no-fallback zonelist is 
> > going to include the local node, and we know the other zonelists are 
> > either node ordered or zone ordered (or do we need to extend 
> > vm.numa_zonelist_order for default?).  I may have missed what new 
> > knowledge this interface is imparting on us.
> 
> IIUC /proc/zoneinfo lists down zone internal state and statistics for
> all zones on the system at any given point of time. The no-fallback
> list contains the zones from the local node and fallback (which gets
> used more often than the no-fallback) list contains all zones either
> in node-ordered or zone-ordered manner. In most of the platforms the
> default being the node order but the sequence of present nodes in
> that order is determined by various factors like NUMA distance, load,
> presence of CPUs on the node etc. This order of nodes in the fallback
> list is the most important information derived out of this interface.
> 

The point is that all of this can be inferred with information already 
provided, so the additional interface seems unnecessary.  The only 
extension I think that is needed is to determine if the order is node or 
zone when vm.numa_zonelist_order == default and we shouldn't parse this 
from dmesg.
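
As a rough illustration of where such an inference could start, the
sketch below parses one row of NUMA distances as text, in the
space-separated form exposed per node under
/sys/devices/system/node/nodeN/distance. It is a userspace sketch under
stated assumptions, not a proposed interface: the function name
parse_distance_row is hypothetical, and reading the file itself is left
to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a whitespace-separated row of distances such as "10 20 40 40\n".
 * Stores at most 'max' values in out[].
 * Returns the number of values parsed, or -1 if the row is malformed,
 * contains a negative or out-of-range value, or holds more than 'max'
 * values.
 */
static int parse_distance_row(const char *line, int *out, int max)
{
	const char *p = line;
	int n = 0;

	assert(line != NULL && out != NULL);

	for (;;) {
		char *end;
		long v;

		errno = 0;
		v = strtol(p, &end, 10);	/* skips leading whitespace */
		if (end == p)
			break;			/* no further number found */
		if (errno != 0 || v < 0 || v > INT_MAX || n >= max)
			return -1;
		out[n++] = (int)v;
		p = end;
	}

	/* Only trailing whitespace may remain after the last number. */
	while (*p == ' ' || *p == '\t' || *p == '\n')
		p++;
	return *p == '\0' ? n : -1;
}
```

One row per online node gives a distance matrix from which a
node-ordered fallback sequence can be estimated; whether the kernel is
actually using node or zone order still has to come from elsewhere,
which is the gap noted above.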

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
  2016-09-20  0:54                 ` David Rientjes
@ 2016-10-13 14:38                   ` Anshuman Khandual
  -1 siblings, 0 replies; 37+ messages in thread
From: Anshuman Khandual @ 2016-10-13 14:38 UTC (permalink / raw)
  To: David Rientjes, Anshuman Khandual
  Cc: Dave Hansen, linux-mm, linux-kernel, akpm

On 09/20/2016 06:24 AM, David Rientjes wrote:
> On Sat, 17 Sep 2016, Anshuman Khandual wrote:
> 
>>> I'm questioning if this information can be inferred from information
>>> already in /proc/zoneinfo and sysfs.  We know the no-fallback zonelist is
>>> going to include the local node, and we know the other zonelists are
>>> either node ordered or zone ordered (or do we need to extend
>>> vm.numa_zonelist_order for default?).  I may have missed what new
>>> knowledge this interface is imparting on us.
>>
>> IIUC /proc/zoneinfo lists down zone internal state and statistics for
>> all zones on the system at any given point of time. The no-fallback
>> list contains the zones from the local node and fallback (which gets
>> used more often than the no-fallback) list contains all zones either
>> in node-ordered or zone-ordered manner. In most of the platforms the
>> default being the node order but the sequence of present nodes in
>> that order is determined by various factors like NUMA distance, load,
>> presence of CPUs on the node etc. This order of nodes in the fallback
>> list is the most important information derived out of this interface.
>>
> The point is that all of this can be inferred with information already 
> provided, so the additional interface seems unnecessary.  The only 
> extension I think that is needed is to determine if the order is node or 
> zone when vm.numa_zonelist_order == default and we shouldn't parse this 
> from dmesg.

Okay. The general view seems to be that this interface is not necessary,
so I won't be posting the debugfs version for now.


end of thread

Thread overview:
2016-09-06  5:34 [PATCH V2 1/2] mm: Export definition of 'zone_names' array through mmzone.h Anshuman Khandual
2016-09-06  5:34 ` [PATCH V2 2/2] mm: Add sysfs interface to dump each node's zonelist information Anshuman Khandual
2016-09-06  6:11   ` kbuild test robot
2016-09-06  6:49     ` Anshuman Khandual
2016-09-06  8:31   ` [PATCH V3] " Anshuman Khandual
2016-09-06  9:05     ` kbuild test robot
2016-09-07 12:32       ` Anshuman Khandual
2016-09-06 20:36     ` Dave Hansen
2016-09-07  3:08       ` Kees Cook
2016-09-07  4:00         ` Anshuman Khandual
2016-09-08  2:46     ` [PATCH V4] " Anshuman Khandual
2016-09-08  7:44       ` kbuild test robot
2016-09-08 20:24       ` Dave Hansen
2016-09-12  5:27         ` Anshuman Khandual
2016-09-12 18:13           ` David Rientjes
2016-09-17  4:26             ` Anshuman Khandual
2016-09-20  0:54               ` David Rientjes
2016-10-13 14:38                 ` Anshuman Khandual
2016-09-09 13:36       ` Michal Hocko
2016-09-12  5:24         ` Anshuman Khandual
