linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h
@ 2016-08-31  3:25 Anshuman Khandual
  2016-08-31  3:25 ` [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information Anshuman Khandual
  2016-08-31 21:10 ` [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Anshuman Khandual @ 2016-08-31  3:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm

zone_names[] is used to identify any zone given it's index which
can be used in many other places. So moving the definition into
include/linux/mmzone.h for broader access.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/mmzone.h | 17 +++++++++++++++++
 mm/page_alloc.c        | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d572b78..01ca3f4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -341,6 +341,23 @@ enum zone_type {
 
 };
 
+static char * const zone_names[__MAX_NR_ZONES] = {
+#ifdef CONFIG_ZONE_DMA
+	 "DMA",
+#endif
+#ifdef CONFIG_ZONE_DMA32
+	 "DMA32",
+#endif
+	 "Normal",
+#ifdef CONFIG_HIGHMEM
+	 "HighMem",
+#endif
+	 "Movable",
+#ifdef CONFIG_ZONE_DEVICE
+	 "Device",
+#endif
+};
+
 #ifndef __GENERATING_BOUNDS_H
 
 struct zone {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fbe73a..8e2261c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -207,23 +207,6 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
 
 EXPORT_SYMBOL(totalram_pages);
 
-static char * const zone_names[MAX_NR_ZONES] = {
-#ifdef CONFIG_ZONE_DMA
-	 "DMA",
-#endif
-#ifdef CONFIG_ZONE_DMA32
-	 "DMA32",
-#endif
-	 "Normal",
-#ifdef CONFIG_HIGHMEM
-	 "HighMem",
-#endif
-	 "Movable",
-#ifdef CONFIG_ZONE_DEVICE
-	 "Device",
-#endif
-};
-
 char * const migratetype_names[MIGRATE_TYPES] = {
 	"Unmovable",
 	"Movable",
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information
  2016-08-31  3:25 [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Anshuman Khandual
@ 2016-08-31  3:25 ` Anshuman Khandual
  2016-08-31 21:12   ` Andrew Morton
  2016-08-31 21:10 ` [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Andrew Morton
  1 sibling, 1 reply; 6+ messages in thread
From: Anshuman Khandual @ 2016-08-31  3:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm

Each individual node in the system has a ZONELIST_FALLBACK zonelist
and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
order of zones during memory allocations. Sometimes it helps to dump
these zonelists to see the priority order of various zones in them.
This change just adds a sysfs interface for doing the same.

Example zonelist information from a KVM guest.

[NODE (0)]
        ZONELIST_FALLBACK
        (0) (node 0) (zone DMA c00000000140c000)
        (1) (node 1) (zone DMA c000000100000000)
        (2) (node 2) (zone DMA c000000200000000)
        (3) (node 3) (zone DMA c000000300000000)
        ZONELIST_NOFALLBACK
        (0) (node 0) (zone DMA c00000000140c000)
[NODE (1)]
        ZONELIST_FALLBACK
        (0) (node 1) (zone DMA c000000100000000)
        (1) (node 2) (zone DMA c000000200000000)
        (2) (node 3) (zone DMA c000000300000000)
        (3) (node 0) (zone DMA c00000000140c000)
        ZONELIST_NOFALLBACK
        (0) (node 1) (zone DMA c000000100000000)
[NODE (2)]
        ZONELIST_FALLBACK
        (0) (node 2) (zone DMA c000000200000000)
        (1) (node 3) (zone DMA c000000300000000)
        (2) (node 0) (zone DMA c00000000140c000)
        (3) (node 1) (zone DMA c000000100000000)
        ZONELIST_NOFALLBACK
        (0) (node 2) (zone DMA c000000200000000)
[NODE (3)]
        ZONELIST_FALLBACK
        (0) (node 3) (zone DMA c000000300000000)
        (1) (node 0) (zone DMA c00000000140c000)
        (2) (node 1) (zone DMA c000000100000000)
        (3) (node 2) (zone DMA c000000200000000)
        ZONELIST_NOFALLBACK
        (0) (node 3) (zone DMA c000000300000000)

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 drivers/base/memory.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index dc75de9..8c9330a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -442,7 +442,52 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
+static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
+{
+	unsigned int i;
+	ssize_t count = 0;
+
+	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
+		count += sprintf(buf + count,
+			"\t\t(%d) (node %d) (%-10s %lx)\n", i,
+			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
+			zone_names[zonelist->_zonerefs[i].zone_idx],
+			(unsigned long) zonelist->_zonerefs[i].zone);
+	}
+	return count;
+}
+
+static ssize_t dump_zonelists(char *buf)
+{
+	struct zonelist *zonelist;
+	unsigned int node;
+	ssize_t count = 0;
+
+	for_each_online_node(node) {
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_FALLBACK]);
+		count += sprintf(buf + count, "[NODE (%d)]\n", node);
+		count += sprintf(buf + count, "\tZONELIST_FALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_NOFALLBACK]);
+		count += sprintf(buf + count, "\tZONELIST_NOFALLBACK\n");
+		count += dump_zonelist(buf + count, zonelist);
+	}
+	return count;
+}
+
+static ssize_t
+print_system_zone_details(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	return dump_zonelists(buf);
+}
+
+
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
+static DEVICE_ATTR(system_zone_details, 0444, print_system_zone_details, NULL);
 
 /*
  * Memory auto online policy.
@@ -783,6 +828,7 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
 	&dev_attr_block_size_bytes.attr,
+	&dev_attr_system_zone_details.attr,
 	&dev_attr_auto_online_blocks.attr,
 	NULL
 };
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h
  2016-08-31  3:25 [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Anshuman Khandual
  2016-08-31  3:25 ` [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information Anshuman Khandual
@ 2016-08-31 21:10 ` Andrew Morton
  2016-09-02  4:29   ` Anshuman Khandual
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2016-08-31 21:10 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-mm, linux-kernel

On Wed, 31 Aug 2016 08:55:49 +0530 Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:

> zone_names[] is used to identify any zone given it's index which
> can be used in many other places. So moving the definition into
> include/linux/mmzone.h for broader access.
> 
> ...
>
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -341,6 +341,23 @@ enum zone_type {
>  
>  };
>  
> +static char * const zone_names[__MAX_NR_ZONES] = {
> +#ifdef CONFIG_ZONE_DMA
> +	 "DMA",
> +#endif
> +#ifdef CONFIG_ZONE_DMA32
> +	 "DMA32",
> +#endif
> +	 "Normal",
> +#ifdef CONFIG_HIGHMEM
> +	 "HighMem",
> +#endif
> +	 "Movable",
> +#ifdef CONFIG_ZONE_DEVICE
> +	 "Device",
> +#endif
> +};
> +
>  #ifndef __GENERATING_BOUNDS_H
>  
>  struct zone {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3fbe73a..8e2261c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -207,23 +207,6 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
>  
>  EXPORT_SYMBOL(totalram_pages);
>  
> -static char * const zone_names[MAX_NR_ZONES] = {
> -#ifdef CONFIG_ZONE_DMA
> -	 "DMA",
> -#endif
> -#ifdef CONFIG_ZONE_DMA32
> -	 "DMA32",
> -#endif
> -	 "Normal",
> -#ifdef CONFIG_HIGHMEM
> -	 "HighMem",
> -#endif
> -	 "Movable",
> -#ifdef CONFIG_ZONE_DEVICE
> -	 "Device",
> -#endif
> -};
> -
>  char * const migratetype_names[MIGRATE_TYPES] = {
>  	"Unmovable",
>  	"Movable",

This is worrisome.  On some (ancient) compilers, this will produce a
copy of that array into each compilation unit which includes mmzone.h.

On smarter compilers, it will produce a copy of the array in each
compilation unit which *uses* zone_names[].

On even smarter compilers (and linkers!), only one copy of zone_names[]
will exist in vmlinux.

I don't know if gcc is an "even smarter compiler" and I didn't check,
and I didn't check which gcc versions are even smarter.  I'd rather not
have to ;) It is risky.

So, let's just make it non-static and add a declaration into mmzone.h,
please.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information
  2016-08-31  3:25 ` [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information Anshuman Khandual
@ 2016-08-31 21:12   ` Andrew Morton
  2016-09-02  4:34     ` Anshuman Khandual
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2016-08-31 21:12 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-mm, linux-kernel

On Wed, 31 Aug 2016 08:55:50 +0530 Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:

> Each individual node in the system has a ZONELIST_FALLBACK zonelist
> and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
> order of zones during memory allocations. Sometimes it helps to dump
> these zonelists to see the priority order of various zones in them.
> This change just adds a sysfs interface for doing the same.
> 
> Example zonelist information from a KVM guest.
> 
> [NODE (0)]
>         ZONELIST_FALLBACK
>         (0) (node 0) (zone DMA c00000000140c000)
>         (1) (node 1) (zone DMA c000000100000000)
>         (2) (node 2) (zone DMA c000000200000000)
>         (3) (node 3) (zone DMA c000000300000000)
>         ZONELIST_NOFALLBACK
>         (0) (node 0) (zone DMA c00000000140c000)
> [NODE (1)]
>         ZONELIST_FALLBACK
>         (0) (node 1) (zone DMA c000000100000000)
>         (1) (node 2) (zone DMA c000000200000000)
>         (2) (node 3) (zone DMA c000000300000000)
>         (3) (node 0) (zone DMA c00000000140c000)
>         ZONELIST_NOFALLBACK
>         (0) (node 1) (zone DMA c000000100000000)
> [NODE (2)]
>         ZONELIST_FALLBACK
>         (0) (node 2) (zone DMA c000000200000000)
>         (1) (node 3) (zone DMA c000000300000000)
>         (2) (node 0) (zone DMA c00000000140c000)
>         (3) (node 1) (zone DMA c000000100000000)
>         ZONELIST_NOFALLBACK
>         (0) (node 2) (zone DMA c000000200000000)
> [NODE (3)]
>         ZONELIST_FALLBACK
>         (0) (node 3) (zone DMA c000000300000000)
>         (1) (node 0) (zone DMA c00000000140c000)
>         (2) (node 1) (zone DMA c000000100000000)
>         (3) (node 2) (zone DMA c000000200000000)
>         ZONELIST_NOFALLBACK
>         (0) (node 3) (zone DMA c000000300000000)

Can you please sell this a bit better?  Why does it "sometimes help"? 
Why does the benefit of this patch to our users justify the overhead
and cost?

Please document the full path to the sysfs file(s) within the changelog.

Please find somewhere in Documentation/ to document the new interface.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h
  2016-08-31 21:10 ` [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Andrew Morton
@ 2016-09-02  4:29   ` Anshuman Khandual
  0 siblings, 0 replies; 6+ messages in thread
From: Anshuman Khandual @ 2016-09-02  4:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On 09/01/2016 02:40 AM, Andrew Morton wrote:
> On Wed, 31 Aug 2016 08:55:49 +0530 Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:
> 
>> zone_names[] is used to identify any zone given it's index which
>> can be used in many other places. So moving the definition into
>> include/linux/mmzone.h for broader access.
>>
>> ...
>>
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -341,6 +341,23 @@ enum zone_type {
>>  
>>  };
>>  
>> +static char * const zone_names[__MAX_NR_ZONES] = {
>> +#ifdef CONFIG_ZONE_DMA
>> +	 "DMA",
>> +#endif
>> +#ifdef CONFIG_ZONE_DMA32
>> +	 "DMA32",
>> +#endif
>> +	 "Normal",
>> +#ifdef CONFIG_HIGHMEM
>> +	 "HighMem",
>> +#endif
>> +	 "Movable",
>> +#ifdef CONFIG_ZONE_DEVICE
>> +	 "Device",
>> +#endif
>> +};
>> +
>>  #ifndef __GENERATING_BOUNDS_H
>>  
>>  struct zone {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3fbe73a..8e2261c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -207,23 +207,6 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
>>  
>>  EXPORT_SYMBOL(totalram_pages);
>>  
>> -static char * const zone_names[MAX_NR_ZONES] = {
>> -#ifdef CONFIG_ZONE_DMA
>> -	 "DMA",
>> -#endif
>> -#ifdef CONFIG_ZONE_DMA32
>> -	 "DMA32",
>> -#endif
>> -	 "Normal",
>> -#ifdef CONFIG_HIGHMEM
>> -	 "HighMem",
>> -#endif
>> -	 "Movable",
>> -#ifdef CONFIG_ZONE_DEVICE
>> -	 "Device",
>> -#endif
>> -};
>> -
>>  char * const migratetype_names[MIGRATE_TYPES] = {
>>  	"Unmovable",
>>  	"Movable",
> 
> This is worrisome.  On some (ancient) compilers, this will produce a
> copy of that array into each compilation unit which includes mmzone.h.
> 
> On smarter compilers, it will produce a copy of the array in each
> compilation unit which *uses* zone_names[].
> 
> On even smarter compilers (and linkers!), only one copy of zone_names[]
> will exist in vmlinux.
> 
> I don't know if gcc is an "even smarter compiler" and I didn't check,
> and I didn't check which gcc versions are even smarter.  I'd rather not
> have to ;) It is risky.
> 
> So, let's just make it non-static and add a declaration into mmzone.h,
> please.
> 

I understand your concern, will change it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information
  2016-08-31 21:12   ` Andrew Morton
@ 2016-09-02  4:34     ` Anshuman Khandual
  0 siblings, 0 replies; 6+ messages in thread
From: Anshuman Khandual @ 2016-09-02  4:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On 09/01/2016 02:42 AM, Andrew Morton wrote:
> On Wed, 31 Aug 2016 08:55:50 +0530 Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:
> 
>> Each individual node in the system has a ZONELIST_FALLBACK zonelist
>> and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
>> order of zones during memory allocations. Sometimes it helps to dump
>> these zonelists to see the priority order of various zones in them.
>> This change just adds a sysfs interface for doing the same.
>>
>> Example zonelist information from a KVM guest.
>>
>> [NODE (0)]
>>         ZONELIST_FALLBACK
>>         (0) (node 0) (zone DMA c00000000140c000)
>>         (1) (node 1) (zone DMA c000000100000000)
>>         (2) (node 2) (zone DMA c000000200000000)
>>         (3) (node 3) (zone DMA c000000300000000)
>>         ZONELIST_NOFALLBACK
>>         (0) (node 0) (zone DMA c00000000140c000)
>> [NODE (1)]
>>         ZONELIST_FALLBACK
>>         (0) (node 1) (zone DMA c000000100000000)
>>         (1) (node 2) (zone DMA c000000200000000)
>>         (2) (node 3) (zone DMA c000000300000000)
>>         (3) (node 0) (zone DMA c00000000140c000)
>>         ZONELIST_NOFALLBACK
>>         (0) (node 1) (zone DMA c000000100000000)
>> [NODE (2)]
>>         ZONELIST_FALLBACK
>>         (0) (node 2) (zone DMA c000000200000000)
>>         (1) (node 3) (zone DMA c000000300000000)
>>         (2) (node 0) (zone DMA c00000000140c000)
>>         (3) (node 1) (zone DMA c000000100000000)
>>         ZONELIST_NOFALLBACK
>>         (0) (node 2) (zone DMA c000000200000000)
>> [NODE (3)]
>>         ZONELIST_FALLBACK
>>         (0) (node 3) (zone DMA c000000300000000)
>>         (1) (node 0) (zone DMA c00000000140c000)
>>         (2) (node 1) (zone DMA c000000100000000)
>>         (3) (node 2) (zone DMA c000000200000000)
>>         ZONELIST_NOFALLBACK
>>         (0) (node 3) (zone DMA c000000300000000)
> 
> Can you please sell this a bit better?  Why does it "sometimes help"?
> Why does the benefit of this patch to our users justify the overhead
> and cost?

On platforms which support memory hotplug into previously non existing
(at boot) zones, this interface helps in visualizing which zonelists
of the system, the new hot added memory ends up in. POWER is such a
platform where all the memory detected during boot time remains with
ZONE_DMA for good but then hot plug process can actually get new memory
into ZONE_MOVABLE. So having a way to get the snapshot of the zonelists
on the system after memory or node hot[un]plug is a good thing, IMHO.

> 
> Please document the full path to the sysfs file(s) within the changelog.

Sure, will do.

> 
> Please find somewhere in Documentation/ to document the new interface.
> 

Sure, will create this following file describing the interface.

Documentation/ABI/testing/sysfs-system-zone-details

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-09-02  4:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-31  3:25 [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Anshuman Khandual
2016-08-31  3:25 ` [PATCH 2/2] mm: Add sysfs interface to dump each node's zonelist information Anshuman Khandual
2016-08-31 21:12   ` Andrew Morton
2016-09-02  4:34     ` Anshuman Khandual
2016-08-31 21:10 ` [PATCH 1/2] mm: Move definition of 'zone_names' array into mmzone.h Andrew Morton
2016-09-02  4:29   ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).