linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo
@ 2020-03-24 14:22 Baoquan He
  2020-03-24 14:22 ` [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe

When tried to review below patchset from Joonsoo, I found out there's
some space to improve about lowmem_reserve and /proc/zoneinfo printing.
So made this patchset to do it.

[PATCH v3 0/2] integrate classzone_idx and high_zoneidx
http://lkml.kernel.org/r/1584693135-4396-1-git-send-email-iamjoonsoo.kim@lge.com

Baoquan He (5):
  mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when
    changing it
  mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty
  mm/vmstat.c: do not show lowmem reserve protection information of
    empty zone
  mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  mm/vmstat.c: remove the useless code

 mm/page_alloc.c | 13 +++++++++++--
 mm/vmstat.c     | 44 +++++++++++++++++---------------------------
 2 files changed, 28 insertions(+), 29 deletions(-)

-- 
2.17.2



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it
  2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
  2020-03-24 14:22 ` [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe

When people write to /proc/sys/vm/lowmem_reserve_ratio to change
sysctl_lowmem_reserve_ratio[], setup_per_zone_lowmem_reserve()
is called to recalculate all ->lowmem_reserve[] for each zone of all
nodes as below:

static void setup_per_zone_lowmem_reserve(void)
{
...
	for_each_online_pgdat(pgdat) {
		for (j = 0; j < MAX_NR_ZONES; j++) {
			...
			while (idx) {
				...
				if (sysctl_lowmem_reserve_ratio[idx] < 1) {
					sysctl_lowmem_reserve_ratio[idx] = 0;
					lower_zone->lowmem_reserve[j] = 0;
                                } else {
				...
			}
		}
	}
}

Meanwhile, here, sysctl_lowmem_reserve_ratio[idx] will be tuned if its
value is smaller than '1'. As we know, sysctl_lowmem_reserve_ratio[] is
set for zone without regarding to which node it belongs to. That means
the tuning will be done on all nodes, even though it has been done in the
first node.

And the tuning will be done too even when init_per_zone_wmark_min()
calls setup_per_zone_lowmem_reserve(), where actually nobody tries to
change sysctl_lowmem_reserve_ratio[].

So now move the tuning into lowmem_reserve_ratio_sysctl_handler(), to
make code logic more reasonable.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca1453204e66..c0c788798d8b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7840,8 +7840,7 @@ static void setup_per_zone_lowmem_reserve(void)
 				idx--;
 				lower_zone = pgdat->node_zones + idx;
 
-				if (sysctl_lowmem_reserve_ratio[idx] < 1) {
-					sysctl_lowmem_reserve_ratio[idx] = 0;
+				if (!sysctl_lowmem_reserve_ratio[idx]) {
 					lower_zone->lowmem_reserve[j] = 0;
 				} else {
 					lower_zone->lowmem_reserve[j] =
@@ -8106,7 +8105,15 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
+	int i;
+
 	proc_dointvec_minmax(table, write, buffer, length, ppos);
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		if (sysctl_lowmem_reserve_ratio[i] < 1)
+			sysctl_lowmem_reserve_ratio[i] = 0;
+	}
+
 	setup_per_zone_lowmem_reserve();
 	return 0;
 }
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty
  2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
  2020-03-24 14:22 ` [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
  2020-03-24 14:22 ` [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe

When requesting memory allocation from a specific zone is not satisfied,
it will fall to lower zone to try allocating memory. In this case,
lower zone's ->lowmem_reserve[] will help protect its own memory resource.
The higher the relevant ->lowmem_reserve[] is, the harder the upper zone
can get memory from this lower zone.

However, this protection mechanism should be applied to populated zone,
but not an empty zone. So filling ->lowmem_reserve[] for empty zone is
not necessary, and may mislead people that it's valid data in that zone.

Node 2, zone      DMA
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 1024, 1024)
Node 2, zone    DMA32
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 1024, 1024)
Node 2, zone   Normal
  per-node stats
      nr_inactive_anon 0
      nr_active_anon 143
      nr_inactive_file 0
      nr_active_file 0
      nr_unevictable 0
      nr_slab_reclaimable 45
      nr_slab_unreclaimable 254

Here clear out zone->lowmem_reserve[] if zone is empty.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0c788798d8b..138a56c0f48f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7840,8 +7840,10 @@ static void setup_per_zone_lowmem_reserve(void)
 				idx--;
 				lower_zone = pgdat->node_zones + idx;
 
-				if (!sysctl_lowmem_reserve_ratio[idx]) {
+				if (!sysctl_lowmem_reserve_ratio[idx] ||
+				    !zone_managed_pages(lower_zone)) {
 					lower_zone->lowmem_reserve[j] = 0;
+					continue;
 				} else {
 					lower_zone->lowmem_reserve[j] =
 						managed_pages / sysctl_lowmem_reserve_ratio[idx];
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone
  2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
  2020-03-24 14:22 ` [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
  2020-03-24 14:22 ` [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
  2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
  2020-03-24 14:22 ` [PATCH 5/5] mm/vmstat.c: remove the useless code Baoquan He
  4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe

Because the lowmem reserve protection of a zone can't tell anything if
the zone is empty, except of adding one more line in /proc/zoneinfo.

Let's remove it from that zone's showing.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmstat.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 96d21a792b57..6fd1407f4632 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1590,6 +1590,12 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   zone->present_pages,
 		   zone_managed_pages(zone));
 
+	/* If unpopulated, no other information is useful */
+	if (!populated_zone(zone)) {
+		seq_putc(m, '\n');
+		return;
+	}
+
 	seq_printf(m,
 		   "\n        protection: (%ld",
 		   zone->lowmem_reserve[0]);
@@ -1597,12 +1603,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		seq_printf(m, ", %ld", zone->lowmem_reserve[i]);
 	seq_putc(m, ')');
 
-	/* If unpopulated, no other information is useful */
-	if (!populated_zone(zone)) {
-		seq_putc(m, '\n');
-		return;
-	}
-
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		seq_printf(m, "\n      %-12s %lu", zone_stat_name(i),
 			   zone_page_state(zone, i));
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
                   ` (2 preceding siblings ...)
  2020-03-24 14:22 ` [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
  2020-03-24 19:25   ` David Rientjes
  2020-03-24 14:22 ` [PATCH 5/5] mm/vmstat.c: remove the useless code Baoquan He
  4 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe

This moving makes the layout of /proc/zoneinfo more sensible. And there
are 4 zones at most currently, it doesn't need to scroll down much to get
to the 1st populated zone, even though the 1st populated zone is MOVABLE
zone.

Node 2, per-node stats
      nr_inactive_anon 48
      nr_active_anon 15454
...
      nr_foll_pin_acquired 0
      nr_foll_pin_released 0
Node 2, zone      DMA
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
Node 2, zone    DMA32
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
Node 2, zone   Normal
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
Node 2, zone  Movable
  pages free     196346
        min      3540
...
        managed  262144

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmstat.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6fd1407f4632..4bbf9be786da 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1567,13 +1567,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 {
 	int i;
 	seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
-	if (is_zone_first_populated(pgdat, zone)) {
-		seq_printf(m, "\n  per-node stats");
-		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
-			seq_printf(m, "\n      %-12s %lu", node_stat_name(i),
-				   node_page_state(pgdat, i));
-		}
-	}
 	seq_printf(m,
 		   "\n  pages free     %lu"
 		   "\n        min      %lu"
@@ -1648,7 +1641,18 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
  */
 static int zoneinfo_show(struct seq_file *m, void *arg)
 {
+	int i;
 	pg_data_t *pgdat = (pg_data_t *)arg;
+
+	if (node_state(pgdat->node_id, N_MEMORY)) {
+		seq_printf(m, "Node %d, per-node stats", pgdat->node_id);
+		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+			seq_printf(m, "\n      %-12s %lu", node_stat_name(i),
+				   node_page_state(pgdat, i));
+		}
+		seq_putc(m, '\n');
+	}
+
 	walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print);
 	return 0;
 }
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] mm/vmstat.c: remove the useless code
  2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
                   ` (3 preceding siblings ...)
  2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
  4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe

No one calls is_zone_first_populated(), remove it.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmstat.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4bbf9be786da..7097eb99f30d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1548,20 +1548,6 @@ static const struct seq_operations pagetypeinfo_op = {
 	.show	= pagetypeinfo_show,
 };
 
-static bool is_zone_first_populated(pg_data_t *pgdat, struct zone *zone)
-{
-	int zid;
-
-	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
-		struct zone *compare = &pgdat->node_zones[zid];
-
-		if (populated_zone(compare))
-			return zone == compare;
-	}
-
-	return false;
-}
-
 static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 							struct zone *zone)
 {
-- 
2.17.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
@ 2020-03-24 19:25   ` David Rientjes
  2020-03-25  5:53     ` Baoquan He
  0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2020-03-24 19:25 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka

On Tue, 24 Mar 2020, Baoquan He wrote:

> This moving makes the layout of /proc/zoneinfo more sensible. And there
> are 4 zones at most currently, it doesn't need to scroll down much to get
> to the 1st populated zone, even though the 1st populated zone is MOVABLE
> zone.
> 

Doesn't this introduce risk that it will break existing parsers of 
/proc/zoneinfo in subtle ways?

In some cases /proc/zoneinfo is a tricky file to correctly parse because 
you have to rely on the existing order in which it is printed to determine 
which zone is being described.  We need to print zones even with unmanaged 
pages, for instance, otherwise userspace may be unaware of which zones are 
supported and what order they are in.  That's important to be able to 
construct the proper string to use when writing vm.lowmem_reserve_ratio.

I'd prefer not changing the order of /proc/zoneinfo if it can be avoided 
just because the risk outweighs the reward that we may break some 
initscript parsers.

> Node 2, per-node stats
>       nr_inactive_anon 48
>       nr_active_anon 15454
> ...
>       nr_foll_pin_acquired 0
>       nr_foll_pin_released 0
> Node 2, zone      DMA
>   pages free     0
>         min      0
>         low      0
>         high     0
>         spanned  0
>         present  0
>         managed  0
> Node 2, zone    DMA32
>   pages free     0
>         min      0
>         low      0
>         high     0
>         spanned  0
>         present  0
>         managed  0
> Node 2, zone   Normal
>   pages free     0
>         min      0
>         low      0
>         high     0
>         spanned  0
>         present  0
>         managed  0
> Node 2, zone  Movable
>   pages free     196346
>         min      3540
> ...
>         managed  262144
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmstat.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 6fd1407f4632..4bbf9be786da 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1567,13 +1567,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  {
>  	int i;
>  	seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
> -	if (is_zone_first_populated(pgdat, zone)) {
> -		seq_printf(m, "\n  per-node stats");
> -		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> -			seq_printf(m, "\n      %-12s %lu", node_stat_name(i),
> -				   node_page_state(pgdat, i));
> -		}
> -	}
>  	seq_printf(m,
>  		   "\n  pages free     %lu"
>  		   "\n        min      %lu"
> @@ -1648,7 +1641,18 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>   */
>  static int zoneinfo_show(struct seq_file *m, void *arg)
>  {
> +	int i;
>  	pg_data_t *pgdat = (pg_data_t *)arg;
> +
> +	if (node_state(pgdat->node_id, N_MEMORY)) {
> +		seq_printf(m, "Node %d, per-node stats", pgdat->node_id);
> +		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> +			seq_printf(m, "\n      %-12s %lu", node_stat_name(i),
> +				   node_page_state(pgdat, i));
> +		}
> +		seq_putc(m, '\n');
> +	}
> +
>  	walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print);
>  	return 0;
>  }
> -- 
> 2.17.2
> 
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-24 19:25   ` David Rientjes
@ 2020-03-25  5:53     ` Baoquan He
  2020-03-25  8:55       ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-25  5:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-kernel, linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka

On 03/24/20 at 12:25pm, David Rientjes wrote:
> On Tue, 24 Mar 2020, Baoquan He wrote:
> 
> > This moving makes the layout of /proc/zoneinfo more sensible. And there
> > are 4 zones at most currently, it doesn't need to scroll down much to get
> > to the 1st populated zone, even though the 1st populated zone is MOVABLE
> > zone.
> > 
> 
> Doesn't this introduce risk that it will break existing parsers of 
> /proc/zoneinfo in subtle ways?
> 
> In some cases /proc/zoneinfo is a tricky file to correctly parse because 
> you have to rely on the existing order in which it is printed to determine 
> which zone is being described.  We need to print zones even with unmanaged 
> pages, for instance, otherwise userspace may be unaware of which zones are 
> supported and what order they are in.  That's important to be able to 
> construct the proper string to use when writing vm.lowmem_reserve_ratio.
> 
> I'd prefer not changing the order of /proc/zoneinfo if it can be avoided 
> just because the risk outweighs the reward that we may break some 
> initscript parsers.

Oh, I may not describe the change and result clearly. This patch doesn't
change zone order at all.  I only move the per-node stats to the front of
each node, the zone order is completely kept the same, still DMA, DMA32,
NORMAL, MOVABLE.

Before this patch, per-node stats are printed inside the first populated
zone of each node. E.g in this node 2 which only has movable zone, the
per-node stats is embedded in the last zone. In fact, this per-node stats
are made for the whole node, but not for one zone. 

Node 2, zone      DMA
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 1024, 1024)
Node 2, zone    DMA32
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 1024, 1024)
Node 2, zone   Normal
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 8192, 8192)
Node 2, zone  Movable
  per-node stats              --------------------------->> start of per-node stats
      nr_inactive_anon 42
      nr_active_anon 11787
      nr_inactive_file 32222
      nr_active_file 6081
      nr_unevictable 0
      nr_slab_reclaimable 0
      nr_slab_unreclaimable 0
      ......                  --------- (mid items are omitted)
      nr_anon_transparent_hugepages 0
      nr_unstable  0
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied   25523
      nr_written   25113
      nr_kernel_misc_reclaimable 0  ------------------------->> end of per-node stats
  pages free     211331             ------------------------->> start printing data of zone Movable
        min      3524
        low      4405
        high     5286
        spanned  262144
        present  262144
        managed  262144
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 211331
      nr_zone_inactive_anon 42
      nr_zone_active_anon 11787
      nr_zone_inactive_file 32222
      nr_zone_active_file 6081
      nr_zone_unevictable 0
      nr_zone_write_pending 2

With this patch applied, only the per-node stats part is moved to the
front of each node.

Node 2, per-node stats             --------------------------->> start of per-node stats
      nr_inactive_anon 42
      nr_active_anon 12358
      nr_inactive_file 33139
      nr_active_file 10088
      nr_unevictable 0
      nr_slab_reclaimable 0
      ......                  --------- (mid items are omitted)
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied   9082
      nr_written   8844
      nr_kernel_misc_reclaimable 0
      nr_foll_pin_acquired 0
      nr_foll_pin_released 0     ------------------------->> end of per-node stats
Node 2, zone      DMA
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
Node 2, zone    DMA32
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
Node 2, zone   Normal
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
Node 2, zone  Movable
  pages free     205601     ------------------------->> start printing data of zone Movable
        min      3525
        low      4406
        high     5287
        spanned  262144
        present  262144
        managed  262144
        protection: (0, 0, 0, 0)
      nr_free_pages 205601
      nr_zone_inactive_anon 42
      nr_zone_active_anon 12358
      ........



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-25  5:53     ` Baoquan He
@ 2020-03-25  8:55       ` Michal Hocko
  2020-03-25 14:23         ` Baoquan He
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-03-25  8:55 UTC (permalink / raw)
  To: Baoquan He
  Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
	hannes, vbabka

On Wed 25-03-20 13:53:31, Baoquan He wrote:
> On 03/24/20 at 12:25pm, David Rientjes wrote:
> > On Tue, 24 Mar 2020, Baoquan He wrote:
> > 
> > > This moving makes the layout of /proc/zoneinfo more sensible. And there
> > > are 4 zones at most currently, it doesn't need to scroll down much to get
> > > to the 1st populated zone, even though the 1st populated zone is MOVABLE
> > > zone.
> > > 
> > 
> > Doesn't this introduce risk that it will break existing parsers of 
> > /proc/zoneinfo in subtle ways?
> > 
> > In some cases /proc/zoneinfo is a tricky file to correctly parse because 
> > you have to rely on the existing order in which it is printed to determine 
> > which zone is being described.  We need to print zones even with unmanaged 
> > pages, for instance, otherwise userspace may be unaware of which zones are 
> > supported and what order they are in.  That's important to be able to 
> > construct the proper string to use when writing vm.lowmem_reserve_ratio.
> > 
> > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided 
> > just because the risk outweighs the reward that we may break some 
> > initscript parsers.
> 
> Oh, I may not describe the change and result clearly. This patch doesn't
> change zone order at all.  I only move the per-node stats to the front of
> each node, the zone order is completely kept the same, still DMA, DMA32,
> NORMAL, MOVABLE.

Even this can break existing parsers. Fixing that up is likely not hard
and existing parsers would be mostly debugging hacks here and there but
I do miss any actual justification except for you considering it more
sensible. I do not remember this would be a common pain point for people
parsing this file. If anything the overal structure of the file makes it
hard to parse and your patches do not really address that. We are likely
too late to make the output much more sensible TBH.

That being said, I haven't looked more closely on your patches because I
do not have spare cycles for that. Your justification for touching the
code which seems to be working relatively well is quite weak IMHO, yet
it adds a non zero risk for breaking existing parsers.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-25  8:55       ` Michal Hocko
@ 2020-03-25 14:23         ` Baoquan He
  2020-03-25 19:45           ` David Rientjes
  0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-25 14:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
	hannes, vbabka, mgorman

On 03/25/20 at 09:55am, Michal Hocko wrote:
> On Wed 25-03-20 13:53:31, Baoquan He wrote:
> > On 03/24/20 at 12:25pm, David Rientjes wrote:
> > > On Tue, 24 Mar 2020, Baoquan He wrote:
> > > 
> > > > This moving makes the layout of /proc/zoneinfo more sensible. And there
> > > > are 4 zones at most currently, it doesn't need to scroll down much to get
> > > > to the 1st populated zone, even though the 1st populated zone is MOVABLE
> > > > zone.
> > > > 
> > > 
> > > Doesn't this introduce risk that it will break existing parsers of 
> > > /proc/zoneinfo in subtle ways?
> > > 
> > > In some cases /proc/zoneinfo is a tricky file to correctly parse because 
> > > you have to rely on the existing order in which it is printed to determine 
> > > which zone is being described.  We need to print zones even with unmanaged 
> > > pages, for instance, otherwise userspace may be unaware of which zones are 
> > > supported and what order they are in.  That's important to be able to 
> > > construct the proper string to use when writing vm.lowmem_reserve_ratio.
> > > 
> > > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided 
> > > just because the risk outweighs the reward that we may break some 
> > > initscript parsers.
> > 
> > Oh, I may not describe the change and result clearly. This patch doesn't
> > change zone order at all.  I only move the per-node stats to the front of
> > each node, the zone order is completely kept the same, still DMA, DMA32,
> > NORMAL, MOVABLE.
> 
> Even this can break existing parsers. Fixing that up is likely not hard
> and existing parsers would be mostly debugging hacks here and there but
> I do miss any actual justification except for you considering it more
> sensible. I do not remember this would be a common pain point for people
> parsing this file. If anything the overal structure of the file makes it
> hard to parse and your patches do not really address that. We are likely
> too late to make the output much more sensible TBH.
> 
> That being said, I haven't looked more closely on your patches because I
> do not have spare cycles for that. Your justification for touching the
> code which seems to be working relatively well is quite weak IMHO, yet
> it adds a non zero risk for breaking existing parsers.

I would take the saying of non zero risk for breaking existing parsers.
When considering this change, I thought about the possible risk. However,
found out the per-node stats was added in 2016 which is not so late, and
assume nobody will rely on the order of per-node stats embeded into a
zone. But I have to admit any concern or worry of risk is worth being
considerred carefully since /proc/zoneinfo is a classic interface.

So, in view of objections from you and David, I would like to drop this
patch and patch 5. It's a small improvement, not worth taking any risk.
But if it goes back to this time of 2017, I would like to spend some
time to defend it :-)

commit e2ecc8a79ed49f7838b4fdf352c4c48cec9424ac
Author: Mel Gorman <mgorman@techsingularity.net>
Date:   Thu Jul 28 15:47:02 2016 -0700

    mm, vmstat: print node-based stats in zoneinfo file




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-25 14:23         ` Baoquan He
@ 2020-03-25 19:45           ` David Rientjes
  2020-03-26  4:24             ` Baoquan He
  0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2020-03-25 19:45 UTC (permalink / raw)
  To: Baoquan He
  Cc: Michal Hocko, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
	hannes, vbabka, mgorman

On Wed, 25 Mar 2020, Baoquan He wrote:

> > Even this can break existing parsers. Fixing that up is likely not hard
> > and existing parsers would be mostly debugging hacks here and there but
> > I do miss any actual justification except for you considering it more
> > sensible. I do not remember this would be a common pain point for people
> > parsing this file. If anything the overal structure of the file makes it
> > hard to parse and your patches do not really address that. We are likely
> > too late to make the output much more sensible TBH.
> > 
> > That being said, I haven't looked more closely on your patches because I
> > do not have spare cycles for that. Your justification for touching the
> > code which seems to be working relatively well is quite weak IMHO, yet
> > it adds a non zero risk for breaking existing parsers.
> 
> I would take the saying of non zero risk for breaking existing parsers.
> When considering this change, I thought about the possible risk. However,
> found out the per-node stats was added in 2016 which is not so late, and
> assume nobody will rely on the order of per-node stats embeded into a
> zone. But I have to admit any concern or worry of risk is worth being
> considerred carefully since /proc/zoneinfo is a classic interface.
> 

For context, we started parsing /proc/zoneinfo in initscripts to be able 
to determine the order in which vm.lowmem_reserve_ratio needs to be set 
and this required my kernel change from 2017:

commit b2bd8598195f1b2a72130592125ac6b4218988a2
Author: David Rientjes <rientjes@google.com>
Date:   Wed May 3 14:52:59 2017 -0700

    mm, vmstat: print non-populated zones in zoneinfo

Otherwise, we found, it's much more difficult to determine how this array 
should be structured.  So at least we parse this file very carefully, I'm 
not sure how much others do, but it seems like an unnecessary risk for 
little reward.  I'm happy to see it has been decided to drop this patch 
and patch 5.

> So, in view of objections from you and David, I would like to drop this
> patch and patch 5. It's a small improvement, not worth taking any risk.
> But if it goes back to this time of 2017, I would like to spend some
> time to defend it :-)
> 
> commit e2ecc8a79ed49f7838b4fdf352c4c48cec9424ac
> Author: Mel Gorman <mgorman@techsingularity.net>
> Date:   Thu Jul 28 15:47:02 2016 -0700
> 
>     mm, vmstat: print node-based stats in zoneinfo file
> 
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-25 19:45           ` David Rientjes
@ 2020-03-26  4:24             ` Baoquan He
  2020-03-26  6:43               ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-26  4:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
	hannes, vbabka, mgorman

On 03/25/20 at 12:45pm, David Rientjes wrote:
> On Wed, 25 Mar 2020, Baoquan He wrote:
> 
> > > Even this can break existing parsers. Fixing that up is likely not hard
> > > and existing parsers would be mostly debugging hacks here and there but
> > > I do miss any actual justification except for you considering it more
> > > sensible. I do not remember this would be a common pain point for people
> > > parsing this file. If anything the overal structure of the file makes it
> > > hard to parse and your patches do not really address that. We are likely
> > > too late to make the output much more sensible TBH.
> > > 
> > > That being said, I haven't looked more closely on your patches because I
> > > do not have spare cycles for that. Your justification for touching the
> > > code which seems to be working relatively well is quite weak IMHO, yet
> > > it adds a non zero risk for breaking existing parsers.
> > 
> > I would take the saying of non zero risk for breaking existing parsers.
> > When considering this change, I thought about the possible risk. However,
> > found out the per-node stats was added in 2016 which is not so late, and
> > assume nobody will rely on the order of per-node stats embeded into a
> > zone. But I have to admit any concern or worry of risk is worth being
> > considerred carefully since /proc/zoneinfo is a classic interface.
> > 
> 
> For context, we started parsing /proc/zoneinfo in initscripts to be able 
> to determine the order in which vm.lowmem_reserve_ratio needs to be set 
> and this required my kernel change from 2017:
> 
> commit b2bd8598195f1b2a72130592125ac6b4218988a2
> Author: David Rientjes <rientjes@google.com>
> Date:   Wed May 3 14:52:59 2017 -0700
> 
>     mm, vmstat: print non-populated zones in zoneinfo
> 
> Otherwise, we found, it's much more difficult to determine how this array 
> should be structured.  So at least we parse this file very carefully, I'm 
> not sure how much others do, but it seems like an unnecessary risk for 
> little reward.  I'm happy to see it has been decided to drop this patch 
> and patch 5.


OK, I see why it is in such a situation, the empty zones were not printed. 

I could still not get how vm.lowmem_reserve_ratio is set with
/proc/zoneinfo in the old initscripts, do you see any risk if not
filling and showing the ->lowmem_reserve[] of empty zone in
patch 2 and 3?  Thanks in advance.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-26  4:24             ` Baoquan He
@ 2020-03-26  6:43               ` Michal Hocko
  2020-03-26 11:22                 ` Baoquan He
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-03-26  6:43 UTC (permalink / raw)
  To: Baoquan He
  Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
	hannes, vbabka, mgorman

On Thu 26-03-20 12:24:54, Baoquan He wrote:
> On 03/25/20 at 12:45pm, David Rientjes wrote:
> > On Wed, 25 Mar 2020, Baoquan He wrote:
> > 
> > > > Even this can break existing parsers. Fixing that up is likely not hard
> > > > and existing parsers would be mostly debugging hacks here and there but
> > > > I do miss any actual justification except for you considering it more
> > > > sensible. I do not remember this would be a common pain point for people
> > > > parsing this file. If anything the overal structure of the file makes it
> > > > hard to parse and your patches do not really address that. We are likely
> > > > too late to make the output much more sensible TBH.
> > > > 
> > > > That being said, I haven't looked more closely on your patches because I
> > > > do not have spare cycles for that. Your justification for touching the
> > > > code which seems to be working relatively well is quite weak IMHO, yet
> > > > it adds a non zero risk for breaking existing parsers.
> > > 
> > > I would take the saying of non zero risk for breaking existing parsers.
> > > When considering this change, I thought about the possible risk. However,
> > > found out the per-node stats was added in 2016 which is not so late, and
> > > assume nobody will rely on the order of per-node stats embeded into a
> > > zone. But I have to admit any concern or worry of risk is worth being
> > > considerred carefully since /proc/zoneinfo is a classic interface.
> > > 
> > 
> > For context, we started parsing /proc/zoneinfo in initscripts to be able 
> > to determine the order in which vm.lowmem_reserve_ratio needs to be set 
> > and this required my kernel change from 2017:
> > 
> > commit b2bd8598195f1b2a72130592125ac6b4218988a2
> > Author: David Rientjes <rientjes@google.com>
> > Date:   Wed May 3 14:52:59 2017 -0700
> > 
> >     mm, vmstat: print non-populated zones in zoneinfo
> > 
> > Otherwise, we found, it's much more difficult to determine how this array 
> > should be structured.  So at least we parse this file very carefully, I'm 
> > not sure how much others do, but it seems like an unnecessary risk for 
> > little reward.  I'm happy to see it has been decided to drop this patch 
> > and patch 5.
> 
> 
> OK, I see why it is in such a situation, the empty zones were not printed. 
> 
> I could still not get how vm.lowmem_reserve_ratio is set with
> /proc/zoneinfo in the old initscripts, do you see any risk if not
> filling and showing the ->lowmem_reserve[] of empty zone in
> patch 2 and 3?  Thanks in advance.

The point is why should we even care. Displaying that information
shouldn't hurt anything, right?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
  2020-03-26  6:43               ` Michal Hocko
@ 2020-03-26 11:22                 ` Baoquan He
  0 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-26 11:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
	hannes, vbabka, mgorman

On 03/26/20 at 07:43am, Michal Hocko wrote:
> On Thu 26-03-20 12:24:54, Baoquan He wrote:
> > On 03/25/20 at 12:45pm, David Rientjes wrote:
> > > On Wed, 25 Mar 2020, Baoquan He wrote:
> > > 
> > > > > Even this can break existing parsers. Fixing that up is likely not hard
> > > > > and existing parsers would be mostly debugging hacks here and there but
> > > > > I do miss any actual justification except for you considering it more
> > > > > sensible. I do not remember this would be a common pain point for people
> > > > > parsing this file. If anything the overal structure of the file makes it
> > > > > hard to parse and your patches do not really address that. We are likely
> > > > > too late to make the output much more sensible TBH.
> > > > > 
> > > > > That being said, I haven't looked more closely on your patches because I
> > > > > do not have spare cycles for that. Your justification for touching the
> > > > > code which seems to be working relatively well is quite weak IMHO, yet
> > > > > it adds a non zero risk for breaking existing parsers.
> > > > 
> > > > I would take the saying of non zero risk for breaking existing parsers.
> > > > When considering this change, I thought about the possible risk. However,
> > > > found out the per-node stats was added in 2016 which is not so late, and
> > > > assume nobody will rely on the order of per-node stats embeded into a
> > > > zone. But I have to admit any concern or worry of risk is worth being
> > > > considerred carefully since /proc/zoneinfo is a classic interface.
> > > > 
> > > 
> > > For context, we started parsing /proc/zoneinfo in initscripts to be able 
> > > to determine the order in which vm.lowmem_reserve_ratio needs to be set 
> > > and this required my kernel change from 2017:
> > > 
> > > commit b2bd8598195f1b2a72130592125ac6b4218988a2
> > > Author: David Rientjes <rientjes@google.com>
> > > Date:   Wed May 3 14:52:59 2017 -0700
> > > 
> > >     mm, vmstat: print non-populated zones in zoneinfo
> > > 
> > > Otherwise, we found, it's much more difficult to determine how this array 
> > > should be structured.  So at least we parse this file very carefully, I'm 
> > > not sure how much others do, but it seems like an unnecessary risk for 
> > > little reward.  I'm happy to see it has been decided to drop this patch 
> > > and patch 5.
> > 
> > 
> > OK, I see why it is in such a situation, the empty zones were not printed. 
> > 
> > I could still not get how vm.lowmem_reserve_ratio is set with
> > /proc/zoneinfo in the old initscripts, do you see any risk if not
> > filling and showing the ->lowmem_reserve[] of empty zone in
> > patch 2 and 3?  Thanks in advance.
> 
> The point is why should we even care. Displaying that information
> shouldn't hurt anything, right?

Well, I would say why not. If saying anything hurted, I often check
/proc/zoneinfo to get information about system memory like many people,
I was wondering why the protection data is over there, but it's am
empty zone, and they protect what. I dare say it's more than once I
asked to myself, just sometime I am too lazy to start to make it clear
when focusing on another issue. Not sure if that is kind of hurting. I
would like to see it's not there to confuse me if anyone else have
stood up to fix it, I absolutely will vote for it.

Surely, we also need to evaluate if any risk or complexity is involved, 
While with my understanding, I don't see risk, and the change is quite
simple and easy to understand.



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-26 11:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
2020-03-24 14:22 ` [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
2020-03-24 14:22 ` [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
2020-03-24 14:22 ` [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
2020-03-24 19:25   ` David Rientjes
2020-03-25  5:53     ` Baoquan He
2020-03-25  8:55       ` Michal Hocko
2020-03-25 14:23         ` Baoquan He
2020-03-25 19:45           ` David Rientjes
2020-03-26  4:24             ` Baoquan He
2020-03-26  6:43               ` Michal Hocko
2020-03-26 11:22                 ` Baoquan He
2020-03-24 14:22 ` [PATCH 5/5] mm/vmstat.c: remove the useless code Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).