* [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it
2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
2020-03-24 14:22 ` [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe
When people write to /proc/sys/vm/lowmem_reserve_ratio to change
sysctl_lowmem_reserve_ratio[], setup_per_zone_lowmem_reserve()
is called to recalculate all ->lowmem_reserve[] for each zone of all
nodes as below:
static void setup_per_zone_lowmem_reserve(void)
{
...
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
...
while (idx) {
...
if (sysctl_lowmem_reserve_ratio[idx] < 1) {
sysctl_lowmem_reserve_ratio[idx] = 0;
lower_zone->lowmem_reserve[j] = 0;
} else {
...
}
}
}
}
Meanwhile, here, sysctl_lowmem_reserve_ratio[idx] will be tuned if its
value is smaller than '1'. As we know, sysctl_lowmem_reserve_ratio[] is
set for zone without regarding to which node it belongs to. That means
the tuning will be done on all nodes, even though it has been done in the
first node.
And the tuning will be done too even when init_per_zone_wmark_min()
calls setup_per_zone_lowmem_reserve(), where actually nobody tries to
change sysctl_lowmem_reserve_ratio[].
So now move the tuning into lowmem_reserve_ratio_sysctl_handler(), to
make code logic more reasonable.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/page_alloc.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca1453204e66..c0c788798d8b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7840,8 +7840,7 @@ static void setup_per_zone_lowmem_reserve(void)
idx--;
lower_zone = pgdat->node_zones + idx;
- if (sysctl_lowmem_reserve_ratio[idx] < 1) {
- sysctl_lowmem_reserve_ratio[idx] = 0;
+ if (!sysctl_lowmem_reserve_ratio[idx]) {
lower_zone->lowmem_reserve[j] = 0;
} else {
lower_zone->lowmem_reserve[j] =
@@ -8106,7 +8105,15 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
{
+ int i;
+
proc_dointvec_minmax(table, write, buffer, length, ppos);
+
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ if (sysctl_lowmem_reserve_ratio[i] < 1)
+ sysctl_lowmem_reserve_ratio[i] = 0;
+ }
+
setup_per_zone_lowmem_reserve();
return 0;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty
2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
2020-03-24 14:22 ` [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
2020-03-24 14:22 ` [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe
When requesting memory allocation from a specific zone is not satisfied,
it will fall to lower zone to try allocating memory. In this case,
lower zone's ->lowmem_reserve[] will help protect its own memory resource.
The higher the relevant ->lowmem_reserve[] is, the harder the upper zone
can get memory from this lower zone.
However, this protection mechanism should be applied to populated zone,
but not an empty zone. So filling ->lowmem_reserve[] for empty zone is
not necessary, and may mislead people that it's valid data in that zone.
Node 2, zone DMA
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 1024, 1024)
Node 2, zone DMA32
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 1024, 1024)
Node 2, zone Normal
per-node stats
nr_inactive_anon 0
nr_active_anon 143
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_slab_reclaimable 45
nr_slab_unreclaimable 254
Here clear out zone->lowmem_reserve[] if zone is empty.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/page_alloc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0c788798d8b..138a56c0f48f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7840,8 +7840,10 @@ static void setup_per_zone_lowmem_reserve(void)
idx--;
lower_zone = pgdat->node_zones + idx;
- if (!sysctl_lowmem_reserve_ratio[idx]) {
+ if (!sysctl_lowmem_reserve_ratio[idx] ||
+ !zone_managed_pages(lower_zone)) {
lower_zone->lowmem_reserve[j] = 0;
+ continue;
} else {
lower_zone->lowmem_reserve[j] =
managed_pages / sysctl_lowmem_reserve_ratio[idx];
--
2.17.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone
2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
2020-03-24 14:22 ` [PATCH 1/5] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
2020-03-24 14:22 ` [PATCH 2/5] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
2020-03-24 14:22 ` [PATCH 5/5] mm/vmstat.c: remove the useless code Baoquan He
4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe
Because the lowmem reserve protection of a zone can't tell anything if
the zone is empty, except of adding one more line in /proc/zoneinfo.
Let's remove it from that zone's showing.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/vmstat.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 96d21a792b57..6fd1407f4632 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1590,6 +1590,12 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
zone->present_pages,
zone_managed_pages(zone));
+ /* If unpopulated, no other information is useful */
+ if (!populated_zone(zone)) {
+ seq_putc(m, '\n');
+ return;
+ }
+
seq_printf(m,
"\n protection: (%ld",
zone->lowmem_reserve[0]);
@@ -1597,12 +1603,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
seq_printf(m, ", %ld", zone->lowmem_reserve[i]);
seq_putc(m, ')');
- /* If unpopulated, no other information is useful */
- if (!populated_zone(zone)) {
- seq_putc(m, '\n');
- return;
- }
-
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
seq_printf(m, "\n %-12s %lu", zone_stat_name(i),
zone_page_state(zone, i));
--
2.17.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
` (2 preceding siblings ...)
2020-03-24 14:22 ` [PATCH 3/5] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
2020-03-24 19:25 ` David Rientjes
2020-03-24 14:22 ` [PATCH 5/5] mm/vmstat.c: remove the useless code Baoquan He
4 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe
This moving makes the layout of /proc/zoneinfo more sensible. And there
are 4 zones at most currently, it doesn't need to scroll down much to get
to the 1st populated zone, even though the 1st populated zone is MOVABLE
zone.
Node 2, per-node stats
nr_inactive_anon 48
nr_active_anon 15454
...
nr_foll_pin_acquired 0
nr_foll_pin_released 0
Node 2, zone DMA
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
Node 2, zone DMA32
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
Node 2, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
Node 2, zone Movable
pages free 196346
min 3540
...
managed 262144
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/vmstat.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6fd1407f4632..4bbf9be786da 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1567,13 +1567,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
{
int i;
seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
- if (is_zone_first_populated(pgdat, zone)) {
- seq_printf(m, "\n per-node stats");
- for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
- seq_printf(m, "\n %-12s %lu", node_stat_name(i),
- node_page_state(pgdat, i));
- }
- }
seq_printf(m,
"\n pages free %lu"
"\n min %lu"
@@ -1648,7 +1641,18 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
*/
static int zoneinfo_show(struct seq_file *m, void *arg)
{
+ int i;
pg_data_t *pgdat = (pg_data_t *)arg;
+
+ if (node_state(pgdat->node_id, N_MEMORY)) {
+ seq_printf(m, "Node %d, per-node stats", pgdat->node_id);
+ for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+ seq_printf(m, "\n %-12s %lu", node_stat_name(i),
+ node_page_state(pgdat, i));
+ }
+ seq_putc(m, '\n');
+ }
+
walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print);
return 0;
}
--
2.17.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
@ 2020-03-24 19:25 ` David Rientjes
2020-03-25 5:53 ` Baoquan He
0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2020-03-24 19:25 UTC (permalink / raw)
To: Baoquan He
Cc: linux-kernel, linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka
On Tue, 24 Mar 2020, Baoquan He wrote:
> This moving makes the layout of /proc/zoneinfo more sensible. And there
> are 4 zones at most currently, it doesn't need to scroll down much to get
> to the 1st populated zone, even though the 1st populated zone is MOVABLE
> zone.
>
Doesn't this introduce risk that it will break existing parsers of
/proc/zoneinfo in subtle ways?
In some cases /proc/zoneinfo is a tricky file to correctly parse because
you have to rely on the existing order in which it is printed to determine
which zone is being described. We need to print zones even with unmanaged
pages, for instance, otherwise userspace may be unaware of which zones are
supported and what order they are in. That's important to be able to
construct the proper string to use when writing vm.lowmem_reserve_ratio.
I'd prefer not changing the order of /proc/zoneinfo if it can be avoided
just because the risk outweighs the reward that we may break some
initscript parsers.
> Node 2, per-node stats
> nr_inactive_anon 48
> nr_active_anon 15454
> ...
> nr_foll_pin_acquired 0
> nr_foll_pin_released 0
> Node 2, zone DMA
> pages free 0
> min 0
> low 0
> high 0
> spanned 0
> present 0
> managed 0
> Node 2, zone DMA32
> pages free 0
> min 0
> low 0
> high 0
> spanned 0
> present 0
> managed 0
> Node 2, zone Normal
> pages free 0
> min 0
> low 0
> high 0
> spanned 0
> present 0
> managed 0
> Node 2, zone Movable
> pages free 196346
> min 3540
> ...
> managed 262144
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
> mm/vmstat.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 6fd1407f4632..4bbf9be786da 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1567,13 +1567,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> {
> int i;
> seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
> - if (is_zone_first_populated(pgdat, zone)) {
> - seq_printf(m, "\n per-node stats");
> - for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> - seq_printf(m, "\n %-12s %lu", node_stat_name(i),
> - node_page_state(pgdat, i));
> - }
> - }
> seq_printf(m,
> "\n pages free %lu"
> "\n min %lu"
> @@ -1648,7 +1641,18 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> */
> static int zoneinfo_show(struct seq_file *m, void *arg)
> {
> + int i;
> pg_data_t *pgdat = (pg_data_t *)arg;
> +
> + if (node_state(pgdat->node_id, N_MEMORY)) {
> + seq_printf(m, "Node %d, per-node stats", pgdat->node_id);
> + for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> + seq_printf(m, "\n %-12s %lu", node_stat_name(i),
> + node_page_state(pgdat, i));
> + }
> + seq_putc(m, '\n');
> + }
> +
> walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print);
> return 0;
> }
> --
> 2.17.2
>
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-24 19:25 ` David Rientjes
@ 2020-03-25 5:53 ` Baoquan He
2020-03-25 8:55 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-25 5:53 UTC (permalink / raw)
To: David Rientjes
Cc: linux-kernel, linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka
On 03/24/20 at 12:25pm, David Rientjes wrote:
> On Tue, 24 Mar 2020, Baoquan He wrote:
>
> > This moving makes the layout of /proc/zoneinfo more sensible. And there
> > are 4 zones at most currently, it doesn't need to scroll down much to get
> > to the 1st populated zone, even though the 1st populated zone is MOVABLE
> > zone.
> >
>
> Doesn't this introduce risk that it will break existing parsers of
> /proc/zoneinfo in subtle ways?
>
> In some cases /proc/zoneinfo is a tricky file to correctly parse because
> you have to rely on the existing order in which it is printed to determine
> which zone is being described. We need to print zones even with unmanaged
> pages, for instance, otherwise userspace may be unaware of which zones are
> supported and what order they are in. That's important to be able to
> construct the proper string to use when writing vm.lowmem_reserve_ratio.
>
> I'd prefer not changing the order of /proc/zoneinfo if it can be avoided
> just because the risk outweighs the reward that we may break some
> initscript parsers.
Oh, I may not describe the change and result clearly. This patch doesn't
change zone order at all. I only move the per-node stats to the front of
each node, the zone order is completely kept the same, still DMA, DMA32,
NORMAL, MOVABLE.
Before this patch, per-node stats are printed inside the first populated
zone of each node. E.g in this node 2 which only has movable zone, the
per-node stats is embedded in the last zone. In fact, this per-node stats
are made for the whole node, but not for one zone.
Node 2, zone DMA
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 1024, 1024)
Node 2, zone DMA32
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 1024, 1024)
Node 2, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 8192, 8192)
Node 2, zone Movable
per-node stats --------------------------->> start of per-node stats
nr_inactive_anon 42
nr_active_anon 11787
nr_inactive_file 32222
nr_active_file 6081
nr_unevictable 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
...... --------- (mid items are omitted)
nr_anon_transparent_hugepages 0
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 25523
nr_written 25113
nr_kernel_misc_reclaimable 0 ------------------------->> end of per-node stats
pages free 211331 ------------------------->> start printing data of zone Movable
min 3524
low 4405
high 5286
spanned 262144
present 262144
managed 262144
protection: (0, 0, 0, 0, 0)
nr_free_pages 211331
nr_zone_inactive_anon 42
nr_zone_active_anon 11787
nr_zone_inactive_file 32222
nr_zone_active_file 6081
nr_zone_unevictable 0
nr_zone_write_pending 2
With this patch applied, only the per-node stats part is moved to the
front of each node.
Node 2, per-node stats --------------------------->> start of per-node stats
nr_inactive_anon 42
nr_active_anon 12358
nr_inactive_file 33139
nr_active_file 10088
nr_unevictable 0
nr_slab_reclaimable 0
...... --------- (mid items are omitted)
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 9082
nr_written 8844
nr_kernel_misc_reclaimable 0
nr_foll_pin_acquired 0
nr_foll_pin_released 0 ------------------------->> end of per-node stats
Node 2, zone DMA
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
Node 2, zone DMA32
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
Node 2, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
Node 2, zone Movable
pages free 205601 ------------------------->> start printing data of zone Movable
min 3525
low 4406
high 5287
spanned 262144
present 262144
managed 262144
protection: (0, 0, 0, 0)
nr_free_pages 205601
nr_zone_inactive_anon 42
nr_zone_active_anon 12358
........
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-25 5:53 ` Baoquan He
@ 2020-03-25 8:55 ` Michal Hocko
2020-03-25 14:23 ` Baoquan He
0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-03-25 8:55 UTC (permalink / raw)
To: Baoquan He
Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
hannes, vbabka
On Wed 25-03-20 13:53:31, Baoquan He wrote:
> On 03/24/20 at 12:25pm, David Rientjes wrote:
> > On Tue, 24 Mar 2020, Baoquan He wrote:
> >
> > > This moving makes the layout of /proc/zoneinfo more sensible. And there
> > > are 4 zones at most currently, it doesn't need to scroll down much to get
> > > to the 1st populated zone, even though the 1st populated zone is MOVABLE
> > > zone.
> > >
> >
> > Doesn't this introduce risk that it will break existing parsers of
> > /proc/zoneinfo in subtle ways?
> >
> > In some cases /proc/zoneinfo is a tricky file to correctly parse because
> > you have to rely on the existing order in which it is printed to determine
> > which zone is being described. We need to print zones even with unmanaged
> > pages, for instance, otherwise userspace may be unaware of which zones are
> > supported and what order they are in. That's important to be able to
> > construct the proper string to use when writing vm.lowmem_reserve_ratio.
> >
> > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided
> > just because the risk outweighs the reward that we may break some
> > initscript parsers.
>
> Oh, I may not describe the change and result clearly. This patch doesn't
> change zone order at all. I only move the per-node stats to the front of
> each node, the zone order is completely kept the same, still DMA, DMA32,
> NORMAL, MOVABLE.
Even this can break existing parsers. Fixing that up is likely not hard
and existing parsers would be mostly debugging hacks here and there but
I do miss any actual justification except for you considering it more
sensible. I do not remember this would be a common pain point for people
parsing this file. If anything the overal structure of the file makes it
hard to parse and your patches do not really address that. We are likely
too late to make the output much more sensible TBH.
That being said, I haven't looked more closely on your patches because I
do not have spare cycles for that. Your justification for touching the
code which seems to be working relatively well is quite weak IMHO, yet
it adds a non zero risk for breaking existing parsers.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-25 8:55 ` Michal Hocko
@ 2020-03-25 14:23 ` Baoquan He
2020-03-25 19:45 ` David Rientjes
0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-25 14:23 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
hannes, vbabka, mgorman
On 03/25/20 at 09:55am, Michal Hocko wrote:
> On Wed 25-03-20 13:53:31, Baoquan He wrote:
> > On 03/24/20 at 12:25pm, David Rientjes wrote:
> > > On Tue, 24 Mar 2020, Baoquan He wrote:
> > >
> > > > This moving makes the layout of /proc/zoneinfo more sensible. And there
> > > > are 4 zones at most currently, it doesn't need to scroll down much to get
> > > > to the 1st populated zone, even though the 1st populated zone is MOVABLE
> > > > zone.
> > > >
> > >
> > > Doesn't this introduce risk that it will break existing parsers of
> > > /proc/zoneinfo in subtle ways?
> > >
> > > In some cases /proc/zoneinfo is a tricky file to correctly parse because
> > > you have to rely on the existing order in which it is printed to determine
> > > which zone is being described. We need to print zones even with unmanaged
> > > pages, for instance, otherwise userspace may be unaware of which zones are
> > > supported and what order they are in. That's important to be able to
> > > construct the proper string to use when writing vm.lowmem_reserve_ratio.
> > >
> > > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided
> > > just because the risk outweighs the reward that we may break some
> > > initscript parsers.
> >
> > Oh, I may not describe the change and result clearly. This patch doesn't
> > change zone order at all. I only move the per-node stats to the front of
> > each node, the zone order is completely kept the same, still DMA, DMA32,
> > NORMAL, MOVABLE.
>
> Even this can break existing parsers. Fixing that up is likely not hard
> and existing parsers would be mostly debugging hacks here and there but
> I do miss any actual justification except for you considering it more
> sensible. I do not remember this would be a common pain point for people
> parsing this file. If anything the overal structure of the file makes it
> hard to parse and your patches do not really address that. We are likely
> too late to make the output much more sensible TBH.
>
> That being said, I haven't looked more closely on your patches because I
> do not have spare cycles for that. Your justification for touching the
> code which seems to be working relatively well is quite weak IMHO, yet
> it adds a non zero risk for breaking existing parsers.
I would take the saying of non zero risk for breaking existing parsers.
When considering this change, I thought about the possible risk. However,
found out the per-node stats was added in 2016 which is not so late, and
assume nobody will rely on the order of per-node stats embeded into a
zone. But I have to admit any concern or worry of risk is worth being
considerred carefully since /proc/zoneinfo is a classic interface.
So, in view of objections from you and David, I would like to drop this
patch and patch 5. It's a small improvement, not worth taking any risk.
But if it goes back to this time of 2017, I would like to spend some
time to defend it :-)
commit e2ecc8a79ed49f7838b4fdf352c4c48cec9424ac
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Thu Jul 28 15:47:02 2016 -0700
mm, vmstat: print node-based stats in zoneinfo file
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-25 14:23 ` Baoquan He
@ 2020-03-25 19:45 ` David Rientjes
2020-03-26 4:24 ` Baoquan He
0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2020-03-25 19:45 UTC (permalink / raw)
To: Baoquan He
Cc: Michal Hocko, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
hannes, vbabka, mgorman
On Wed, 25 Mar 2020, Baoquan He wrote:
> > Even this can break existing parsers. Fixing that up is likely not hard
> > and existing parsers would be mostly debugging hacks here and there but
> > I do miss any actual justification except for you considering it more
> > sensible. I do not remember this would be a common pain point for people
> > parsing this file. If anything the overal structure of the file makes it
> > hard to parse and your patches do not really address that. We are likely
> > too late to make the output much more sensible TBH.
> >
> > That being said, I haven't looked more closely on your patches because I
> > do not have spare cycles for that. Your justification for touching the
> > code which seems to be working relatively well is quite weak IMHO, yet
> > it adds a non zero risk for breaking existing parsers.
>
> I would take the saying of non zero risk for breaking existing parsers.
> When considering this change, I thought about the possible risk. However,
> found out the per-node stats was added in 2016 which is not so late, and
> assume nobody will rely on the order of per-node stats embeded into a
> zone. But I have to admit any concern or worry of risk is worth being
> considerred carefully since /proc/zoneinfo is a classic interface.
>
For context, we started parsing /proc/zoneinfo in initscripts to be able
to determine the order in which vm.lowmem_reserve_ratio needs to be set
and this required my kernel change from 2017:
commit b2bd8598195f1b2a72130592125ac6b4218988a2
Author: David Rientjes <rientjes@google.com>
Date: Wed May 3 14:52:59 2017 -0700
mm, vmstat: print non-populated zones in zoneinfo
Otherwise, we found, it's much more difficult to determine how this array
should be structured. So at least we parse this file very carefully, I'm
not sure how much others do, but it seems like an unnecessary risk for
little reward. I'm happy to see it has been decided to drop this patch
and patch 5.
> So, in view of objections from you and David, I would like to drop this
> patch and patch 5. It's a small improvement, not worth taking any risk.
> But if it goes back to this time of 2017, I would like to spend some
> time to defend it :-)
>
> commit e2ecc8a79ed49f7838b4fdf352c4c48cec9424ac
> Author: Mel Gorman <mgorman@techsingularity.net>
> Date: Thu Jul 28 15:47:02 2016 -0700
>
> mm, vmstat: print node-based stats in zoneinfo file
>
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-25 19:45 ` David Rientjes
@ 2020-03-26 4:24 ` Baoquan He
2020-03-26 6:43 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2020-03-26 4:24 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
hannes, vbabka, mgorman
On 03/25/20 at 12:45pm, David Rientjes wrote:
> On Wed, 25 Mar 2020, Baoquan He wrote:
>
> > > Even this can break existing parsers. Fixing that up is likely not hard
> > > and existing parsers would be mostly debugging hacks here and there but
> > > I do miss any actual justification except for you considering it more
> > > sensible. I do not remember this would be a common pain point for people
> > > parsing this file. If anything the overal structure of the file makes it
> > > hard to parse and your patches do not really address that. We are likely
> > > too late to make the output much more sensible TBH.
> > >
> > > That being said, I haven't looked more closely on your patches because I
> > > do not have spare cycles for that. Your justification for touching the
> > > code which seems to be working relatively well is quite weak IMHO, yet
> > > it adds a non zero risk for breaking existing parsers.
> >
> > I would take the saying of non zero risk for breaking existing parsers.
> > When considering this change, I thought about the possible risk. However,
> > found out the per-node stats was added in 2016 which is not so late, and
> > assume nobody will rely on the order of per-node stats embeded into a
> > zone. But I have to admit any concern or worry of risk is worth being
> > considerred carefully since /proc/zoneinfo is a classic interface.
> >
>
> For context, we started parsing /proc/zoneinfo in initscripts to be able
> to determine the order in which vm.lowmem_reserve_ratio needs to be set
> and this required my kernel change from 2017:
>
> commit b2bd8598195f1b2a72130592125ac6b4218988a2
> Author: David Rientjes <rientjes@google.com>
> Date: Wed May 3 14:52:59 2017 -0700
>
> mm, vmstat: print non-populated zones in zoneinfo
>
> Otherwise, we found, it's much more difficult to determine how this array
> should be structured. So at least we parse this file very carefully, I'm
> not sure how much others do, but it seems like an unnecessary risk for
> little reward. I'm happy to see it has been decided to drop this patch
> and patch 5.
OK, I see why it is in such a situation, the empty zones were not printed.
I could still not get how vm.lowmem_reserve_ratio is set with
/proc/zoneinfo in the old initscripts, do you see any risk if not
filling and showing the ->lowmem_reserve[] of empty zone in
patch 2 and 3? Thanks in advance.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-26 4:24 ` Baoquan He
@ 2020-03-26 6:43 ` Michal Hocko
2020-03-26 11:22 ` Baoquan He
0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2020-03-26 6:43 UTC (permalink / raw)
To: Baoquan He
Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
hannes, vbabka, mgorman
On Thu 26-03-20 12:24:54, Baoquan He wrote:
> On 03/25/20 at 12:45pm, David Rientjes wrote:
> > On Wed, 25 Mar 2020, Baoquan He wrote:
> >
> > > > Even this can break existing parsers. Fixing that up is likely not hard
> > > > and existing parsers would be mostly debugging hacks here and there but
> > > > I do miss any actual justification except for you considering it more
> > > > sensible. I do not remember this would be a common pain point for people
> > > > parsing this file. If anything the overal structure of the file makes it
> > > > hard to parse and your patches do not really address that. We are likely
> > > > too late to make the output much more sensible TBH.
> > > >
> > > > That being said, I haven't looked more closely on your patches because I
> > > > do not have spare cycles for that. Your justification for touching the
> > > > code which seems to be working relatively well is quite weak IMHO, yet
> > > > it adds a non zero risk for breaking existing parsers.
> > >
> > > I would take the saying of non zero risk for breaking existing parsers.
> > > When considering this change, I thought about the possible risk. However,
> > > found out the per-node stats was added in 2016 which is not so late, and
> > > assume nobody will rely on the order of per-node stats embeded into a
> > > zone. But I have to admit any concern or worry of risk is worth being
> > > considerred carefully since /proc/zoneinfo is a classic interface.
> > >
> >
> > For context, we started parsing /proc/zoneinfo in initscripts to be able
> > to determine the order in which vm.lowmem_reserve_ratio needs to be set
> > and this required my kernel change from 2017:
> >
> > commit b2bd8598195f1b2a72130592125ac6b4218988a2
> > Author: David Rientjes <rientjes@google.com>
> > Date: Wed May 3 14:52:59 2017 -0700
> >
> > mm, vmstat: print non-populated zones in zoneinfo
> >
> > Otherwise, we found, it's much more difficult to determine how this array
> > should be structured. So at least we parse this file very carefully, I'm
> > not sure how much others do, but it seems like an unnecessary risk for
> > little reward. I'm happy to see it has been decided to drop this patch
> > and patch 5.
>
>
> OK, I see why it is in such a situation, the empty zones were not printed.
>
> I could still not get how vm.lowmem_reserve_ratio is set with
> /proc/zoneinfo in the old initscripts, do you see any risk if not
> filling and showing the ->lowmem_reserve[] of empty zone in
> patch 2 and 3? Thanks in advance.
The point is why should we even care. Displaying that information
shouldn't hurt anything, right?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo
2020-03-26 6:43 ` Michal Hocko
@ 2020-03-26 11:22 ` Baoquan He
0 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-26 11:22 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, linux-kernel, linux-mm, akpm, iamjoonsoo.kim,
hannes, vbabka, mgorman
On 03/26/20 at 07:43am, Michal Hocko wrote:
> On Thu 26-03-20 12:24:54, Baoquan He wrote:
> > On 03/25/20 at 12:45pm, David Rientjes wrote:
> > > On Wed, 25 Mar 2020, Baoquan He wrote:
> > >
> > > > > Even this can break existing parsers. Fixing that up is likely not hard
> > > > > and existing parsers would be mostly debugging hacks here and there but
> > > > > I do miss any actual justification except for you considering it more
> > > > > sensible. I do not remember this would be a common pain point for people
> > > > > parsing this file. If anything the overal structure of the file makes it
> > > > > hard to parse and your patches do not really address that. We are likely
> > > > > too late to make the output much more sensible TBH.
> > > > >
> > > > > That being said, I haven't looked more closely on your patches because I
> > > > > do not have spare cycles for that. Your justification for touching the
> > > > > code which seems to be working relatively well is quite weak IMHO, yet
> > > > > it adds a non zero risk for breaking existing parsers.
> > > >
> > > > I would take the saying of non zero risk for breaking existing parsers.
> > > > When considering this change, I thought about the possible risk. However,
> > > > found out the per-node stats was added in 2016 which is not so late, and
> > > > assume nobody will rely on the order of per-node stats embeded into a
> > > > zone. But I have to admit any concern or worry of risk is worth being
> > > > considerred carefully since /proc/zoneinfo is a classic interface.
> > > >
> > >
> > > For context, we started parsing /proc/zoneinfo in initscripts to be able
> > > to determine the order in which vm.lowmem_reserve_ratio needs to be set
> > > and this required my kernel change from 2017:
> > >
> > > commit b2bd8598195f1b2a72130592125ac6b4218988a2
> > > Author: David Rientjes <rientjes@google.com>
> > > Date: Wed May 3 14:52:59 2017 -0700
> > >
> > > mm, vmstat: print non-populated zones in zoneinfo
> > >
> > > Otherwise, we found, it's much more difficult to determine how this array
> > > should be structured. So at least we parse this file very carefully, I'm
> > > not sure how much others do, but it seems like an unnecessary risk for
> > > little reward. I'm happy to see it has been decided to drop this patch
> > > and patch 5.
> >
> >
> > OK, I see why it is in such a situation, the empty zones were not printed.
> >
> > I could still not get how vm.lowmem_reserve_ratio is set with
> > /proc/zoneinfo in the old initscripts, do you see any risk if not
> > filling and showing the ->lowmem_reserve[] of empty zone in
> > patch 2 and 3? Thanks in advance.
>
> The point is why should we even care. Displaying that information
> shouldn't hurt anything, right?
Well, I would say why not. If saying anything hurted, I often check
/proc/zoneinfo to get information about system memory like many people,
I was wondering why the protection data is over there, but it's am
empty zone, and they protect what. I dare say it's more than once I
asked to myself, just sometime I am too lazy to start to make it clear
when focusing on another issue. Not sure if that is kind of hurting. I
would like to see it's not there to confuse me if anyone else have
stood up to fix it, I absolutely will vote for it.
Surely, we also need to evaluate if any risk or complexity is involved,
While with my understanding, I don't see risk, and the change is quite
simple and easy to understand.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 5/5] mm/vmstat.c: remove the useless code
2020-03-24 14:22 [PATCH 0/5] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
` (3 preceding siblings ...)
2020-03-24 14:22 ` [PATCH 4/5] mm/vmstat.c: move the per-node stats to the front of /proc/zoneinfo Baoquan He
@ 2020-03-24 14:22 ` Baoquan He
4 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2020-03-24 14:22 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, akpm, iamjoonsoo.kim, hannes, mhocko, vbabka, bhe
No one calls is_zone_first_populated(), remove it.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
mm/vmstat.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4bbf9be786da..7097eb99f30d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1548,20 +1548,6 @@ static const struct seq_operations pagetypeinfo_op = {
.show = pagetypeinfo_show,
};
-static bool is_zone_first_populated(pg_data_t *pgdat, struct zone *zone)
-{
- int zid;
-
- for (zid = 0; zid < MAX_NR_ZONES; zid++) {
- struct zone *compare = &pgdat->node_zones[zid];
-
- if (populated_zone(compare))
- return zone == compare;
- }
-
- return false;
-}
-
static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
struct zone *zone)
{
--
2.17.2
^ permalink raw reply related [flat|nested] 14+ messages in thread