linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] improvements about lowmem_reserve and /proc/zoneinfo
@ 2020-04-02 14:01 Baoquan He
  2020-04-02 14:01 ` [PATCH v2 1/3] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Baoquan He @ 2020-04-02 14:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, iamjoonsoo.kim, mhocko, bhe, mgorman, rientjes

This v2 drops patches 4 and 5 from v1, since David and Michal were
concerned that moving the per-node stats to the front of /proc/zoneinfo
could break existing user space scripts. Patches 1-3 are unchanged; no
risk has been identified with them so far, so they are kept and
reposted as-is.

The v1 thread can be found here:
https://lore.kernel.org/linux-mm/20200324142229.12028-1-bhe@redhat.com/

Baoquan He (3):
  mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when
    changing it
  mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty
  mm/vmstat.c: do not show lowmem reserve protection information of
    empty zone

 mm/page_alloc.c | 13 +++++++++++--
 mm/vmstat.c     | 12 ++++++------
 2 files changed, 17 insertions(+), 8 deletions(-)

-- 
2.17.2



* [PATCH v2 1/3] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it
  2020-04-02 14:01 [PATCH v2 0/3] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
@ 2020-04-02 14:01 ` Baoquan He
  2020-04-02 14:01 ` [PATCH v2 2/3] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
  2020-04-02 14:01 ` [PATCH v2 3/3] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
  2 siblings, 0 replies; 4+ messages in thread
From: Baoquan He @ 2020-04-02 14:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, iamjoonsoo.kim, mhocko, bhe, mgorman, rientjes

When people write to /proc/sys/vm/lowmem_reserve_ratio to change
sysctl_lowmem_reserve_ratio[], setup_per_zone_lowmem_reserve()
is called to recalculate all ->lowmem_reserve[] for each zone of all
nodes as below:

static void setup_per_zone_lowmem_reserve(void)
{
...
	for_each_online_pgdat(pgdat) {
		for (j = 0; j < MAX_NR_ZONES; j++) {
			...
			while (idx) {
				...
				if (sysctl_lowmem_reserve_ratio[idx] < 1) {
					sysctl_lowmem_reserve_ratio[idx] = 0;
					lower_zone->lowmem_reserve[j] = 0;
				} else {
				...
			}
		}
	}
}

Here, sysctl_lowmem_reserve_ratio[idx] is also reset to 0 if its value
is smaller than 1. However, sysctl_lowmem_reserve_ratio[] is indexed by
zone type only, regardless of which node a zone belongs to. That means
the tuning is repeated for every node, even though it has already been
done while walking the first node.

The tuning is also done when init_per_zone_wmark_min() calls
setup_per_zone_lowmem_reserve(), even though nobody is trying to change
sysctl_lowmem_reserve_ratio[] on that path.

So move the tuning into lowmem_reserve_ratio_sysctl_handler(), so that
it is done only once, when the value is actually written.
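
To illustrate the intent, here is a minimal user space sketch, not the
kernel code itself. The names ratio[], sanitize_ratios(), write_ratios()
and sanitize_calls are hypothetical stand-ins introduced only for this
example, and NR_ZONES is an assumed value. It models clamping the ratios
once at write time, so that a later recalculation only has to read them:

```c
#include <assert.h>
#include <stddef.h>

#define NR_ZONES 4	/* assumed stand-in for MAX_NR_ZONES */

/* Hypothetical stand-in for sysctl_lowmem_reserve_ratio[]. */
static int ratio[NR_ZONES] = { 256, 256, 32, 0 };

/* Counts how often the clamping pass runs; for illustration only. */
static int sanitize_calls;

/* Clamp every ratio below 1 to 0, independent of any node. */
static void sanitize_ratios(void)
{
	size_t i;

	sanitize_calls++;
	for (i = 0; i < NR_ZONES; i++) {
		if (ratio[i] < 1)
			ratio[i] = 0;
	}
}

/* Model of a write to the sysctl: store the values, then clamp once. */
static void write_ratios(const int *new_vals, size_t n)
{
	size_t i;

	assert(n <= NR_ZONES);
	for (i = 0; i < n; i++)
		ratio[i] = new_vals[i];
	sanitize_ratios();
}
```

In this model the clamping cost no longer depends on the number of
nodes, and a boot-time recalculation that never writes the ratios does
not trigger it at all.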

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca1453204e66..c0c788798d8b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7840,8 +7840,7 @@ static void setup_per_zone_lowmem_reserve(void)
 				idx--;
 				lower_zone = pgdat->node_zones + idx;
 
-				if (sysctl_lowmem_reserve_ratio[idx] < 1) {
-					sysctl_lowmem_reserve_ratio[idx] = 0;
+				if (!sysctl_lowmem_reserve_ratio[idx]) {
 					lower_zone->lowmem_reserve[j] = 0;
 				} else {
 					lower_zone->lowmem_reserve[j] =
@@ -8106,7 +8105,15 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
+	int i;
+
 	proc_dointvec_minmax(table, write, buffer, length, ppos);
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		if (sysctl_lowmem_reserve_ratio[i] < 1)
+			sysctl_lowmem_reserve_ratio[i] = 0;
+	}
+
 	setup_per_zone_lowmem_reserve();
 	return 0;
 }
-- 
2.17.2



* [PATCH v2 2/3] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty
  2020-04-02 14:01 [PATCH v2 0/3] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
  2020-04-02 14:01 ` [PATCH v2 1/3] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
@ 2020-04-02 14:01 ` Baoquan He
  2020-04-02 14:01 ` [PATCH v2 3/3] mm/vmstat.c: do not show lowmem reserve protection information of empty zone Baoquan He
  2 siblings, 0 replies; 4+ messages in thread
From: Baoquan He @ 2020-04-02 14:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, iamjoonsoo.kim, mhocko, bhe, mgorman, rientjes

When a memory allocation request cannot be satisfied from a specific
zone, the allocator falls back to a lower zone. In that case, the lower
zone's ->lowmem_reserve[] helps protect its own memory. The higher the
relevant ->lowmem_reserve[] entry is, the harder it is for allocations
targeting the upper zone to get memory from this lower zone.
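
As a rough illustration of that protection, the sketch below models the
core of the watermark comparison in user space. It is a simplification:
struct zone_model and zone_can_serve() are hypothetical names, NR_ZONES
is an assumed value, and the real check also accounts for the allocation
order and allocation flags, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ZONES 3	/* assumed: DMA, DMA32, Normal */

/* Hypothetical, stripped-down view of a zone. */
struct zone_model {
	unsigned long free_pages;
	unsigned long min_wmark;
	/* Indexed by the zone type the request was originally aimed at. */
	unsigned long lowmem_reserve[NR_ZONES];
};

/*
 * A zone may serve a request aimed at zone type requested_idx only if
 * its free pages exceed the watermark plus the reserve held back
 * against that zone type.
 */
static bool zone_can_serve(const struct zone_model *z, int requested_idx)
{
	assert(requested_idx >= 0 && requested_idx < NR_ZONES);
	return z->free_pages >
	       z->min_wmark + z->lowmem_reserve[requested_idx];
}
```

With 1000 free pages, a watermark of 100 and a reserve of 1024 against
Normal, this model lets a DMA32 request through but refuses a Normal
request that fell back to the same zone.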

However, this protection mechanism should only apply to populated
zones, not to empty ones. Filling ->lowmem_reserve[] for an empty zone
is unnecessary, and may mislead people into thinking that it is valid
data for that zone, as in the /proc/zoneinfo output below:

Node 2, zone      DMA
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 1024, 1024)
Node 2, zone    DMA32
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 1024, 1024)
Node 2, zone   Normal
  per-node stats
      nr_inactive_anon 0
      nr_active_anon 143
      nr_inactive_file 0
      nr_active_file 0
      nr_unevictable 0
      nr_slab_reclaimable 45
      nr_slab_unreclaimable 254

So clear out zone->lowmem_reserve[] if the zone is empty.
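
The calculation can be sketched in user space as follows. This is a
model under stated assumptions, not the kernel function: struct
node_model and setup_reserves() are hypothetical names, NR_ZONES is an
assumed value, and the running total of managed pages is accumulated on
every iteration as a modeling choice that the hunk itself does not show.

```c
#include <assert.h>

#define NR_ZONES 3	/* assumed: DMA, DMA32, Normal */

/* Hypothetical per-node view: managed pages and reserves per zone. */
struct node_model {
	unsigned long managed[NR_ZONES];
	/* reserve[lower][upper]: pages of 'lower' held back from 'upper' */
	unsigned long reserve[NR_ZONES][NR_ZONES];
};

static void setup_reserves(struct node_model *node, const int *ratio)
{
	int j, idx;

	for (j = 0; j < NR_ZONES; j++) {
		/* Pages of zone j plus all zones between j and idx. */
		unsigned long pages = node->managed[j];

		node->reserve[j][j] = 0;
		for (idx = j - 1; idx >= 0; idx--) {
			assert(ratio[idx] >= 0);
			/* No reserve if disabled or if the lower zone is empty. */
			if (!ratio[idx] || !node->managed[idx])
				node->reserve[idx][j] = 0;
			else
				node->reserve[idx][j] = pages / ratio[idx];
			pages += node->managed[idx];
		}
	}
}
```

For a node whose only populated zone is Normal with 262144 managed
pages, the empty DMA and DMA32 zones end up with a reserve of 0 in this
model, where a ratio of 256 would otherwise have produced the 1024 seen
in the output above.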

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0c788798d8b..138a56c0f48f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7840,8 +7840,10 @@ static void setup_per_zone_lowmem_reserve(void)
 				idx--;
 				lower_zone = pgdat->node_zones + idx;
 
-				if (!sysctl_lowmem_reserve_ratio[idx]) {
+				if (!sysctl_lowmem_reserve_ratio[idx] ||
+				    !zone_managed_pages(lower_zone)) {
 					lower_zone->lowmem_reserve[j] = 0;
+					continue;
 				} else {
 					lower_zone->lowmem_reserve[j] =
 						managed_pages / sysctl_lowmem_reserve_ratio[idx];
-- 
2.17.2



* [PATCH v2 3/3] mm/vmstat.c: do not show lowmem reserve protection information of empty zone
  2020-04-02 14:01 [PATCH v2 0/3] improvements about lowmem_reserve and /proc/zoneinfo Baoquan He
  2020-04-02 14:01 ` [PATCH v2 1/3] mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it Baoquan He
  2020-04-02 14:01 ` [PATCH v2 2/3] mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty Baoquan He
@ 2020-04-02 14:01 ` Baoquan He
  2 siblings, 0 replies; 4+ messages in thread
From: Baoquan He @ 2020-04-02 14:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, akpm, iamjoonsoo.kim, mhocko, bhe, mgorman, rientjes

The lowmem reserve protection of an empty zone carries no information;
it only adds one more line to /proc/zoneinfo.

So do not show it for an empty zone.
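
The reordering can be pictured with a small user space model that
writes into a buffer instead of a seq_file. struct out, out_printf(),
struct zone_model and show_zone() are hypothetical names for this
sketch, and the output format is only loosely modeled on /proc/zoneinfo:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_ZONES 3	/* assumed value for the sketch */

/* Hypothetical output buffer standing in for a seq_file. */
struct out {
	char buf[256];
	size_t len;
};

struct zone_model {
	const char *name;
	unsigned long present_pages;
	long lowmem_reserve[NR_ZONES];
};

/* Append formatted text; output that does not fit is truncated. */
static void out_printf(struct out *o, const char *fmt, ...)
{
	va_list ap;

	assert(o->len < sizeof(o->buf));
	va_start(ap, fmt);
	vsnprintf(o->buf + o->len, sizeof(o->buf) - o->len, fmt, ap);
	va_end(ap);
	o->len += strlen(o->buf + o->len);
}

static void show_zone(struct out *o, const struct zone_model *z)
{
	int i;

	out_printf(o, "zone %8s\n        present  %lu",
		   z->name, z->present_pages);

	/* Bail out before the protection line if the zone is empty. */
	if (!z->present_pages) {
		out_printf(o, "\n");
		return;
	}

	out_printf(o, "\n        protection: (%ld", z->lowmem_reserve[0]);
	for (i = 1; i < NR_ZONES; i++)
		out_printf(o, ", %ld", z->lowmem_reserve[i]);
	out_printf(o, ")\n");
}
```

Moving the early return ahead of the protection line is the whole
change; populated zones print exactly what they printed before.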

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmstat.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 96d21a792b57..6fd1407f4632 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1590,6 +1590,12 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   zone->present_pages,
 		   zone_managed_pages(zone));
 
+	/* If unpopulated, no other information is useful */
+	if (!populated_zone(zone)) {
+		seq_putc(m, '\n');
+		return;
+	}
+
 	seq_printf(m,
 		   "\n        protection: (%ld",
 		   zone->lowmem_reserve[0]);
@@ -1597,12 +1603,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		seq_printf(m, ", %ld", zone->lowmem_reserve[i]);
 	seq_putc(m, ')');
 
-	/* If unpopulated, no other information is useful */
-	if (!populated_zone(zone)) {
-		seq_putc(m, '\n');
-		return;
-	}
-
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		seq_printf(m, "\n      %-12s %lu", zone_stat_name(i),
 			   zone_page_state(zone, i));
-- 
2.17.2

