[PATCH] mm: vmstat: Use zeroed stats for unpopulated zones

From: Sandipan Das @ 2020-05-04  7:03 UTC
  To: akpm; +Cc: linux-mm, khlebnikov, mhocko, kirill, aneesh.kumar, srikar

For unpopulated zones, the pagesets point to the common
boot_pageset, which can have non-zero vm_numa_stat counts.
Because of this, memory-less nodes end up with non-zero
NUMA statistics. This can be observed on any architecture
that supports memory-less NUMA nodes.

E.g., node 0 below has no memory but still reports non-zero stats:

  $ numactl -H
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3
  node 0 size: 0 MB
  node 0 free: 0 MB
  node 1 cpus: 4 5 6 7
  node 1 size: 8131 MB
  node 1 free: 6980 MB
  node distances:
  node   0   1
    0:  10  40
    1:  40  10

  $ numastat
                             node0           node1
  numa_hit                     108           56495
  numa_miss                      0               0
  numa_foreign                   0               0
  interleave_hit                 0            4537
  local_node                   108           31547
  other_node                     0           24948

Hence, return zero explicitly for all the stats of an
unpopulated zone.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
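Note (not part of the commit): a minimal user-space sketch of the
problem described above, assuming a heavily simplified model of the
kernel structures. NR_CPUS, model_setup(), snapshot_unpatched(),
snapshot_patched(), report() and the counter values (108 and 5) are
hypothetical names and numbers chosen for illustration; only
boot_pageset, populated_zone(), vm_numa_stat and vm_numa_stat_diff
mirror identifiers from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

enum numa_stat_item { NUMA_HIT, NUMA_MISS, NR_NUMA_ITEMS };

/* Per-CPU counter deltas, loosely modelled on the kernel's pageset. */
struct pageset {
	long vm_numa_stat_diff[NR_NUMA_ITEMS];
};

struct zone {
	long vm_numa_stat[NR_NUMA_ITEMS];	/* folded zone-wide counters */
	struct pageset *pageset;		/* array of NR_CPUS entries */
	unsigned long present_pages;		/* 0 for an unpopulated zone */
};

/* Shared by every zone until real pagesets are allocated. */
static struct pageset boot_pageset[NR_CPUS];

static int populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/* Behaviour without the check: sums whatever the pageset holds. */
static unsigned long snapshot_unpatched(const struct zone *zone,
					enum numa_stat_item item)
{
	long x = zone->vm_numa_stat[item];
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		x += zone->pageset[cpu].vm_numa_stat_diff[item];
	return x;
}

/* Behaviour with the check: unpopulated zones always report zero. */
static unsigned long snapshot_patched(const struct zone *zone,
				      enum numa_stat_item item)
{
	if (!populated_zone(zone))
		return 0;
	return snapshot_unpatched(zone, item);
}

/* Build one unpopulated and one populated zone (illustrative values). */
static void model_setup(struct zone *empty, struct zone *populated,
			struct pageset *real)
{
	memset(boot_pageset, 0, sizeof(boot_pageset));
	memset(empty, 0, sizeof(*empty));
	memset(populated, 0, sizeof(*populated));
	memset(real, 0, NR_CPUS * sizeof(*real));

	/* Early boot: every zone points at the shared boot_pageset. */
	empty->pageset = boot_pageset;
	populated->pageset = boot_pageset;
	populated->present_pages = 2048;

	/* Counts accumulate in boot_pageset before real pagesets exist. */
	boot_pageset[0].vm_numa_stat_diff[NUMA_HIT] += 108;

	/* Later: only the populated zone gets its own pagesets. */
	populated->pageset = real;
	real[1].vm_numa_stat_diff[NUMA_HIT] += 5;
}

static void report(const char *name, const struct zone *zone)
{
	printf("%s: unpatched numa_hit=%lu, patched numa_hit=%lu\n", name,
	       snapshot_unpatched(zone, NUMA_HIT),
	       snapshot_patched(zone, NUMA_HIT));
}
```

Calling model_setup() and then report() on both zones should show
the unpopulated zone inheriting the stale boot_pageset count without
the check and reporting zero with it, while the populated zone is
unaffected either way.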
 include/linux/vmstat.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 292485f3d24d..55a68b379a2c 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -159,6 +159,21 @@ static inline unsigned long zone_numa_state_snapshot(struct zone *zone,
 	long x = atomic_long_read(&zone->vm_numa_stat[item]);
 	int cpu;
 
+	/*
+	 * Initially, the pagesets of all zones are set to point to the
+	 * boot_pageset. The real pagesets are allocated later, but only
+	 * for the populated zones. Unpopulated zones continue to use
+	 * the boot_pageset.
+	 *
+	 * Before the real pagesets are allocated, the boot_pageset's
+	 * vm_numa_stat counters can get incremented. This affects the
+	 * unpopulated zones which end up with non-zero stats despite
+	 * having no memory associated with them. For such cases,
+	 * return zero explicitly.
+	 */
+	if (!populated_zone(zone))
+		return 0;
+
 	for_each_online_cpu(cpu)
 		x += per_cpu_ptr(zone->pageset, cpu)->vm_numa_stat_diff[item];
 
-- 
2.17.1


