linux-rt-users.vger.kernel.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Michal Hocko <mhocko@kernel.org>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH 04/11] mm/vmstat: Convert NUMA statistics to basic NUMA counters
Date: Wed, 14 Apr 2021 14:56:45 +0200
Message-ID: <7a7ec563-0519-a850-563a-9680a7bd00d3@suse.cz>
In-Reply-To: <20210407202423.16022-5-mgorman@techsingularity.net>

On 4/7/21 10:24 PM, Mel Gorman wrote:
> NUMA statistics are maintained at the zone level for hits, misses,
> foreign allocations etc., but nothing relies on them being perfectly
> accurate for functional correctness. The counters are used by userspace
> to get a general overview of a workload's NUMA behaviour, but the page
> allocator incurs a high cost to maintain perfect accuracy similar to what
> is required for a vmstat counter like NR_FREE_PAGES. There is even a
> sysctl vm.numa_stat that allows userspace to turn off the collection of
> NUMA statistics like NUMA_HIT.
> 
> This patch converts NUMA_HIT and friends to be NUMA events with similar
> accuracy to VM events. There is a possibility that slight errors will be
> introduced but the overall trend as seen by userspace will be similar.
> Note that, while these counters could be maintained at the node level,
> doing so would have a user-visible impact.

I guess this kind of inaccuracy is fine. What I don't much like is
fold_vm_zone_numa_events(), which seems to sum the percpu counters and
then assign the result to the zone counters for immediate consumption.
That differs from the other folds in vmstat, which reset the percpu
counters to 0 because they are treated as diffs against the global
counters.

So it seems this intermediate assignment to the zone counters (done with
atomic_long_set(), even) is unnecessary, and this could instead mimic
sum_vm_events(), which just does the summation into a local array?
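
Something like the below, perhaps (an untested sketch - I'm guessing at
the per_cpu_zonestat/vm_numa_event names and the NR_VM_NUMA_EVENT_ITEMS
bound from the patch):

static void sum_vm_numa_events(struct zone *zone, unsigned long *ret)
{
        int cpu;
        enum numa_stat_item item;

        /*
         * Mimic sum_vm_events(): accumulate into a caller-provided array
         * and leave both the percpu counters and the zone counters
         * untouched.
         */
        memset(ret, 0, NR_VM_NUMA_EVENT_ITEMS * sizeof(unsigned long));

        for_each_online_cpu(cpu) {
                struct per_cpu_zonestat *pzstats;

                pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
                for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++)
                        ret[item] += pzstats->vm_numa_event[item];
        }
}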

And probably a bit more serious: vm_events have vm_events_fold_cpu() to
deal with a CPU going away, but after your patch the stats counted on a
CPU simply disappear from the sums once it goes offline, as there is no
equivalent for the NUMA counters.
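
For reference, vm_events_fold_cpu() in mm/vmstat.c does (roughly, from
memory) the following for the VM events - it moves the dead CPU's counts
onto the CPU running the hotplug callback before the percpu data stops
being summed:

void vm_events_fold_cpu(int cpu)
{
        struct vm_event_state *fold_state = &per_cpu(vm_event_states, cpu);
        int i;

        for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
                /* credit the dead CPU's counts to the current CPU */
                count_vm_events(i, fold_state->event[i]);
                fold_state->event[i] = 0;
        }
}

Something equivalent would presumably be needed for the NUMA event
counters, or their sums will shrink when a CPU goes offline.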

Thanks,
Vlastimil


Thread overview: 30+ messages
2021-04-07 20:24 [PATCH 0/11 v2] Use local_lock for pcp protection and reduce stat overhead Mel Gorman
2021-04-07 20:24 ` [PATCH 01/11] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-04-12 17:43   ` Vlastimil Babka
2021-04-13 13:27     ` Mel Gorman
2021-04-07 20:24 ` [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-04-08 10:52   ` Peter Zijlstra
2021-04-08 17:42     ` Mel Gorman
2021-04-09  6:39       ` Peter Zijlstra
2021-04-09  7:59         ` Mel Gorman
2021-04-09  8:24           ` Peter Zijlstra
2021-04-09 13:32             ` Mel Gorman
2021-04-09 18:55               ` Peter Zijlstra
2021-04-12 11:56                 ` Mel Gorman
2021-04-12 21:47                   ` Thomas Gleixner
2021-04-13 16:52                     ` Mel Gorman
2021-04-07 20:24 ` [PATCH 03/11] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove Mel Gorman
2021-04-07 20:24 ` [PATCH 04/11] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-04-14 12:56   ` Vlastimil Babka [this message]
2021-04-14 15:18     ` Mel Gorman
2021-04-14 15:56       ` Vlastimil Babka
2021-04-15 10:06         ` Mel Gorman
2021-04-07 20:24 ` [PATCH 05/11] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-04-07 20:24 ` [PATCH 06/11] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-04-07 20:24 ` [PATCH 07/11] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-04-07 20:24 ` [PATCH 08/11] mm/page_alloc: Remove duplicate checks if migratetype should be isolated Mel Gorman
2021-04-07 20:24 ` [PATCH 09/11] mm/page_alloc: Explicitly acquire the zone lock in __free_pages_ok Mel Gorman
2021-04-07 20:24 ` [PATCH 10/11] mm/page_alloc: Avoid conflating IRQs disabled with zone->lock Mel Gorman
2021-04-07 20:24 ` [PATCH 11/11] mm/page_alloc: Update PGFREE outside the zone lock in __free_pages_ok Mel Gorman
2021-04-08 10:56 ` [PATCH 0/11 v2] Use local_lock for pcp protection and reduce stat overhead Peter Zijlstra
2021-04-08 17:48   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7a7ec563-0519-a850-563a-9680a7bd00d3@suse.cz \
    --to=vbabka@suse.cz \
    --cc=brouer@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line
before the message body.