linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@suse.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	khlebnikov@yandex-team.ru, kirill@shutemov.name,
	aneesh.kumar@linux.ibm.com, srikar@linux.vnet.ibm.com
Subject: Re: [PATCH] mm: vmstat: Use zeroed stats for unpopulated zones
Date: Wed, 6 May 2020 17:50:28 +0200	[thread overview]
Message-ID: <e2d261fe-8bee-bc16-adb3-0231977e6a04@suse.cz> (raw)
In-Reply-To: <20200506152408.GD6345@dhcp22.suse.cz>

On 5/6/20 5:24 PM, Michal Hocko wrote:
>> 
>> Yes, if we allocate from cpu 0-3 then it should be a miss on node 0. But the
>> zonelists are optimized in a way that they don't include empty zones -
>> build_zonerefs_node() checks managed_zone(). As a result, node 0 zonelist has no
>> node 0 zones, which confuses the stats code. We should probably document that
>> numa stats are bogus on systems with memoryless nodes. This patch makes it
>> somewhat more obvious by presenting nice zeroes on the memoryless node itself,
>> but node 1 now include stats from node 0.
> 
> Thanks for the clarification. So the underlying problem is that zone_statistics
> operates on a preferred zone rather than node. This would be fixable but
> I am not sure whether this is something worth bothering. Maybe it would
> just be more convenient to document the unfortunate memory less nodes
> stats situation and be done with it. Or do we have any consumers that
> really do care?
> 
>> >> NUMA_OTHER uses numa_node_id(), which would mean the node 0's cpus have node 1
>> >> in their numa_node_id() ? Is that correct?
>> > 
>> > numa_node_id should reflect the real node the CPU is associated with.
>> 
>> You're right, numa_node_id() is probably fine. But NUMA_OTHER is actually
>> incremented at the zone where the allocation succeeds. This probably doesn't
>> match Documentation/admin-guide/numastat.rst, even on a non-memoryless-node systems:
>> 
>> other_node      A process ran on this node and got memory from another node.
> 
> Yeah, the documentation doesn't match the implementation. Maybe we
> should just fix the documentation because this has been the case for
> ages.
> 

How about something like this:

diff --git a/Documentation/admin-guide/numastat.rst b/Documentation/admin-guide/numastat.rst
index aaf1667489f8..08ec2c2bdce3 100644
--- a/Documentation/admin-guide/numastat.rst
+++ b/Documentation/admin-guide/numastat.rst
@@ -6,6 +6,21 @@ Numa policy hit/miss statistics
 
 All units are pages. Hugepages have separate counters.
 
+The numa_hit, numa_miss and numa_foreign counters reflect how well processes
+are able to allocate memory from nodes they prefer. If they succeed, numa_hit
+is incremented on the preferred node, otherwise numa_foreign is incremented on
+the preferred node and numa_miss on the node where allocation succeeded.
+
+Usually preferred node is the one local to the CPU where the process executes,
+but restrictions such as mempolicies can change that, so there are also two
+counters based on CPU local node. local_node is similar to numa_hit and is
+incremented on allocation from a node by CPU on the same node. other_node is
+similar to numa_miss and is incremented on the node where allocation succeeds
+from a CPU from a different node. Note there is no counter analogical to
+numa_foreign.
+
+In more detail:
+
 =============== ============================================================
 numa_hit	A process wanted to allocate memory from this node,
 		and succeeded.
@@ -14,11 +29,13 @@ numa_miss	A process wanted to allocate memory from another node,
 		but ended up with memory from this node.
 
 numa_foreign	A process wanted to allocate on this node,
-		but ended up with memory from another one.
+		but ended up with memory from another node.
 
-local_node	A process ran on this node and got memory from it.
+local_node	A process ran on this node's CPU,
+		and got memory from this node.
 
-other_node	A process ran on this node and got memory from another node.
+other_node	A process ran on a different node's CPU
+		and got memory from this node.
 
 interleave_hit 	Interleaving wanted to allocate from this node
 		and succeeded.
@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package
 (http://oss.sgi.com/projects/libnuma/). Note that it only works
 well right now on machines with a small number of CPUs.
 
+Note that on systems with memoryless nodes (where a node has CPUs but no
+memory) the numa_hit, numa_miss and numa_foreign statistics can be skewed
+heavily. In the current kernel implementation, if a process prefers a
+memoryless node (i.e.  because it is running on one of its local CPU), the
+implementation actually treats one of the nearest nodes with memory as the
+preferred node. As a result, such allocation will not increase the numa_foreign
+counter on the memoryless node, and will skew the numa_hit, numa_miss and
+numa_foreign statistics of the nearest node.
-- 
2.26.2



  reply	other threads:[~2020-05-06 15:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04  7:03 [PATCH] mm: vmstat: Use zeroed stats for unpopulated zones Sandipan Das
2020-05-04 10:26 ` Michal Hocko
2020-05-06 13:33   ` Vlastimil Babka
2020-05-06 14:02     ` Michal Hocko
2020-05-06 15:09       ` Vlastimil Babka
2020-05-06 15:24         ` Michal Hocko
2020-05-06 15:50           ` Vlastimil Babka [this message]
2020-05-07  7:09             ` Michal Hocko
2020-05-07  9:05               ` Sandipan Das
2020-05-07  9:08                 ` Sandipan Das
2020-05-07 11:07                   ` Vlastimil Babka
2020-05-07 11:17                     ` Sandipan Das

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2d261fe-8bee-bc16-adb3-0231977e6a04@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=kirill@shutemov.name \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=sandipan@linux.ibm.com \
    --cc=srikar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).