linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@suse.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	khlebnikov@yandex-team.ru, kirill@shutemov.name,
	aneesh.kumar@linux.ibm.com, srikar@linux.vnet.ibm.com
Subject: Re: [PATCH] mm: vmstat: Use zeroed stats for unpopulated zones
Date: Wed, 6 May 2020 17:09:51 +0200
Message-ID: <c930059f-1828-7238-aa2d-4eef52380f16@suse.cz>
In-Reply-To: <20200506140241.GB6345@dhcp22.suse.cz>

On 5/6/20 4:02 PM, Michal Hocko wrote:
> On Wed 06-05-20 15:33:36, Vlastimil Babka wrote:
>> On 5/4/20 12:26 PM, Michal Hocko wrote:
>> > On Mon 04-05-20 12:33:04, Sandipan Das wrote:
>> >> For unpopulated zones, the pagesets point to the common
>> >> boot_pageset which can have non-zero vm_numa_stat counts.
>> >> Because of this memory-less nodes end up having non-zero
>> >> NUMA statistics. This can be observed on any architecture
>> >> that supports memory-less NUMA nodes.
>> >> 
>> >> E.g.
>> >> 
>> >>   $ numactl -H
>> >>   available: 2 nodes (0-1)
>> >>   node 0 cpus: 0 1 2 3
>> >>   node 0 size: 0 MB
>> >>   node 0 free: 0 MB
>> >>   node 1 cpus: 4 5 6 7
>> >>   node 1 size: 8131 MB
>> >>   node 1 free: 6980 MB
>> >>   node distances:
>> >>   node   0   1
>> >>     0:  10  40
>> >>     1:  40  10
>> >> 
>> >>   $ numastat
>> >>                              node0           node1
>> >>   numa_hit                     108           56495
>> >>   numa_miss                      0               0
>> >>   numa_foreign                   0               0
>> >>   interleave_hit                 0            4537
>> >>   local_node                   108           31547
>> >>   other_node                     0           24948
>> >> 
>> >> Hence, return zero explicitly for all the stats of an
>> >> unpopulated zone.
>> > 
>> > I hope I am not just confused, but I would expect at least
>> > numa_foreign and other_node to be non-zero.
>> Hmm, checking zone_statistics():
>> 
>> The NUMA_FOREIGN increment uses the preferred zone, which is the first zone in
>> the zonelist, so it will be a zone from node 1 even for allocations on a cpu
>> associated with node 0, assuming node 0's unpopulated zones are not included in
>> node 0's zonelist.
> 
> But the allocation could have been requested for node 0 regardless of
> the amount of memory the node has.

Yes, if we allocate from cpu 0-3 then it should be a miss on node 0. But the
zonelists are optimized so that they don't include empty zones:
build_zonerefs_node() checks managed_zone(). As a result, node 0's zonelist has
no node 0 zones, which confuses the stats code. We should probably document that
NUMA stats are bogus on systems with memoryless nodes. This patch makes the
problem somewhat more obvious by presenting nice zeroes on the memoryless node
itself, but node 1 now includes the stats from node 0.

>> NUMA_OTHER uses numa_node_id(), which would mean that node 0's cpus have node 1
>> as their numa_node_id()? Is that correct?
> 
> numa_node_id should reflect the real node the CPU is associated with.

You're right, numa_node_id() is probably fine. But NUMA_OTHER is actually
incremented at the zone where the allocation succeeds. This probably doesn't
match Documentation/admin-guide/numastat.rst, even on systems without memoryless
nodes:

other_node      A process ran on this node and got memory from another node.




Thread overview: 12+ messages
2020-05-04  7:03 [PATCH] mm: vmstat: Use zeroed stats for unpopulated zones Sandipan Das
2020-05-04 10:26 ` Michal Hocko
2020-05-06 13:33   ` Vlastimil Babka
2020-05-06 14:02     ` Michal Hocko
2020-05-06 15:09       ` Vlastimil Babka [this message]
2020-05-06 15:24         ` Michal Hocko
2020-05-06 15:50           ` Vlastimil Babka
2020-05-07  7:09             ` Michal Hocko
2020-05-07  9:05               ` Sandipan Das
2020-05-07  9:08                 ` Sandipan Das
2020-05-07 11:07                   ` Vlastimil Babka
2020-05-07 11:17                     ` Sandipan Das
