linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: kemi <kemi.wang@intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Christopher Lameter <cl@linux.com>,
	YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	David Rientjes <rientjes@google.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Dave <dave.hansen@linux.intel.com>,
	Andi Kleen <andi.kleen@intel.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>,
	Aubrey Li <aubrey.li@intel.com>, Linux MM <linux-mm@kvack.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/5] mm: use node_page_state_snapshot to avoid deviation
Date: Wed, 20 Dec 2017 14:07:35 +0800	[thread overview]
Message-ID: <94187fd5-ad70-eba7-2724-0fe5bed750d6@intel.com> (raw)
In-Reply-To: <20171219124317.GP2787@dhcp22.suse.cz>



On 2017年12月19日 20:43, Michal Hocko wrote:
> On Tue 19-12-17 14:39:25, Kemi Wang wrote:
>> To avoid deviation, this patch uses node_page_state_snapshot instead of
>> node_page_state for node page stats query.
>> e.g. cat /proc/zoneinfo
>>      cat /sys/devices/system/node/node*/vmstat
>>      cat /sys/devices/system/node/node*/numastat
>>
>> As it is a slow path and would not be read frequently, I would worry about
>> it.
> 
> The changelog doesn't explain why these counters needs any special
> treatment. _snapshot variants where used only for internal handling
> where the precision really mattered. We do not have any in-tree user and
> Jack has removed this by http://lkml.kernel.org/r/20171122094416.26019-1-jack@suse.cz
> which is already sitting in the mmotm tree. We can re-add it but that
> would really require a _very good_ reason.
> 

Assume we have *nr* cpus, and threshold size is *t*. Thus, the maximum deviation is nr*t.
Currently, Skylake platform has hundreds of CPUs numbers and the number is still 
increasing. Also, even the threshold size is kept to 125 at maximum (32765 
for NUMA counters now), the deviation is just a little too big as I have mentioned in 
the log. I tend to sum the number in local cpus up when query the global stats.

Also, node_page_state_snapshot is only called in slow path and I don't think that
would be a big problem. 

Anyway, it is a matter of taste. I just think it's better to have.

>> Signed-off-by: Kemi Wang <kemi.wang@intel.com>
>> ---
>>  drivers/base/node.c | 17 ++++++++++-------
>>  mm/vmstat.c         |  2 +-
>>  2 files changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index a045ea1..cf303f8 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -169,12 +169,15 @@ static ssize_t node_read_numastat(struct device *dev,
>>  		       "interleave_hit %lu\n"
>>  		       "local_node %lu\n"
>>  		       "other_node %lu\n",
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_HIT),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_MISS),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_FOREIGN),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_INTERLEAVE_HIT),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_LOCAL),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_OTHER));
>> +		       node_page_state_snapshot(NODE_DATA(dev->id), NUMA_HIT),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id), NUMA_MISS),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id),
>> +			       NUMA_FOREIGN),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id),
>> +			       NUMA_INTERLEAVE_HIT),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id), NUMA_LOCAL),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id),
>> +			       NUMA_OTHER));
>>  }
>>  
>>  static DEVICE_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
>> @@ -194,7 +197,7 @@ static ssize_t node_read_vmstat(struct device *dev,
>>  	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
>>  		n += sprintf(buf+n, "%s %lu\n",
>>  			     vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
>> -			     node_page_state(pgdat, i));
>> +			     node_page_state_snapshot(pgdat, i));
>>  
>>  	return n;
>>  }
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 64e08ae..d65f28d 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1466,7 +1466,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>  		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
>>  			seq_printf(m, "\n      %-12s %lu",
>>  				vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
>> -				node_page_state(pgdat, i));
>> +				node_page_state_snapshot(pgdat, i));
>>  		}
>>  	}
>>  	seq_printf(m,
>> -- 
>> 2.7.4
>>
> 

  reply	other threads:[~2017-12-20  6:09 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-19  6:39 [PATCH v2 0/5] mm: NUMA stats code cleanup and enhancement Kemi Wang
2017-12-19  6:39 ` [PATCH v2 1/5] mm: migrate NUMA stats from per-zone to per-node Kemi Wang
2017-12-19 12:28   ` Michal Hocko
2017-12-20  5:32     ` kemi
2017-12-19  6:39 ` [PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat from s8 to s16 Kemi Wang
2017-12-19 12:38   ` Michal Hocko
2017-12-20  3:05     ` kemi
2017-12-19 16:05   ` Christopher Lameter
2017-12-19 16:20     ` Michal Hocko
2017-12-19 17:21       ` Christopher Lameter
2017-12-20  6:45         ` kemi
2017-12-19  6:39 ` [PATCH v2 3/5] mm: enlarge NUMA counters threshold size Kemi Wang
2017-12-19 12:40   ` Michal Hocko
2017-12-20  5:52     ` kemi
2017-12-20 10:12       ` Michal Hocko
2017-12-20 10:21         ` kemi
2017-12-21  8:06         ` kemi
2017-12-21  8:17           ` Michal Hocko
2017-12-21  8:23             ` kemi
2017-12-21  8:59               ` Michal Hocko
2017-12-21 10:31                 ` kemi
2017-12-22 12:31                   ` Michal Hocko
2017-12-21 17:10           ` Christopher Lameter
2017-12-22  2:06             ` kemi
2017-12-26 19:05               ` Christopher Lameter
2017-12-19  6:39 ` [PATCH v2 4/5] mm: use node_page_state_snapshot to avoid deviation Kemi Wang
2017-12-19 12:43   ` Michal Hocko
2017-12-20  6:07     ` kemi [this message]
2017-12-20 10:06       ` Michal Hocko
2017-12-20 10:24         ` kemi
2017-12-20 15:58           ` Christopher Lameter
2017-12-21  1:39             ` kemi
2017-12-19  6:39 ` [PATCH v2 5/5] mm: Rename zone_statistics() to numa_statistics() Kemi Wang
2017-12-19 12:44   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94187fd5-ad70-eba7-2724-0fe5bed750d6@intel.com \
    --to=kemi.wang@intel.com \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi.kleen@intel.com \
    --cc=aryabinin@virtuozzo.com \
    --cc=aubrey.li@intel.com \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=cl@linux.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=nborisov@suse.com \
    --cc=pasha.tatashin@oracle.com \
    --cc=rientjes@google.com \
    --cc=tim.c.chen@intel.com \
    --cc=vbabka@suse.cz \
    --cc=yasu.isimatu@gmail.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).