Linux-Fsdevel Archive on
From: 王贇 <>
To: Mel Gorman <>
Cc: Peter Zijlstra <>,
	Ingo Molnar <>,
	Juri Lelli <>,
	Vincent Guittot <>,
	Dietmar Eggemann <>,
	Steven Rostedt <>,
	Ben Segall <>,
	Luis Chamberlain <>,
	Kees Cook <>,
	Iurii Zaikin <>,
	Michal Koutný <>,
	"Paul E. McKenney" <>,
	Randy Dunlap <>,
	Jonathan Corbet <>
Subject: Re: [PATCH RESEND v8 1/2] sched/numa: introduce per-cgroup NUMA locality info
Date: Mon, 24 Feb 2020 11:05:49 +0800
Message-ID: <> (raw)
In-Reply-To: <>

On 2020/2/21 10:20 PM, Mel Gorman wrote:
>>> Which is a very interesting corner case in itself but also one that
>>> could potentially have been inferred from monitoring /proc/vmstat
>>> numa_pte_updates or on a per-task basis by monitoring /proc/PID/sched and
>>> watching numa_scan_seq and total_numa_faults. Accumulating the information
>>> on a per-cgroup basis would require a bit more legwork.
>> That doesn't work for daily monitoring...
> Indeed although at least /proc/vmstat is cheap to monitor and it could
> at least be tracked if the number of NUMA faults is abnormally low or
> the ratio of remote to local hints is problematic.
>> Besides, compared with locality, this requires a much deeper understanding
>> of the implementation; it could be tough even for NUMA developers to
>> assemble all these statistics together.
> My point is that even with the patch, the definition of locality is
> subtle. At a single point in time, the locality might appear to be low
> but it's due to an event that happened far in the past.

Agreed, the meaning of locality just keeps changing... only those who
understand the implementation can extract the useful information.
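
As a rough illustration of the /proc/vmstat monitoring mentioned above,
here is a minimal sketch (not part of the patch). It assumes the kernel
exposes the numa_hint_faults and numa_hint_faults_local counters, which
requires NUMA balancing to be built in; the function names are hypothetical.
It computes the share of local hinting faults between two samples, so an
old event does not dominate the result the way a cumulative total would.

```python
from typing import Dict, Optional


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse the 'name value' lines of /proc/vmstat into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path: str = "/proc/vmstat") -> Dict[str, int]:
    """Take one sample of the system-wide counters."""
    with open(path) as f:
        return parse_vmstat(f.read())


def local_hint_ratio(before: Dict[str, int],
                     after: Dict[str, int]) -> Optional[float]:
    """Fraction of NUMA hinting faults that were local between two samples.

    Returns None when no hinting faults happened in the interval, which
    is itself worth noticing (scanning may be idle or disabled).
    """
    faults = after.get("numa_hint_faults", 0) - before.get("numa_hint_faults", 0)
    local = (after.get("numa_hint_faults_local", 0)
             - before.get("numa_hint_faults_local", 0))
    if faults <= 0:
        return None
    return local / faults
```

Calling read_vmstat() twice, some seconds apart, and passing both samples
to local_hint_ratio() gives a per-interval number. It is still system-wide,
not per-cgroup, which is the gap discussed in this thread.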

>>>> Maybe not a good example, but we are just trying to highlight that NUMA Balancing
>>>> could have issues in some cases, and we want them to be exposed somehow,
>>>> maybe through the locality.
>>> Again, I'm somewhat neutral on the patch simply because I would not use
>>> the information for debugging problems with NUMA balancing. I would try
>>> using tracepoints and if the tracepoints were not good enough, I'd add or
>>> fix them -- similar to what I had to do with sched_stick_numa recently.
>>> The caveat is that I mostly look at this sort of problem as a developer.
>>> Sysadmins have very different requirements, especially simplicity even
>>> if the simplicity in this case is an illusion.
>> Fair enough, but I guess PeterZ still wants your Ack, so neutral means
>> refusal in this case :-(
> I think the patch is functionally harmless and can be disabled but I also
> would be wary of dealing with a bug report that was based on the numbers
> provided by the locality metric. The bulk of the work related to the bug
> would likely be spent on trying to explain the metric and I've dealt with
> quite a few bugs that were essentially "We don't like this number and think
> something is wrong because of it -- fix it". Even then, I would want the
> workload isolated and then vmstat recorded over time to determine whether it's
> a persistent problem or not. That's the reason why I'm reluctant to ack it.
> I fully acknowledge that this may have value for sysadmins and may be a
> good enough reason to merge it for environments that typically build and
> configure their own kernels. I doubt that general distributions would
> enable it but that's a guess.

Thanks for the kind explanation, I get the point.

A false alarm may be fine for an admin, but could be a nightmare if the user
keeps asking why. I suppose those who want to improve NUMA may be
interested :-P

Anyway, I understand there is a gap between the general requirement and this
locality idea, and that it's really hard to fulfill...

>> BTW, what do you think about the documentation in the second patch?
> I think the documentation is great, it's clear and explains itself well.
>> Do you think it's necessary to have a doc to explain NUMA related statistics?
> It would be nice but AFAIK, the stats in vmstats are not documented.
> They are there because recording them over time can be very useful when
> dealing with user bug reports.

Another TODO then :-)

Michael Wang

