Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: ?????? <yun.wang@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Michal Koutn? <mkoutny@suse.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [PATCH RESEND v8 1/2] sched/numa: introduce per-cgroup NUMA locality info
Date: Mon, 17 Feb 2020 14:16:16 +0000
Message-ID: <20200217141616.GB3420@suse.de> (raw)
In-Reply-To: <881deb50-163e-442a-41ec-b375cc445e4d@linux.alibaba.com>

On Mon, Feb 17, 2020 at 09:23:52PM +0800, ?????? wrote:
> 
> 
> On 2020/2/17 ??????7:58, Mel Gorman wrote:
> [snip]
> >> Mel, I suspect you still feel that way, right?
> >>
> > 
> > Yes, I still think it would be a struggle to interpret the data
> > meaningfully without very specific knowledge of the implementation. If
> > the scan rate was constant, it would be easier but that would make NUMA
> > balancing worse overall. Similarly, the stat might get very difficult to
> > interpret when NUMA balancing is failing because of a load imbalance,
> > pages are shared and being interleaved or NUMA groups span multiple
> > active nodes.
> 
> Hi, Mel, appreciated to have you back on the table :-)
> 
> IMHO the scan period changing should not be a problem now, since the
> maximum period is defined by user, so monitoring at maximum period
> on the accumulated page accessing counters is always meaningful, correct?
> 

It has meaning but the scan rate drives the fault rate which is the basis
for the stats you accumulate. If the scan rate is high when accesses
are local, the stats can be skewed making it appear the task is much
more local than it may really is at a later point in time. The scan rate
affects the accuracy of the information. The counters have meaning but
they needs careful interpretation.

> FYI, by monitoring locality, we found that the kvm vcpu thread is not
> covered by NUMA Balancing, whatever how many maximum period passed, the
> counters are not increasing, or very slowly, although inside guest we are
> copying memory.
> 
> Later we found such task rarely exit to user space to trigger task
> work callbacks, and NUMA Balancing scan depends on that, which help us
> realize the importance to enable NUMA Balancing inside guest, with the
> correct NUMA topo, a big performance risk I'll say :-P
> 

Which is a very interesting corner case in itself but also one that
could have potentially have been inferred from monitoring /proc/vmstat
numa_pte_updates or on a per-task basis by monitoring /proc/PID/sched and
watching numa_scan_seq and total_numa_faults. Accumulating the information
on a per-cgroup basis would require a bit more legwork.

> Maybe not a good example, but we just try to highlight that NUMA Balancing
> could have issue in some cases, and we want them to be exposed, somehow,
> maybe by the locality.
> 

Again, I'm somewhat neutral on the patch simply because I would not use
the information for debugging problems with NUMA balancing. I would try
using tracepoints and if the tracepoints were not good enough, I'd add or
fix them -- similar to what I had to do with sched_stick_numa recently.
The caveat is that I mostly look at this sort of problem as a developer.
Sysadmins have very different requirements, especially simplicity even
if the simplicity in this case is an illusion.

-- 
Mel Gorman
SUSE Labs

  reply index

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07  3:34 [PATCH RESEND v8 0/2] sched/numa: introduce numa locality 王贇
2020-02-07  3:35 ` [PATCH RESEND v8 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2020-02-07  3:37   ` 王贇
2020-02-13  2:35     ` 王贇
2020-02-14 15:10   ` Peter Zijlstra
2020-02-17 11:58     ` Mel Gorman
2020-02-17 13:23       ` 王贇
2020-02-17 14:16         ` Mel Gorman [this message]
2020-02-18  1:39           ` 王贇
2020-02-21 14:20             ` Mel Gorman
2020-02-21 15:47               ` Peter Zijlstra
2020-02-24  3:13                 ` 王贇
2020-02-24  3:05               ` 王贇
2020-02-21 15:28         ` Peter Zijlstra
2020-02-24  3:09           ` 王贇
2020-02-07  3:35 ` [PATCH RESEND v8 2/2] sched/numa: documentation for per-cgroup numa 王贇

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200217141616.GB3420@suse.de \
    --to=mgorman@suse.de \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=paulmck@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=yun.wang@linux.alibaba.com \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git