linux-fsdevel.vger.kernel.org archive mirror
From: 王贇 <yun.wang@linux.alibaba.com>
To: Iurii Zaikin <yzaikin@google.com>
Cc: "Ingo Molnar" <mingo@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Luis Chamberlain" <mcgrof@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	linux-fsdevel@vger.kernel.org,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.ibm.com>
Subject: Re: [PATCH 3/3] sched/numa: documentation for per-cgroup numa stat
Date: Thu, 14 Nov 2019 10:22:44 +0800	[thread overview]
Message-ID: <b1106697-da56-ad5d-82c9-1461df0f2e35@linux.alibaba.com> (raw)
In-Reply-To: <CAAXuY3qsckZurUHy5kJUQcmrbn-bmGHnjtPPus5=PrQ+MmJX+g@mail.gmail.com>

Hi, Iurii

On 2019/11/14 2:28 AM, Iurii Zaikin wrote:
> Since the documentation talks about fairly advanced concepts, every little bit
> of readability improvement helps. I tried to make suggestions that I feel make
> it easier to read, hopefully my nitpicking is not too annoying.

Any comments are welcome :-)

> On Tue, Nov 12, 2019 at 7:46 PM 王贇 <yun.wang@linux.alibaba.com> wrote:
>> +On NUMA platforms, remote memory accessing always has a performance penalty,
>> +although we have NUMA balancing working hard to maximum the local accessing
>> +proportion, there are still situations it can't helps.
> Nit: working hard to maximize the access locality...
> can't helps -> can't help
>> +
>> +This could happen in modern production environment, using bunch of cgroups
>> +to classify and control resources which introduced complex configuration on
>> +memory policy, CPUs and NUMA node, NUMA balancing could facing the wrong
>> +memory policy or exhausted local NUMA node, lead into the low local page
>> +accessing proportion.
> I find the below a bit easier to read.
> This could happen in modern production environment. When a large
> number of cgroups
> are used to classify and control resources, this creates a complex
> memory policy configuration
> for CPUs and NUMA nodes. In such cases NUMA balancing could end up
> with the wrong
> memory policy or exhausted local NUMA node, which would lead to low
> percentage of local page
> accesses.

Sounds better. Just for the configuration part: since memory policy, CPUs
and NUMA nodes are configured through different approaches, maybe we should
still separate them, like:

This could happen in a modern production environment. When a large
number of cgroups are used to classify and control resources, this
creates a complex configuration of memory policy, CPUs and NUMA nodes.
In such cases NUMA balancing could end up with the wrong memory policy
or an exhausted local NUMA node, which would lead to a low percentage
of local page accesses.

> 
>> +We need to perceive such cases, figure out which workloads from which cgroup
>> +has introduced the issues, then we got chance to do adjustment to avoid
>> +performance damages.
> Nit: perceive -> detect, got-> get, damages-> degradation
> 
>> +However, there are no hardware counter for per-task local/remote accessing
>> +info, we don't know how many remote page accessing has been done for a
>> +particular task.
> Nit: counters.
> Nit: we don't know how many remote page accesses have occurred for a
> 
>> +
>> +Statistics
>> +----------
>> +
>> +Fortunately, we have NUMA Balancing which scan task's mapping and trigger PF
>> +periodically, give us the opportunity to record per-task page accessing info.
> Nit: scans, triggers, gives.
> 
>> +By "echo 1 > /proc/sys/kernel/cg_numa_stat" on runtime or add boot parameter
> Nit: at runtime or adding boot parameter
>> +To be noticed, the accounting is in a hierarchy way, which means the numa
>> +statistics representing not only the workload of this group, but also the
>> +workloads of all it's descendants.
> Note that the accounting is hierarchical, which means the numa
> statistics for a given group represents not only the workload of this
> group, but also the
> workloads of all it's descendants.
>> +
>> +For example the 'cpu.numa_stat' show:
>> +  locality 39541 60962 36842 72519 118605 721778 946553
>> +  exectime 1220127 1458684
>> +
>> +The locality is sectioned into 7 regions, closely as:
>> +  0-13% 14-27% 28-42% 43-56% 57-71% 72-85% 86-100%
> Nit: closely -> approximately?
> 
>> +we can draw a line for region_bad_percent, when the line close to 0 things
> nit: we can plot?
>> +are good, when getting close to 100% something is wrong, we can pick a proper
>> +watermark to trigger warning message.
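
Just to illustrate the math with the sample 'locality' line above (treating
the three lowest regions, i.e. locality below ~43%, as "bad" is only my
assumption here; the watermark is up to the user):

```python
# Sketch: compute region_bad_percent from a cpu.numa_stat 'locality' line.
# The seven counters map to the regions 0-13% ... 86-100%.
locality = [39541, 60962, 36842, 72519, 118605, 721778, 946553]

region_all = sum(locality)
# Assumption: count the three lowest-locality regions as "bad".
region_bad = sum(locality[:3])
region_bad_percent = 100.0 * region_bad / region_all

print(f"{region_bad_percent:.1f}% of sampled accesses were low-locality")
# -> 6.9% of sampled accesses were low-locality
```

With the sample numbers that line stays close to 0, so things look good.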
> 
>> +You may want to drop the data if the region_all is too small, which imply
> Nit: implies
>> +there are not much available pages for NUMA Balancing, just ignore would be
> Nit: not many... ingoring
>> +fine since most likely the workload is insensitive to NUMA.
>> +Monitoring root group help you control the overall situation, while you may
> Nit: helps
>> +also want to monitoring all the leaf groups which contain the workloads, this
> Nit: monitor
>> +help to catch the mouse.
> Nit: helps
>> +become too small, for NUMA node X we have:
> Nit: becomes
>> +try put your workload into a memory cgroup which providing per-node memory
> Nit: try to put
>> +These two percentage are usually matched on each node, workload should execute
> Nit: percentages
>> +Depends on which part of the memory accessed mostly by the workload, locality
> Depending on which part of the memory is accessed.
> "mostly by the workload" - not sure what you mean here, the majority
> of accesses from the
> workload fall into this part of memory or that accesses from processes
> other than the workload
> are rare?

The former, actually. Sometimes the workload only accesses part of its
memory; it could be a small part, but as long as that part is local,
things can be fine.

>> +could still be good with just a little piece of memory locally.
> ?

What about:

A workload may only access a small part of its memory; in such cases,
although the majority of its memory is remote, locality could still be good.

>> +Thus to tell if things are find or not depends on the understanding of system
> are fine
>> +After locate which workloads introduced the bad locality, check:
> locate -> indentifying
>> +
>> +1). Is the workloads bind into a particular NUMA node?
> bind into -> bound to
>> +2). Is there any NUMA node run out of resources?
> Has any .. run out of resources
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 5e27d74e2b74..220df1f0beb8 100644
>> +                       lot's of per-cgroup workloads.
> lots

Thanks for pointing out all these issues, very helpful :-)

I'll apply them in the next version.

Regards,
Michael Wang

> 
