Linux-Fsdevel Archive on lore.kernel.org
From: 王贇 <yun.wang@linux.alibaba.com>
To: "Ingo Molnar" <mingo@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Luis Chamberlain" <mcgrof@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	"Iurii Zaikin" <yzaikin@google.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Jonathan Corbet" <corbet@lwn.net>
Subject: Re: [PATCH v6 0/2] sched/numa: introduce numa locality
Date: Fri, 17 Jan 2020 10:19:17 +0800
Message-ID: <8edb83a2-9943-2954-0da6-f4d29e3df109@linux.alibaba.com>
In-Reply-To: <d2c4cace-623a-9317-c957-807e3875aa4a@linux.alibaba.com>

Dear folks,

During our testing, we found that in some cases NUMA Balancing
does not help improve locality; one such case is memory writing
inside a virtual machine.

The VM is created by docker with kata-runtime. Inside the guest, the
container runs several tasks that malloc memory and keep writing to
it one page at a time, then report the time taken once 1G has been
written.
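
For reference, the per-task write loop can be sketched roughly as
below. This is only an illustration of the workload described above,
not the actual test program: the function name write_pages(), the
wrap-around over a fixed buffer, and filling every byte of each page
are assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch of one writer task: write total_bytes into a
 * malloc'ed buffer of buf_bytes, one page at a time, wrapping around
 * at the end of the buffer. Returns the elapsed time in seconds, or
 * -1.0 if the allocation fails.
 */
static double write_pages(size_t buf_bytes, size_t total_bytes)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t page = ps > 0 ? (size_t)ps : 4096;
	struct timespec start, end;
	volatile unsigned char *buf;	/* volatile keeps the stores from being optimized away */
	size_t off = 0;

	assert(buf_bytes >= page);	/* need room for at least one page */
	buf = malloc(buf_bytes);
	if (!buf)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (size_t written = 0; written < total_bytes; written += page) {
		for (size_t i = 0; i < page; i++)
			buf[off + i] = 0x5a;
		off += page;
		if (off + page > buf_bytes)
			off = 0;	/* wrap to the start of the buffer */
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	free((void *)buf);
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling write_pages() with total_bytes set to 1G from several threads
or processes at once would approximate the test; the buffer size per
task is left open here since it is not stated above.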

The result is not as good as with runc, and we found that locality
does not grow in the kata case. After some debugging we located
the reason.

The vcpu threads created by the VM rarely exit to userspace in this
case; they just stay in the kernel after calling ioctl(KVM_RUN),
while the NUMA Balancing work is done by task_work_run(), which is
handled together with signal handling before exiting to user mode.

So the situation is that, for these vcpu threads, the NUMA Balancing
work was queued with task_work_add() but never got a chance to run.
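
To make the pattern concrete, here is a toy userspace model of it.
This is not kernel code: the toy_* names are hypothetical, and the
model only mirrors the description above, where work is queued on a
tick (as with task_work_add()) and executed only on the return to
user mode (as with task_work_run()).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a task; all names here are made up for illustration. */
struct toy_task {
	bool work_pending;	/* set when work is queued */
	int scans_done;		/* how many times the queued work really ran */
};

/* Tick path: the scan is only requested, not performed. */
static void toy_queue_work(struct toy_task *t)
{
	t->work_pending = true;
}

/* Return-to-user path: the only place where queued work is executed. */
static void toy_exit_to_user(struct toy_task *t)
{
	if (t->work_pending) {
		t->work_pending = false;
		t->scans_done++;
	}
}

/*
 * Simulate `ticks` ticks for one task. When exits_to_user is false the
 * task never leaves the kernel, like a vcpu thread sitting in
 * ioctl(KVM_RUN), so the pending work is never consumed.
 */
static int toy_run(int ticks, bool exits_to_user)
{
	struct toy_task t = { false, 0 };

	assert(ticks >= 0);
	for (int i = 0; i < ticks; i++) {
		toy_queue_work(&t);
		if (exits_to_user)
			toy_exit_to_user(&t);
	}
	return t.scans_done;
}
```

In this model toy_run(10, true) performs the work on every tick,
while toy_run(10, false) leaves it pending forever, which matches
what we observe for the vcpu threads.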

Now the question is: is this by design or not?

BTW, we also passed the NUMA topology into the VM, but the result is
still not as good as with runc; it seems that NUMA Balancing is far
more effective on the host than inside the guest.

Regards,
Michael Wang


On 2019/12/13 9:43 AM, 王贇 wrote:
> Since v5:
>   * fix compile failure when NUMA disabled
> Since v4:
>   * improved documentation
> Since v3:
>   * fix comments and improved documentation
> Since v2:
>   * simplified the locality concept & implementation
> Since v1:
>   * improved documentation
> 
> Modern production environments may use hundreds of cgroups to control
> the resources for different workloads, along with complicated
> resource bindings.
> 
> On NUMA platforms with multiple nodes, things become even more
> complicated. We want more local memory accesses to improve
> performance, and NUMA Balancing works hard to achieve that;
> however, a wrong memory policy or node binding can easily waste the
> effort and result in a lot of remote page accesses.
> 
> We need to notice such problems so that we get a chance to fix them
> before too much damage is done; however, there is no good monitoring
> approach yet to help catch the culprit that introduced the remote
> accesses.
> 
> This patch set tries to fill in the missing piece by introducing
> per-cgroup NUMA locality info. With these new statistics, we can
> monitor NUMA efficiency on a daily basis and give a warning when
> things go too wrong.
> 
> Please check the second patch for more details.
> 
> Michael Wang (2):
>   sched/numa: introduce per-cgroup NUMA locality info
>   sched/numa: documentation for per-cgroup numa statistics
> 
>  Documentation/admin-guide/cg-numa-stat.rst      | 178 ++++++++++++++++++++++++
>  Documentation/admin-guide/index.rst             |   1 +
>  Documentation/admin-guide/kernel-parameters.txt |   4 +
>  Documentation/admin-guide/sysctl/kernel.rst     |   9 ++
>  include/linux/sched.h                           |  15 ++
>  include/linux/sched/sysctl.h                    |   6 +
>  init/Kconfig                                    |  11 ++
>  kernel/sched/core.c                             |  75 ++++++++++
>  kernel/sched/fair.c                             |  62 +++++++++
>  kernel/sched/sched.h                            |  12 ++
>  kernel/sysctl.c                                 |  11 ++
>  11 files changed, 384 insertions(+)
>  create mode 100644 Documentation/admin-guide/cg-numa-stat.rst
> 

Thread overview: 66+ messages
2019-11-13  3:43 [PATCH 0/3] sched/numa: introduce advanced numa statistic 王贇
2019-11-13  3:44 ` [PATCH 1/3] sched/numa: advanced per-cgroup " 王贇
2019-11-13  3:45 ` [PATCH 2/3] sched/numa: expose per-task pages-migration-failure 王贇
2019-11-13  3:45 ` [PATCH 3/3] sched/numa: documentation for per-cgroup numa stat 王贇
2019-11-13 15:09   ` Jonathan Corbet
2019-11-14  1:52     ` 王贇
2019-11-13 18:28   ` Iurii Zaikin
2019-11-14  2:22     ` 王贇
2019-11-15  2:29   ` [PATCH v2 " 王贇
2019-11-20  9:45 ` [PATCH 0/3] sched/numa: introduce advanced numa statistic 王贇
2019-11-25  1:35 ` 王贇
2019-11-27  1:48 ` [PATCH v2 " 王贇
2019-11-27  1:49   ` [PATCH v2 1/3] sched/numa: advanced per-cgroup " 王贇
2019-11-27 10:19     ` Mel Gorman
2019-11-28  2:09       ` 王贇
2019-11-28 12:39         ` Michal Koutný
2019-11-28 13:41           ` 王贇
2019-11-28 15:58             ` Michal Koutný
2019-11-29  1:52               ` 王贇
2019-11-29  5:19                 ` 王贇
2019-11-29 10:06                   ` Michal Koutný
2019-12-02  2:11                     ` 王贇
2019-11-27  1:50   ` [PATCH v2 2/3] sched/numa: expose per-task pages-migration-failure 王贇
2019-11-27 10:00     ` Mel Gorman
2019-12-02  2:22     ` 王贇
2019-11-27  1:50   ` [PATCH v2 3/3] sched/numa: documentation for per-cgroup numa stat 王贇
2019-11-27  4:58     ` Randy Dunlap
2019-11-27  5:54       ` 王贇
2019-12-03  5:59   ` [PATCH v3 0/2] sched/numa: introduce numa locality 王贇
2019-12-03  6:00     ` [PATCH v3 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2019-12-04  2:33       ` Randy Dunlap
2019-12-04  2:38         ` 王贇
2019-12-03  6:02     ` [PATCH v3 2/2] sched/numa: documentation for per-cgroup numa statistics 王贇
2019-12-03 13:43       ` Jonathan Corbet
2019-12-04  2:27         ` 王贇
2019-12-04  7:58     ` [PATCH v4 0/2] sched/numa: introduce numa locality 王贇
2019-12-04  7:59       ` [PATCH v4 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2019-12-05  3:28         ` Randy Dunlap
2019-12-05  3:29           ` Randy Dunlap
2019-12-05  3:52             ` 王贇
2019-12-04  8:00       ` [PATCH v4 2/2] sched/numa: documentation for per-cgroup numa statistics 王贇
2019-12-05  3:40         ` Randy Dunlap
2019-12-05  6:53       ` [PATCH v5 0/2] sched/numa: introduce numa locality 王贇
2019-12-05  6:53         ` [PATCH v5 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2019-12-05  6:54         ` [PATCH v5 2/2] sched/numa: documentation for per-cgroup numa, statistics 王贇
2019-12-10  2:19         ` [PATCH v5 0/2] sched/numa: introduce numa locality 王贇
2019-12-13  1:43         ` [PATCH v6 " 王贇
2019-12-13  1:47           ` [PATCH v6 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2020-01-03 15:14             ` Michal Koutný
2020-01-04  4:51               ` 王贇
2019-12-13  1:48           ` [PATCH v6 2/2] sched/numa: documentation for per-cgroup numa 王贇
2019-12-27  2:22           ` [PATCH v6 0/2] sched/numa: introduce numa locality 王贇
2020-01-17  2:19           ` 王贇 [this message]
2020-01-19  6:08           ` [PATCH v7 " 王贇
2020-01-19  6:09             ` [PATCH v7 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2020-01-19  6:09             ` [PATCH v7 2/2] sched/numa: documentation for per-cgroup numa, statistics 王贇
2020-01-21  0:12               ` Randy Dunlap
2020-01-21  1:58                 ` 王贇
2020-01-21  1:56             ` [PATCH v8 0/2] sched/numa: introduce numa locality 王贇
2020-01-21  1:57               ` [PATCH v8 1/2] sched/numa: introduce per-cgroup NUMA locality info 王贇
2020-01-21  1:57               ` [PATCH v8 2/2] sched/numa: documentation for per-cgroup numa, statistics 王贇
2020-01-21  2:08                 ` Randy Dunlap
2020-02-07  1:10               ` [PATCH v8 0/2] sched/numa: introduce numa locality 王贇
2020-02-07  1:25                 ` Steven Rostedt
2020-02-07  2:31                   ` 王贇
2020-02-07  2:37             ` [PATCH RESEND " 王贇
