From: 王贇 <yun.wang@linux.alibaba.com>
To: Peter Zijlstra <peterz@infradead.org>,
hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mcgrof@kernel.org, keescook@chromium.org,
linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
"Michal Koutný" <mkoutny@suse.com>,
"Hillf Danton" <hdanton@sina.com>
Subject: Re: [PATCH v2 0/4] per-cgroup numa suite
Date: Thu, 25 Jul 2019 10:33:10 +0800
Message-ID: <2203b828-1458-5fec-f4f6-353f51091e2a@linux.alibaba.com>
In-Reply-To: <65c1987f-bcce-2165-8c30-cf8cf3454591@linux.alibaba.com>
Hi, Peter
Now we have all this stuff in the cpu cgroup. With the new statistics,
folks should be able to estimate their per-cgroup workloads on
NUMA platforms, and numa group + cling should help to address the
issue when their workloads can't be settled on one node.
What do you think about this version? :-)
Regards,
Michael Wang
On 2019/7/16 11:38 AM, 王贇 wrote:
> During our torture testing of the numa code, we found problems such as:
>
> * missing per-cgroup information about the per-node execution status
> * missing per-cgroup information about the numa locality
>
> That is, when a cpu cgroup is running with a bunch of tasks, there is no good
> way to tell how its tasks are dealing with numa.
>
> The first two patches try to fill in the missing pieces, but
> more problems appeared once we started monitoring these statistics:
>
> * tasks not always running on the preferred numa node
> * tasks from same cgroup running on different nodes
>
> The task numa group handler always checks whether tasks are sharing pages
> and tries to pack them into a single numa group, so they have a chance to
> settle down on the same node, but this fails in some cases:
>
> * workloads share page caches rather than mappings
> * workloads get too many wakeups across nodes
>
> Since page caches are not traced by numa balancing, there is no way to
> recognize this kind of relationship, and when there are too many wakeups,
> a task will be dragged away from the preferred node and then migrated back
> by numa balancing, repeatedly.
>
> The third patch tries to address the first issue: we can now give the kernel
> a hint about the relationship between tasks and pack them into a single numa
> group.
>
> The fourth patch introduces numa cling, which tries to address the wakeup
> issue: we now try to make a task stay on the preferred node on wakeup in the
> fast path. To address the risk of imbalance, we monitor the numa
> migration failure ratio and pause numa cling when it reaches the specified
> degree.
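
To make the pause rule above a bit more concrete, here is a minimal
sketch in plain userspace C. It is not the patch code: the struct, its
fields, the function names and the percentage threshold are all
hypothetical, and it only assumes that cling is paused once the
migration failure ratio reaches a configured degree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node counters; these names do not come from the patch. */
struct cling_stats {
	uint64_t migrate_attempts;	/* numa migrations tried toward the node */
	uint64_t migrate_failures;	/* how many of those attempts failed */
};

/* Failure ratio as an integer percentage; 0 when nothing was attempted. */
unsigned int cling_failure_pct(const struct cling_stats *s)
{
	if (s->migrate_attempts == 0)
		return 0;
	return (unsigned int)(s->migrate_failures * 100 / s->migrate_attempts);
}

/*
 * Cling stays enabled only while the failure ratio is below the
 * threshold; reaching the threshold pauses it (assumed semantics).
 */
bool cling_allowed(const struct cling_stats *s, unsigned int threshold_pct)
{
	return cling_failure_pct(s) < threshold_pct;
}
```

With a threshold of 50, a node that failed 5 out of 10 migrations would
stop clinging until the ratio drops below 50 again.
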
>
> Since v1:
> * move statistics from the memory cgroup into the cpu cgroup
> * statistics are now accounted hierarchically
> * locality is now accounted into 8 equal regions
> * numa cling no longer overrides select_idle_sibling; instead we
> prevent numa swap migration with tasks clinging to the dst-node, and also
> prevent wake affine from dragging away tasks that already cling to
> prev-cpu
> * other refinements to comments and names
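
As an illustration of the "8 equal regions" item above, the following is
a rough sketch in plain userspace C, not taken from the patch. It assumes
locality is the share of local accesses among all sampled accesses, and
the macro and function names are made up for this example.

```c
#include <stdint.h>

#define NR_LOCALITY_REGIONS 8	/* hypothetical name for the 8 regions */

/*
 * Map a (local, remote) access count pair to one of 8 equal-width
 * locality regions: region 0 covers [0%, 12.5%), region 7 covers
 * [87.5%, 100%]. Returns -1 when there are no samples at all.
 */
int locality_region(uint64_t local, uint64_t remote)
{
	uint64_t total = local + remote;
	uint64_t idx;

	if (total == 0)
		return -1;

	/* local / total scaled to 0..8; exactly 100% local yields 8. */
	idx = local * NR_LOCALITY_REGIONS / total;

	/* Fold the 100% case into the last region. */
	if (idx >= NR_LOCALITY_REGIONS)
		idx = NR_LOCALITY_REGIONS - 1;
	return (int)idx;
}
```

A per-cgroup statistic could then keep one counter per region and
increment the counter selected by locality_region() for each sample, so
a fully remote workload piles up in region 0 and a fully local one in
region 7.
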
>
> Michael Wang (4):
> v2 numa: introduce per-cgroup numa balancing locality statistic
> v2 numa: append per-node execution time in cpu.numa_stat
> v2 numa: introduce numa group per task group
> v4 numa: introduce numa cling feature
>
>  include/linux/sched.h        |   8 +-
>  include/linux/sched/sysctl.h |   3 +
>  kernel/sched/core.c          |  85 ++++++++
>  kernel/sched/debug.c         |   7 +
>  kernel/sched/fair.c          | 510 ++++++++++++++++++++++++++++++++++++++++++-
>  kernel/sched/sched.h         |  41 ++++
>  kernel/sysctl.c              |   9 +
>  7 files changed, 651 insertions(+), 12 deletions(-)
>
Thread overview: 62+ messages
2019-04-22 2:10 [RFC PATCH 0/5] NUMA Balancer Suite 王贇
2019-04-22 2:11 ` [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality, statistic 王贇
2019-04-23 8:44 ` Peter Zijlstra
2019-04-23 9:14 ` 王贇
2019-04-23 8:46 ` Peter Zijlstra
2019-04-23 9:32 ` 王贇
2019-04-23 8:47 ` Peter Zijlstra
2019-04-23 9:33 ` 王贇
2019-04-23 9:46 ` Peter Zijlstra
2019-04-22 2:12 ` [RFC PATCH 2/5] numa: append per-node execution info in memory.numa_stat 王贇
2019-04-23 8:52 ` Peter Zijlstra
2019-04-23 9:36 ` 王贇
2019-04-23 9:46 ` Peter Zijlstra
2019-04-23 10:01 ` 王贇
2019-04-22 2:13 ` [RFC PATCH 3/5] numa: introduce per-cgroup preferred numa node 王贇
2019-04-23 8:55 ` Peter Zijlstra
2019-04-23 9:41 ` 王贇
2019-04-22 2:14 ` [RFC PATCH 4/5] numa: introduce numa balancer infrastructure 王贇
2019-04-22 2:21 ` [RFC PATCH 5/5] numa: numa balancer 王贇
2019-04-23 9:05 ` Peter Zijlstra
2019-04-23 9:59 ` 王贇
[not found] ` <CAHCio2gEw4xyuoiurvwzvEiU8eLas+5ZLhzmqm1V2CJqvt+cyA@mail.gmail.com>
2019-04-23 2:14 ` [RFC PATCH 0/5] NUMA Balancer Suite 王贇
2019-07-03 3:26 ` [PATCH 0/4] per cpu cgroup numa suite 王贇
2019-07-03 3:28 ` [PATCH 1/4] numa: introduce per-cgroup numa balancing locality, statistic 王贇
2019-07-11 13:43 ` Peter Zijlstra
2019-07-12 3:15 ` 王贇
2019-07-11 13:47 ` Peter Zijlstra
2019-07-12 3:43 ` 王贇
2019-07-12 7:58 ` Peter Zijlstra
2019-07-12 9:11 ` 王贇
2019-07-12 9:42 ` Peter Zijlstra
2019-07-12 10:10 ` 王贇
2019-07-15 2:09 ` 王贇
2019-07-15 12:10 ` Michal Koutný
2019-07-16 2:41 ` 王贇
2019-07-19 16:47 ` Michal Koutný
2019-07-03 3:29 ` [PATCH 2/4] numa: append per-node execution info in memory.numa_stat 王贇
2019-07-11 13:45 ` Peter Zijlstra
2019-07-12 3:17 ` 王贇
2019-07-03 3:32 ` [PATCH 3/4] numa: introduce numa group per task group 王贇
2019-07-11 14:10 ` Peter Zijlstra
2019-07-12 4:03 ` 王贇
2019-07-03 3:34 ` [PATCH 4/4] numa: introduce numa cling feature 王贇
2019-07-08 2:25 ` [PATCH v2 " 王贇
2019-07-09 2:15 ` 王贇
2019-07-09 2:24 ` [PATCH v3 " 王贇
2019-07-11 14:27 ` [PATCH " Peter Zijlstra
2019-07-12 3:10 ` 王贇
2019-07-12 7:53 ` Peter Zijlstra
2019-07-12 8:58 ` 王贇
2019-07-22 3:44 ` 王贇
2019-07-11 9:00 ` [PATCH 0/4] per cgroup numa suite 王贇
2019-07-16 3:38 ` [PATCH v2 0/4] per-cgroup " 王贇
2019-07-16 3:39 ` [PATCH v2 1/4] numa: introduce per-cgroup numa balancing locality statistic 王贇
2019-07-16 3:40 ` [PATCH v2 2/4] numa: append per-node execution time in cpu.numa_stat 王贇
2019-07-19 16:39 ` Michal Koutný
2019-07-22 2:36 ` 王贇
2019-07-16 3:41 ` [PATCH v2 3/4] numa: introduce numa group per task group 王贇
2019-07-16 3:41 ` [PATCH v4 4/4] numa: introduce numa cling feature 王贇
2019-07-22 2:37 ` [PATCH v5 " 王贇
2019-07-25 2:33 ` 王贇 [this message]
2019-08-06 1:33 ` [PATCH v2 0/4] per-cgroup numa suite 王贇