From: hc128168@gmail.com (Henry C)
To: kernelnewbies@lists.kernelnewbies.org
Subject: Understand numa_faults_memory
Date: Thu, 18 Oct 2018 11:39:59 +0800 [thread overview]
Message-ID: <CAGPGUd__XyXuTnQJ+xzADXEu3y2g=hamnn6SDiJb=kcsm8gUHQ@mail.gmail.com> (raw)
Hi,
I am on CentOS 7 with kernel package 3.10.0-862.2.3.el7.x86_64.
I was looking at the scheduling info of my application by running
cat /proc/[pid]/sched:
-------------------------------------------------------------------
se.exec_start : 78998944.120048
se.vruntime : 78337609.962134
se.sum_exec_runtime : 78337613.040860
se.nr_migrations : 6
nr_switches : 41
nr_voluntary_switches : 31
nr_involuntary_switches : 10
se.load.weight : 1024
policy : 0
prio : 120
clock-delta : 13
mm->numa_scan_seq : 925
numa_migrations, 0
numa_faults_memory, 0, 0, 0, 0, 1
numa_faults_memory, 1, 0, 0, 0, 1
numa_faults_memory, 0, 1, 1, 1, 0
numa_faults_memory, 1, 1, 0, 1, 9
I am trying to understand the numbers after numa_faults_memory, so I dug
into the source code:
https://elixir.bootlin.com/linux/v3.18.75/source/kernel/sched/debug.c
I believe the loop around line 539 prints the last 4 lines. Given that my
machine has 2 NUMA nodes, the outer loop iterates over the NUMA nodes, and
the inner loop is hardcoded to run twice, the number of lines in the
output matches (2 nodes * 2 = 4 lines).
So my questions are:
1. What is the inner loop trying to do? Why does it loop twice?
2. What is the last number in numa_faults_memory (e.g. 9 on the last line)?
3. When will this counter be reset? According to the comment at
https://elixir.bootlin.com/linux/v3.18.75/source/include/linux/sched.h#L1577,
"Exponential decaying average of faults on a per-node basis. Scheduling
placement decisions are made based on these counts. The values remain
static for the duration of a PTE scan", it sounds like numa_faults_memory
is reset or recomputed for each PTE scan. What is a PTE scan? And what
does the "scheduling" here refer to? Scheduling the migration of a chunk
of memory from one NUMA node to another due to NUMA balancing? If so, does
that mean that if I turn off NUMA balancing
("echo 0 > /proc/sys/kernel/numa_balancing"), the PTE scan will stop and
numa_faults_memory will remain 0 the whole time (assuming there is
sufficient memory at all times)?
4. For my machine, task_struct::numa_faults_memory[] would have 4
elements, and I am lost as to what it is tracking. I thought it tracked
the number of NUMA faults per NUMA node, but the number of elements is
always (# of NUMA nodes) * 2.
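For context, this is how I have been checking the balancing knob and re-reading the counters while experimenting with question 3 (the reads are guarded, since /proc/sys/kernel/numa_balancing only exists on kernels built with CONFIG_NUMA_BALANCING; $$ is just this shell's pid as an example):

```shell
# Check whether automatic NUMA balancing is currently enabled (1 = on, 0 = off).
if [ -r /proc/sys/kernel/numa_balancing ]; then
    cat /proc/sys/kernel/numa_balancing
else
    echo "numa_balancing knob not present on this kernel"
fi

# Re-dump just the NUMA-related lines for a task ($$ = this shell's pid);
# grep -s suppresses errors, and the fallback keeps the pipeline from failing
# when the kernel prints no such lines.
grep -s numa "/proc/$$/sched" || echo "no NUMA lines for pid $$"
```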
Thanks!