* [PATCH 0/5 v1] mm, oom: Introduce per numa node oom for CONSTRAINT_MEMORY_POLICY
@ 2022-05-12  4:46 Gang Li
  2022-05-12  4:46 ` [PATCH 1/5 v1] mm: add a new parameter `node` to `get/add/inc/dec_mm_counter` Gang Li
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Gang Li @ 2022-05-12  4:46 UTC
  To: akpm
  Cc: songmuchun, hca, gor, agordeev, borntraeger, svens, ebiederm,
	keescook, viro, rostedt, mingo, peterz, acme, mark.rutland,
	alexander.shishkin, jolsa, namhyung, david, imbrenda, apopple,
	adobriyan, stephen.s.brennan, ohoono.kwon, haolee.swjtu,
	kaleshsingh, zhengqi.arch, peterx, shy828301, surenb, ccross,
	vincent.whitchurch, tglx, bigeasy, fenghua.yu, linux-s390,
	linux-kernel, linux-mm, linux-fsdevel, linux-perf-users, Gang Li

TLDR:
If a mempolicy is in effect (oc->constraint == CONSTRAINT_MEMORY_POLICY), out_of_memory() will
select a victim to kill on the specific node, so that the kernel can avoid accidental killing
on NUMA systems.

Problem:
Before this patch series, oom only kills the process with the highest memory usage,
by selecting the process with the highest oom_badness on the entire system.

This works fine on UMA systems, but may cause accidental killing on NUMA systems.

As shown below, if process c.out is bound to Node1 and keeps allocating pages from Node1,
a.out will be killed first. But killing a.out doesn't free any memory on Node1, so c.out
will be killed next.

A lot of our AMD machines have 8 NUMA nodes. On these systems, there is a greater chance
of triggering this problem.

OOM before patches:
```
Per-node process memory usage (in MBs)
PID             Node 0        Node 1      Total
----------- ---------- ------------- ----------
3095 a.out     3073.34          0.11    3073.45(Killed first. Maximum memory consumption)
3199 b.out      501.35       1500.00    2001.35
3805 c.out        1.52 (grow)2248.00    2249.52(Killed then. Node1 is full)
----------- ---------- ------------- ----------
Total          3576.21       3748.11    7324.31
```

Solution:
We store per-node rss in mm_rss_stat for each process.

If a page allocation with a mempolicy in effect (oc->constraint == CONSTRAINT_MEMORY_POLICY)
triggers oom, we calculate oom_badness with the rss counter for the corresponding node, then
select the process with the highest oom_badness on that node to kill.

OOM after patches:
```
Per-node process memory usage (in MBs)
PID             Node 0        Node 1     Total
----------- ---------- ------------- ----------
3095 a.out     3073.34          0.11    3073.45
3199 b.out      501.35       1500.00    2001.35
3805 c.out        1.52 (grow)2248.00    2249.52(killed)
----------- ---------- ------------- ----------
Total          3576.21       3748.11    7324.31
```
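
The difference between the two selection rules can be sketched in user space. This is
only an illustration, not the kernel code: struct proc_rss, badness() and select_victim()
are hypothetical names, usage is stored in units of 0.01 MB to stay in integers, and the
real oom_badness() also accounts for swap, page tables and oom_score_adj, which are
ignored here.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2

/* Hypothetical stand-in for a process's rss, split per NUMA node. */
struct proc_rss {
	int pid;
	const char *comm;
	unsigned long node_usage[NR_NODES];	/* units of 0.01 MB */
};

/* Sample data copied from the tables above. */
static const struct proc_rss sample[] = {
	{ 3095, "a.out", { 307334,     11 } },
	{ 3199, "b.out", {  50135, 150000 } },
	{ 3805, "c.out", {    152, 224800 } },
};

/* node < 0: score by usage on all nodes; otherwise score by that node only. */
static unsigned long badness(const struct proc_rss *p, int node)
{
	unsigned long sum = 0;
	int i;

	assert(node < NR_NODES);
	if (node >= 0)
		return p->node_usage[node];
	for (i = 0; i < NR_NODES; i++)
		sum += p->node_usage[i];
	return sum;
}

/* Return the process with the highest score, or NULL if none scores above 0. */
static const struct proc_rss *select_victim(const struct proc_rss *procs,
					    size_t n, int node)
{
	const struct proc_rss *victim = NULL;
	unsigned long worst = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long points = badness(&procs[i], node);

		if (points > worst) {
			worst = points;
			victim = &procs[i];
		}
	}
	return victim;
}
```

With the sample data, select_victim(sample, 3, -1) picks a.out (largest total), while
select_victim(sample, 3, 1) picks c.out (largest usage on Node 1), matching the
"before" and "after" tables.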

Gang Li (5):
  mm: add a new parameter `node` to `get/add/inc/dec_mm_counter`
  mm: add numa_count field for rss_stat
  mm: add numa fields for tracepoint rss_stat
  mm: enable per numa node rss_stat count
  mm, oom: enable per numa node oom for CONSTRAINT_MEMORY_POLICY

 arch/s390/mm/pgtable.c        |   4 +-
 fs/exec.c                     |   2 +-
 fs/proc/base.c                |   6 +-
 fs/proc/task_mmu.c            |  14 ++--
 include/linux/mm.h            |  59 ++++++++++++-----
 include/linux/mm_types_task.h |  16 +++++
 include/linux/oom.h           |   2 +-
 include/trace/events/kmem.h   |  27 ++++++--
 kernel/events/uprobes.c       |   6 +-
 kernel/fork.c                 |  70 +++++++++++++++++++-
 mm/huge_memory.c              |  13 ++--
 mm/khugepaged.c               |   4 +-
 mm/ksm.c                      |   2 +-
 mm/madvise.c                  |   2 +-
 mm/memory.c                   | 116 ++++++++++++++++++++++++----------
 mm/migrate.c                  |   2 +
 mm/migrate_device.c           |   2 +-
 mm/oom_kill.c                 |  59 ++++++++++++-----
 mm/rmap.c                     |  16 ++---
 mm/swapfile.c                 |   4 +-
 mm/userfaultfd.c              |   2 +-
 21 files changed, 317 insertions(+), 111 deletions(-)

-- 
2.20.1


* Re: [PATCH 2/5 v1] mm: add numa_count field for rss_stat
@ 2022-05-12 21:21 kernel test robot
  0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2022-05-12 21:21 UTC
  To: kbuild

CC: kbuild-all(a)lists.01.org
BCC: lkp(a)intel.com
In-Reply-To: <20220512044634.63586-3-ligang.bdlg@bytedance.com>
References: <20220512044634.63586-3-ligang.bdlg@bytedance.com>
TO: Gang Li <ligang.bdlg@bytedance.com>
TO: akpm(a)linux-foundation.org
CC: songmuchun(a)bytedance.com
CC: hca(a)linux.ibm.com
CC: gor(a)linux.ibm.com
CC: agordeev(a)linux.ibm.com
CC: borntraeger(a)linux.ibm.com
CC: svens(a)linux.ibm.com
CC: ebiederm(a)xmission.com
CC: keescook(a)chromium.org
CC: viro(a)zeniv.linux.org.uk
CC: rostedt(a)goodmis.org
CC: mingo(a)redhat.com
CC: peterz(a)infradead.org
CC: acme(a)kernel.org
CC: mark.rutland(a)arm.com
CC: alexander.shishkin(a)linux.intel.com
CC: jolsa(a)kernel.org
CC: namhyung(a)kernel.org
CC: david(a)redhat.com
CC: imbrenda(a)linux.ibm.com
CC: apopple(a)nvidia.com
CC: adobriyan(a)gmail.com
CC: stephen.s.brennan(a)oracle.com
CC: ohoono.kwon(a)samsung.com
CC: haolee.swjtu(a)gmail.com
CC: kaleshsingh(a)google.com
CC: zhengqi.arch(a)bytedance.com
CC: peterx(a)redhat.com
CC: shy828301(a)gmail.com
CC: surenb(a)google.com

Hi Gang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on s390/features]
[also build test WARNING on kees/for-next/execve tip/perf/core linus/master v5.18-rc6]
[cannot apply to akpm-mm/mm-everything rostedt-trace/for-next next-20220512]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Gang-Li/mm-oom-Introduce-per-numa-node-oom-for-CONSTRAINT_MEMORY_POLICY/20220512-124948
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
:::::: branch date: 16 hours ago
:::::: commit date: 16 hours ago
config: x86_64-randconfig-s022 (https://download.01.org/0day-ci/archive/20220513/202205130551.e9BVkfHo-lkp(a)intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/3e724f844e3fe45cdffd9f359d140a424aeb462c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Gang-Li/mm-oom-Introduce-per-numa-node-oom-for-CONSTRAINT_MEMORY_POLICY/20220512-124948
        git checkout 3e724f844e3fe45cdffd9f359d140a424aeb462c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> kernel/fork.c:1110:1: sparse: sparse: unused label 'free_stack'
>> kernel/fork.c:878:6: sparse: sparse: symbol 'rss_stat_free' was not declared. Should it be static?
   kernel/fork.c:1366:24: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct file [noderef] __rcu *__ret @@     got struct file *new_exe_file @@
   kernel/fork.c:1366:24: sparse:     expected struct file [noderef] __rcu *__ret
   kernel/fork.c:1366:24: sparse:     got struct file *new_exe_file
   kernel/fork.c:1366:22: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct file *[assigned] old_exe_file @@     got struct file [noderef] __rcu *[assigned] __ret @@
   kernel/fork.c:1366:22: sparse:     expected struct file *[assigned] old_exe_file
   kernel/fork.c:1366:22: sparse:     got struct file [noderef] __rcu *[assigned] __ret
   kernel/fork.c:1697:38: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct refcount_struct [usertype] *r @@     got struct refcount_struct [noderef] __rcu * @@
   kernel/fork.c:1697:38: sparse:     expected struct refcount_struct [usertype] *r
   kernel/fork.c:1697:38: sparse:     got struct refcount_struct [noderef] __rcu *
   kernel/fork.c:1706:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:1706:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:1706:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:1707:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const * @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/fork.c:1707:9: sparse:     expected void const *
   kernel/fork.c:1707:9: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/fork.c:1707:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const * @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/fork.c:1707:9: sparse:     expected void const *
   kernel/fork.c:1707:9: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/fork.c:1707:9: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected void const * @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/fork.c:1707:9: sparse:     expected void const *
   kernel/fork.c:1707:9: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/fork.c:1708:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:1708:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:1708:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:1801:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct qspinlock *lock @@     got struct qspinlock [noderef] __rcu * @@
   kernel/fork.c:1801:9: sparse:     expected struct qspinlock *lock
   kernel/fork.c:1801:9: sparse:     got struct qspinlock [noderef] __rcu *
   kernel/fork.c:2120:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2120:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2120:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:2124:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2124:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2124:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:2443:32: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct [noderef] __rcu *real_parent @@     got struct task_struct * @@
   kernel/fork.c:2443:32: sparse:     expected struct task_struct [noderef] __rcu *real_parent
   kernel/fork.c:2443:32: sparse:     got struct task_struct *
   kernel/fork.c:2452:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2452:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2452:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:2497:54: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct list_head *head @@     got struct list_head [noderef] __rcu * @@
   kernel/fork.c:2497:54: sparse:     expected struct list_head *head
   kernel/fork.c:2497:54: sparse:     got struct list_head [noderef] __rcu *
   kernel/fork.c:2518:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2518:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2518:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:2539:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2539:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2539:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:2566:28: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/fork.c:2566:28: sparse:     expected struct sighand_struct *sighand
   kernel/fork.c:2566:28: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/fork.c:2595:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2595:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2595:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:2597:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/fork.c:2597:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/fork.c:2597:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/fork.c:3006:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *[assigned] parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/fork.c:3006:24: sparse:     expected struct task_struct *[assigned] parent
   kernel/fork.c:3006:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/fork.c:3087:43: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct refcount_struct const [usertype] *r @@     got struct refcount_struct [noderef] __rcu * @@
   kernel/fork.c:3087:43: sparse:     expected struct refcount_struct const [usertype] *r
   kernel/fork.c:3087:43: sparse:     got struct refcount_struct [noderef] __rcu *
   kernel/fork.c:2162:22: sparse: sparse: dereference of noderef expression
   kernel/fork.c: note: in included file (through include/uapi/asm-generic/bpf_perf_event.h, arch/x86/include/generated/uapi/asm/bpf_perf_event.h, ...):
   include/linux/ptrace.h:217:45: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *new_parent @@     got struct task_struct [noderef] __rcu *parent @@
   include/linux/ptrace.h:217:45: sparse:     expected struct task_struct *new_parent
   include/linux/ptrace.h:217:45: sparse:     got struct task_struct [noderef] __rcu *parent
   include/linux/ptrace.h:217:62: sparse: sparse: incorrect type in argument 3 (different address spaces) @@     expected struct cred const *ptracer_cred @@     got struct cred const [noderef] __rcu *ptracer_cred @@
   include/linux/ptrace.h:217:62: sparse:     expected struct cred const *ptracer_cred
   include/linux/ptrace.h:217:62: sparse:     got struct cred const [noderef] __rcu *ptracer_cred
   kernel/fork.c:2495:59: sparse: sparse: dereference of noderef expression
   kernel/fork.c:2496:59: sparse: sparse: dereference of noderef expression

Please review and possibly fold the followup patch.

vim +/free_stack +1110 kernel/fork.c

a3d29e8291b622 Peter Zijlstra            2022-02-07  1100  
d46eb14b735b11 Shakeel Butt              2018-08-17  1101  #ifdef CONFIG_MEMCG
d46eb14b735b11 Shakeel Butt              2018-08-17  1102  	tsk->active_memcg = NULL;
d46eb14b735b11 Shakeel Butt              2018-08-17  1103  #endif
^1da177e4c3f41 Linus Torvalds            2005-04-16  1104  	return tsk;
61c4628b538608 Suresh Siddha             2008-03-10  1105  
3e724f844e3fe4 Gang Li                   2022-05-12  1106  free_rss_stat:
3e724f844e3fe4 Gang Li                   2022-05-12  1107  #ifdef SPLIT_RSS_NUMA_COUNTING
3e724f844e3fe4 Gang Li                   2022-05-12  1108  	kfree(numa_count);
3e724f844e3fe4 Gang Li                   2022-05-12  1109  #endif
b235beea9e996a Linus Torvalds            2016-06-24 @1110  free_stack:
1a03d3f13ffe5d Sebastian Andrzej Siewior 2022-02-17  1111  	exit_task_stack_account(tsk);
ba14a194a434cc Andy Lutomirski           2016-08-11  1112  	free_thread_stack(tsk);
f19b9f74b7ea3b Akinobu Mita              2012-07-30  1113  free_tsk:
61c4628b538608 Suresh Siddha             2008-03-10  1114  	free_task_struct(tsk);
61c4628b538608 Suresh Siddha             2008-03-10  1115  	return NULL;
^1da177e4c3f41 Linus Torvalds            2005-04-16  1116  }
^1da177e4c3f41 Linus Torvalds            2005-04-16  1117  
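
One plausible reading of the 'unused label' warning, stated as an assumption because
the patch body is not quoted in this report, is that every jump that used to target
free_stack now targets the new free_rss_stat label, so free_stack is only reached by
fall-through. The user-space sketch below shows an unwind ladder in which both labels
stay in use; struct task, dup_task_sketch() and the fail_at parameter are hypothetical
and exist only for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the resources set up during task duplication. */
struct task {
	char *stack;
	long *numa_count;
};

/*
 * fail_at: 0 = succeed, 1 = stack allocation fails,
 *          2 = counter allocation fails, 3 = a later step fails.
 */
static struct task *dup_task_sketch(int fail_at)
{
	struct task *tsk = malloc(sizeof(*tsk));

	if (!tsk)
		return NULL;

	tsk->stack = fail_at == 1 ? NULL : malloc(64);
	if (!tsk->stack)
		goto free_tsk;

	tsk->numa_count = fail_at == 2 ? NULL : calloc(8, sizeof(long));
	if (!tsk->numa_count)
		goto free_stack;	/* nothing to undo for the counters yet */

	if (fail_at == 3)
		goto free_rss_stat;	/* counters exist, so unwind them too */

	return tsk;

free_rss_stat:
	free(tsk->numa_count);
free_stack:
	free(tsk->stack);
free_tsk:
	free(tsk);
	return NULL;
}
```

If no failure path needs to skip the counter cleanup, the other option is to drop the
inner label and keep only the outer one.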

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

Thread overview: 13+ messages
2022-05-12  4:46 [PATCH 0/5 v1] mm, oom: Introduce per numa node oom for CONSTRAINT_MEMORY_POLICY Gang Li
2022-05-12  4:46 ` [PATCH 1/5 v1] mm: add a new parameter `node` to `get/add/inc/dec_mm_counter` Gang Li
2022-05-12  4:46 ` [PATCH 2/5 v1] mm: add numa_count field for rss_stat Gang Li
2022-05-12 16:31   ` kernel test robot
2022-05-12  4:46 ` [PATCH 3/5 v1] mm: add numa fields for tracepoint rss_stat Gang Li
2022-05-12  4:46 ` [PATCH 4/5 v1] mm: enable per numa node rss_stat count Gang Li
2022-05-17  2:28   ` [mm] c9dc81ef10: BUG:Bad_rss-counter_state_mm:#node:#val kernel test robot
2022-05-17  2:28     ` kernel test robot
2022-05-12  4:46 ` [PATCH 5/5 v1] mm, oom: enable per numa node oom for CONSTRAINT_MEMORY_POLICY Gang Li
2022-05-12 22:31 ` [PATCH 0/5 v1] mm, oom: Introduce " Suren Baghdasaryan
2022-05-16 16:44 ` Michal Hocko
2022-06-15 10:13   ` Gang Li
2022-05-12 21:21 [PATCH 2/5 v1] mm: add numa_count field for rss_stat kernel test robot