[PATCH v13 0/7] cgroup-aware OOM killer

* [PATCH v13 0/7] cgroup-aware OOM killer
@ 2017-11-30 15:28 Roman Gushchin
  2017-11-30 15:28 ` [PATCH v13 1/7] mm, oom: refactor the oom_kill_process() function Roman Gushchin
                   ` (9 more replies)
  0 siblings, 10 replies; 51+ messages in thread
From: Roman Gushchin @ 2017-11-30 15:28 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Michal Hocko, Vladimir Davydov, Johannes Weiner,
	Tetsuo Handa, David Rientjes, Andrew Morton, Tejun Heo,
	kernel-team, cgroups, linux-doc, linux-kernel, linux-mm

This patchset makes the OOM killer cgroup-aware.

v13:
  - Reverted fallback to per-process OOM as in v11 (asked by Michal)
  - Added entry in cgroup features list
  - Added a note about charge migration
  - Rebase

v12:
  - Root memory cgroup is evaluated based on sum of the oom scores
    of belonging tasks
  - Do not fallback to the per-process behavior if there if
    it wasn't possbile to kill a memcg victim
  - Rebase on top of mm tree

v11:
  - Fixed an issue with skipping the root mem cgroup
    (discovered by Shakeel Butt)
  - Moved a check in __oom_kill_process() to the memmory.oom_group
    patch, added corresponding comments
  - Added a note about ignoring tasks with oom_score_adj -1000
    (proposed by Michal Hocko)
  - Rebase on top of mm tree

v10:
  - Separate oom_group introduction into a standalone patch
  - Stop propagating oom_group
  - Make oom_group delegatable
  - Do not try to kill the biggest task in the first order,
    if the whole cgroup is going to be killed
  - Stop caching oom_score on struct memcg, optimize victim
    memcg selection
  - Drop dmesg printing (for further refining)
  - Small refactorings and comments added here and there
  - Rebase on top of mm tree

v9:
  - Change siblings-to-siblings comparison to the tree-wide search,
    make related refactorings
  - Make oom_group implicitly propagated down by the tree
  - Fix an issue with task selection in root cgroup

v8:
  - Do not kill tasks with OOM_SCORE_ADJ -1000
  - Make the whole thing opt-in with cgroup mount option control
  - Drop oom_priority for further discussions
  - Kill the whole cgroup if oom_group is set and it's
    memory.max is reached
  - Update docs and commit messages

v7:
  - __oom_kill_process() drops reference to the victim task
  - oom_score_adj -1000 is always respected
  - Renamed oom_kill_all to oom_group
  - Dropped oom_prio range, converted from short to int
  - Added a cgroup v2 mount option to disable cgroup-aware OOM killer
  - Docs updated
  - Rebased on top of mmotm

v6:
  - Renamed oom_control.chosen to oom_control.chosen_task
  - Renamed oom_kill_all_tasks to oom_kill_all
  - Per-node NR_SLAB_UNRECLAIMABLE accounting
  - Several minor fixes and cleanups
  - Docs updated

v5:
  - Rebased on top of Michal Hocko's patches, which have changed the
    way how OOM victims becoming an access to the memory
    reserves. Dropped corresponding part of this patchset
  - Separated the oom_kill_process() splitting into a standalone commit
  - Added debug output (suggested by David Rientjes)
  - Some minor fixes

v4:
  - Reworked per-cgroup oom_score_adj into oom_priority
    (based on ideas by David Rientjes)
  - Tasks with oom_score_adj -1000 are never selected if
    oom_kill_all_tasks is not set
  - Memcg victim selection code is reworked, and
    synchronization is based on finding tasks with OOM victim marker,
    rather then on global counter
  - Debug output is dropped
  - Refactored TIF_MEMDIE usage

v3:
  - Merged commits 1-4 into 6
  - Separated oom_score_adj logic and debug output into separate commits
  - Fixed swap accounting

v2:
  - Reworked victim selection based on feedback
    from Michal Hocko, Vladimir Davydov and Johannes Weiner
  - "Kill all tasks" is now an opt-in option, by default
    only one process will be killed
  - Added per-cgroup oom_score_adj
  - Refined oom score calculations, suggested by Vladimir Davydov
  - Converted to a patchset

v1:
  https://lkml.org/lkml/2017/5/18/969

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org

Roman Gushchin (7):
  mm, oom: refactor the oom_kill_process() function
  mm: implement mem_cgroup_scan_tasks() for the root memory cgroup
  mm, oom: cgroup-aware OOM killer
  mm, oom: introduce memory.oom_group
  mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer
  mm, oom, docs: describe the cgroup-aware OOM killer
  cgroup: list groupoom in cgroup features

 Documentation/cgroup-v2.txt |  58 ++++++++++
 include/linux/cgroup-defs.h |   5 +
 include/linux/memcontrol.h  |  34 ++++++
 include/linux/oom.h         |  12 ++-
 kernel/cgroup/cgroup.c      |  13 ++-
 mm/memcontrol.c             | 258 +++++++++++++++++++++++++++++++++++++++++++-
 mm/oom_kill.c               | 224 +++++++++++++++++++++++++-------------
 7 files changed, 525 insertions(+), 79 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 51+ messages in thread