[RFC PATCH v2 0/5] mm: Select victim using bpf_oom_evaluate_task

From: Chuyi Zhou @ 2023-08-10  8:13 UTC
  To: hannes, mhocko, roman.gushchin, ast, daniel, andrii, muchun.song
  Cc: bpf, linux-kernel, wuyun.abel, robin.lu, Chuyi Zhou

Changes
-------

This is v2 of the BPF OOM policy patchset.
v1 : https://lore.kernel.org/lkml/20230804093804.47039-1-zhouchuyi@bytedance.com/
v1 -> v2 changes:

- rename bpf_select_task to bpf_oom_evaluate_task and bypass the
tsk_is_oom_victim (and MMF_OOM_SKIP) logic. (Michal)

- add a new hook to set the policy's name, so dump_header() can report
which selection policy was used. (Michal)

- add a tracepoint for when select_bad_process() finds nothing. (Alan)

- add a doc to describe how it is all supposed to work. (Alan)
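The policy-name change can be modelled in a few lines of userspace C. This is only a sketch: the buffer size, the hook type name_hook_fn, the helper names and the "oom-kill: policy=..." message format are all hypothetical and are not taken from the patches. It shows one way the idea could work: the name is reset, an attached policy may overwrite it, and the report falls back to "default" when nothing was set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POLICY_NAME_LEN 16	/* hypothetical size of the name buffer */

/* Heavily simplified stand-in for struct oom_control. */
struct oom_control {
	char policy_name[POLICY_NAME_LEN];
};

/* Hypothetical hook type: an attached policy fills in its own name. */
typedef void (*name_hook_fn)(struct oom_control *oc);

/* Reset the name, then let the attached policy (if any) set it. */
void prepare_oc(struct oom_control *oc, name_hook_fn hook)
{
	oc->policy_name[0] = '\0';
	if (hook)
		hook(oc);
}

/* Format the line a dump_header()-like report could print. */
int format_header(const struct oom_control *oc, char *buf, size_t len)
{
	const char *name = oc->policy_name[0] ? oc->policy_name : "default";

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "oom-kill: policy=%s", name);
}

/* Example policy hook; snprintf() truncates and always NUL-terminates. */
void memcg_prio_name(struct oom_control *oc)
{
	snprintf(oc->policy_name, sizeof(oc->policy_name), "%s", "memcg_prio");
}
```

Writing the name into a fixed-size buffer inside the control structure keeps the reporting path free of any dependency on the policy once the name has been set.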

================

This patchset adds a new interface and uses it to select a victim when the
OOM killer is invoked. The main motivation is the need for customizable OOM
victim selection.

The new interface is a BPF hook plugged into oom_evaluate_task(). It takes
oc (the struct oom_control) and the task currently being evaluated as
parameters, and returns a result indicating which task the attached BPF
program selects.
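As a rough userspace model of how such a hook could sit inside the per-task evaluation loop, the C sketch below assumes four return codes (the enum names are hypothetical, not taken from the patches), a heavily simplified struct oom_control, and a plain function pointer standing in for the attached BPF program. A NULL pointer models "no program attached", in which case a built-in comparison (reduced here to "highest badness wins") runs unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook return codes; the names are illustrative only. */
enum eval_result {
	NO_POLICY,	/* no program attached: use the built-in heuristic */
	EVAL_ABORT,	/* stop the scan, keep the current choice */
	EVAL_NEXT,	/* skip this task */
	EVAL_SELECT,	/* make this task the current victim */
};

struct task {
	const char *comm;
	long badness;
};

/* Heavily simplified stand-in for struct oom_control. */
struct oom_control {
	struct task *chosen;
	long chosen_points;
};

/* Function pointer standing in for the attached BPF program. */
typedef enum eval_result (*evaluate_fn)(struct task *t, struct oom_control *oc);

/* Returns 1 to abort the scan, 0 to continue. */
int evaluate_task(struct task *t, struct oom_control *oc, evaluate_fn policy)
{
	enum eval_result r = policy ? policy(t, oc) : NO_POLICY;

	switch (r) {
	case EVAL_ABORT:
		return 1;
	case EVAL_NEXT:
		return 0;
	case EVAL_SELECT:
		oc->chosen = t;
		oc->chosen_points = t->badness;
		return 0;
	case NO_POLICY:
		break;
	}
	/* Built-in path, reduced here to "highest badness wins". */
	if (!oc->chosen || t->badness > oc->chosen_points) {
		oc->chosen = t;
		oc->chosen_points = t->badness;
	}
	return 0;
}

/* The caller (the kernel, in the real design) owns the iteration. */
struct task *select_victim(struct task *tasks, size_t n, evaluate_fn policy)
{
	struct oom_control oc = { .chosen = NULL, .chosen_points = 0 };

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (evaluate_task(&tasks[i], &oc, policy))
			break;
	return oc.chosen;
}

/* Example policy: prefer the task with the lowest badness. */
enum eval_result prefer_smallest(struct task *t, struct oom_control *oc)
{
	if (!oc->chosen || t->badness < oc->chosen->badness)
		return EVAL_SELECT;
	return EVAL_NEXT;
}
```

With tasks {a: 10, b: 50, c: 5}, the default path picks b while prefer_smallest() picks c. Because the hook runs once per task, a policy that wants a single best victim has to compare against oc->chosen itself, as prefer_smallest() does; in this model, returning EVAL_SELECT unconditionally would simply make the last task scanned the victim.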

Michal suggested several considerations for the design of this
interface:

1. Hooking into oom_evaluate_task keeps the global and memcg OOM interfaces
consistent. It also seems to be the least disruptive change to the existing
OOM killer implementation.

2. Userspace can handle a lot on its own and provide the input the BPF
program needs to make a decision. Since the OOM scope iteration is already
implemented in the kernel, all the BPF program has to do is rank processes
or memcgs.

3. The new interface should bypass the current heuristic rules
(e.g., tsk_is_oom_victim and MMF_OOM_SKIP) so that an arbitrary OOM policy
can be implemented.
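Point 2 can be illustrated with another userspace C sketch of the division of labour it describes: the caller does the iteration, a table standing in for userspace-provided input (in a real deployment, plausibly a BPF map keyed by cgroup id) assigns each memcg a priority, and the policy only ranks candidates. The table layout, DEFAULT_PRIO and the RSS tie-break are assumptions made for illustration, not part of the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace-provided input: one priority per memcg. */
struct memcg_prio {
	unsigned long cgroup_id;
	int priority;		/* lower value = more willing to be killed */
};

struct task {
	const char *comm;
	unsigned long cgroup_id;
	long rss_pages;
};

#define DEFAULT_PRIO 100	/* assumed priority for unlisted cgroups */

int lookup_prio(const struct memcg_prio *tbl, size_t n, unsigned long id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].cgroup_id == id)
			return tbl[i].priority;
	return DEFAULT_PRIO;
}

/* Returns 1 if cand should replace chosen as the victim. */
int ranks_higher(const struct memcg_prio *tbl, size_t n,
		 const struct task *cand, const struct task *chosen)
{
	if (!chosen)
		return 1;
	int pc = lookup_prio(tbl, n, cand->cgroup_id);
	int pv = lookup_prio(tbl, n, chosen->cgroup_id);
	if (pc != pv)
		return pc < pv;
	return cand->rss_pages > chosen->rss_pages;	/* tie-break: bigger task */
}

/* The iteration stands in for the kernel; the policy only ranks. */
const struct task *pick_victim(const struct memcg_prio *tbl, size_t ntbl,
			       const struct task *tasks, size_t ntasks)
{
	const struct task *chosen = NULL;

	assert(tbl != NULL || ntbl == 0);
	for (size_t i = 0; i < ntasks; i++)
		if (ranks_higher(tbl, ntbl, &tasks[i], chosen))
			chosen = &tasks[i];
	return chosen;
}
```

Keeping the policy down to a pairwise comparison means userspace can change the ranking at any time just by rewriting the priority table, without touching the code that walks the OOM scope.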

Chuyi Zhou (5):
  mm, oom: Introduce bpf_oom_evaluate_task
  mm: Add policy_name to identify OOM policies
  mm: Add a tracepoint when OOM victim selection is failed
  bpf: Add a OOM policy test
  bpf: Add a BPF OOM policy Doc

 Documentation/bpf/oom.rst                     |  70 +++++++++
 include/linux/oom.h                           |   7 +
 include/trace/events/oom.h                    |  18 +++
 mm/oom_kill.c                                 | 100 +++++++++++--
 .../bpf/prog_tests/test_oom_policy.c          | 140 ++++++++++++++++++
 .../testing/selftests/bpf/progs/oom_policy.c  | 104 +++++++++++++
 6 files changed, 428 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/bpf/oom.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_oom_policy.c
 create mode 100644 tools/testing/selftests/bpf/progs/oom_policy.c

-- 
2.20.1
