linux-kernel.vger.kernel.org archive mirror
* [RFC 0/2] opportunistic memory reclaim of a killed process
@ 2019-04-11  1:43 Suren Baghdasaryan
  2019-04-11  1:43 ` [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c Suren Baghdasaryan
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11  1:43 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, willy, yuzhoujian, jrdr.linux, guro, hannes,
	penguin-kernel, ebiederm, shakeelb, christian, minchan,
	timmurray, dancol, joel, jannh, surenb, linux-mm, lsf-pc,
	linux-kernel, kernel-team

The time to kill a process and free its memory can be critical when the
kill is performed to prevent memory shortages from affecting system
responsiveness.

In the case of Android, where processes can be restarted easily, killing a
less important background process is preferred to delaying or throttling
an interactive foreground process. At the same time, unnecessary kills
should be avoided as they cause delays when the killed process is needed
again. This requires a balanced decision from the system software about
how long a kill can be postponed in the hope that memory usage will
decrease without such drastic measures.

As killing a process and reclaiming its memory is not an instant operation,
a margin of free memory has to be maintained to prevent system performance
from deteriorating while the memory of the killed process is being
reclaimed. The size of this margin depends on the minimum reclaim rate
needed to cover the worst-case scenario, and this minimum rate should be
deterministic.
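
As a rough illustration of how such a margin could be sized, consider the
toy model below. It is illustrative only and not part of the patchset: the
1500 MB/sec allocation rate and 300 MB victim RSS are made-up numbers,
while the reclaim rates are the worst-case ones measured below.

/* margin.c - toy model for sizing the free memory margin.
 * Assumes allocation and reclaim rates stay constant over the reclaim
 * window, which real workloads will not honor exactly.
 */
#include <stdio.h>

/* Extra free memory needed so allocations do not outrun reclaim while a
 * victim with victim_rss_mb of RSS is being torn down. */
static long margin_mb(long alloc_mb_s, long reclaim_mb_s, long victim_rss_mb)
{
	if (reclaim_mb_s >= alloc_mb_s)
		return 0;	/* reclaim keeps up, no extra margin needed */
	/* reclaim window (victim_rss / reclaim rate) times the net
	 * allocation rate during that window */
	return (alloc_mb_s - reclaim_mb_s) * victim_rss_mb / reclaim_mb_s;
}

int main(void)
{
	printf("normal:    %ld MB\n", margin_mb(1500, 856, 300));  /* ~225 MB */
	printf("expedited: %ld MB\n", margin_mb(1500, 3236, 300)); /* 0 MB */
	return 0;
}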

Note that on asymmetric architectures like ARM big.LITTLE the reclaim rate
can vary dramatically depending on which core it is performed on (see test
results). It is a common scenario for a non-essential victim process to be
restricted to a slower or throttled CPU for power-saving purposes. This
makes the worst-case reclaim rate scenario very probable.

Cases in which the victim’s memory reclaim is delayed even further,
because the process is blocked in an uninterruptible sleep or is
performing a time-consuming operation, make the reclaim time still more
unpredictable.

Increasing the memory reclaim rate and making it more deterministic would
allow for a smaller free memory margin and would create more opportunities
to avoid killing processes.

Note that while other strategies like throttling memory allocations are
viable and can be employed for other non-essential processes, they would
hurt the user experience if applied to an interactive process.

The proposed solution uses the existing oom-reaper thread to increase the
memory reclaim rate of a killed process and to make this rate more
deterministic. By no means is the proposed solution considered the best;
it was chosen because it was simple to implement and allowed for test data
collection. The downside of this solution is that it requires an
additional “expedite” hint for something which has to be fast in all
cases. It would be great to find a way that does not require additional
hints.
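
For illustration, a userspace caller of the new flag could look roughly
like the sketch below. Caveats: the syscall number shown is the one wired
up in 5.1, SS_EXPEDITE comes from patch 2/2, and as of 5.1 a pidfd is
obtained by opening the /proc/<pid> directory.

/* expedite_kill.c - sketch: SIGKILL a pid with expedited reclaim.
 * Requires a kernel with this patchset and CAP_SYS_NICE.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif
#define SS_EXPEDITE 0x00000001	/* from patch 2/2 */

int main(int argc, char *argv[])
{
	char path[64];
	int pidfd;

	if (argc != 2)
		return 1;
	/* as of 5.1, a pidfd is an fd of the /proc/<pid> directory */
	snprintf(path, sizeof(path), "/proc/%s", argv[1]);
	pidfd = open(path, O_DIRECTORY | O_CLOEXEC);
	if (pidfd < 0)
		return 1;
	if (syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, SS_EXPEDITE))
		perror("pidfd_send_signal");
	close(pidfd);
	return 0;
}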

Other possible approaches include:
- Implement a dedicated syscall to perform opportunistic reclaim in the
context of the process waiting for the victim’s death. A natural boost
bonus occurs if the waiting process has high or RT priority and is not
limited by a cpuset cgroup in its CPU choices.
- Implement a mechanism that performs opportunistic reclaim unconditionally
whenever it is possible (similar to the checks in task_will_free_mem()).
- Implement opportunistic reclaim that uses the shrinker interface, PSI or
other memory-pressure indications as a hint to engage.

Test details:
Tests were performed on a Qualcomm® Snapdragon™ 845 8-core ARM big.LITTLE
system with 4 little cores (0.3-1.6GHz) and 4 big cores (0.8-2.5GHz)
running Android.
Memory reclaim speed was measured using signal/signal_generate,
kmem/rss_stat and sched/sched_process_exit traces.
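
For reference, the tracepoints can be enabled with a small helper like the
one below before reproducing the measurements (a minimal sketch; it
assumes tracefs is mounted at /sys/kernel/tracing and elides error
handling):

/* enable_traces.c - turn on the tracepoints used for the measurements */
#include <stdio.h>

static void enable_event(const char *ev)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/tracing/events/%s/enable", ev);
	f = fopen(path, "w");
	if (f) {
		fputs("1", f);
		fclose(f);
	}
}

int main(void)
{
	enable_event("signal/signal_generate");
	enable_event("kmem/rss_stat");
	enable_event("sched/sched_process_exit");
	return 0;	/* then read /sys/kernel/tracing/trace_pipe */
}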

Test results:
powersave governor, min freq
                        normal kills      expedited kills
        little          856 MB/sec        3236 MB/sec
        big             5084 MB/sec       6144 MB/sec

performance governor, max freq
                        normal kills      expedited kills
        little          5602 MB/sec       8144 MB/sec
        big             14656 MB/sec      12398 MB/sec

schedutil governor (default)
                        normal kills      expedited kills
        little          2386 MB/sec       3908 MB/sec
        big             7282 MB/sec       6820-16386 MB/sec
=================================================================
min reclaim speed:      856 MB/sec        3236 MB/sec

The patches are based on 5.1-rc1

Suren Baghdasaryan (2):
  mm: oom: expose expedite_reclaim to use oom_reaper outside of
    oom_kill.c
  signal: extend pidfd_send_signal() to allow expedited process killing

 include/linux/oom.h          |  1 +
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h       | 11 ++++++++++-
 ipc/mqueue.c                 |  2 +-
 kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
 kernel/time/itimer.c         |  2 +-
 mm/oom_kill.c                | 15 +++++++++++++++
 7 files changed, 59 insertions(+), 12 deletions(-)

-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c
  2019-04-11  1:43 [RFC 0/2] opportunistic memory reclaim of a killed process Suren Baghdasaryan
@ 2019-04-11  1:43 ` Suren Baghdasaryan
  2019-04-25 21:12   ` Tetsuo Handa
  2019-04-11  1:43 ` [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing Suren Baghdasaryan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11  1:43 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, willy, yuzhoujian, jrdr.linux, guro, hannes,
	penguin-kernel, ebiederm, shakeelb, christian, minchan,
	timmurray, dancol, joel, jannh, surenb, linux-mm, lsf-pc,
	linux-kernel, kernel-team

Create an API to allow users outside of oom_kill.c to mark a victim and
wake up the oom_reaper thread for expedited memory reclaim of the process
being killed.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/oom.h |  1 +
 mm/oom_kill.c       | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index d07992009265..6c043c7518c1 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -112,6 +112,7 @@ extern unsigned long oom_badness(struct task_struct *p,
 		unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_control *oc);
+extern bool expedite_reclaim(struct task_struct *task);
 
 extern void exit_oom_victim(void);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3a2484884cfd..6449710c8a06 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1102,6 +1102,21 @@ bool out_of_memory(struct oom_control *oc)
 	return !!oc->chosen;
 }
 
+bool expedite_reclaim(struct task_struct *task)
+{
+	bool res = false;
+
+	task_lock(task);
+	if (task_will_free_mem(task)) {
+		mark_oom_victim(task);
+		wake_oom_reaper(task);
+		res = true;
+	}
+	task_unlock(task);
+
+	return res;
+}
+
 /*
  * The pagefault handler calls here because it is out of memory, so kill a
  * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11  1:43 [RFC 0/2] opportunistic memory reclaim of a killed process Suren Baghdasaryan
  2019-04-11  1:43 ` [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c Suren Baghdasaryan
@ 2019-04-11  1:43 ` Suren Baghdasaryan
  2019-04-11 10:30   ` Christian Brauner
  2019-04-11 15:33   ` Matthew Wilcox
  2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
  2019-04-11 11:51 ` [Lsf-pc] " Rik van Riel
  3 siblings, 2 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11  1:43 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, willy, yuzhoujian, jrdr.linux, guro, hannes,
	penguin-kernel, ebiederm, shakeelb, christian, minchan,
	timmurray, dancol, joel, jannh, surenb, linux-mm, lsf-pc,
	linux-kernel, kernel-team

Add new SS_EXPEDITE flag to be used when sending SIGKILL via
pidfd_send_signal() syscall to allow expedited memory reclaim of the
victim process. The usage of this flag is currently limited to SIGKILL
signal and only to privileged users.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h       | 11 ++++++++++-
 ipc/mqueue.c                 |  2 +-
 kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
 kernel/time/itimer.c         |  2 +-
 5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e412c092c1e8..8a227633a058 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern void force_sigsegv(int sig, struct task_struct *p);
 extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
-extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
+extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
+				bool expedite);
 extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
 				const struct cred *);
 extern int kill_pgrp(struct pid *pid, int sig, int priv);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..34b7852aa4a0 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long);
 } while (0);
 
 #ifdef CONFIG_PROC_FS
+
+/*
+ * SS_FLAGS values used in pidfd_send_signal:
+ *
+ * SS_EXPEDITE indicates desire to expedite the operation.
+ */
+#define SS_EXPEDITE	0x00000001
+
 struct seq_file;
 extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
-#endif
+
+#endif /* CONFIG_PROC_FS */
 
 #endif /* _LINUX_SIGNAL_H */
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index aea30530c472..27c66296e08e 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info)
 			rcu_read_unlock();
 
 			kill_pid_info(info->notify.sigev_signo,
-				      &sig_i, info->notify_owner);
+				      &sig_i, info->notify_owner, false);
 			break;
 		case SIGEV_THREAD:
 			set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
diff --git a/kernel/signal.c b/kernel/signal.c
index f98448cf2def..02ed4332d17c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -43,6 +43,7 @@
 #include <linux/compiler.h>
 #include <linux/posix-timers.h>
 #include <linux/livepatch.h>
+#include <linux/oom.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
@@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
 	return success ? 0 : retval;
 }
 
-int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
+int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
+				  bool expedite)
 {
 	int error = -ESRCH;
 	struct task_struct *p;
@@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
 	for (;;) {
 		rcu_read_lock();
 		p = pid_task(pid, PIDTYPE_PID);
-		if (p)
+		if (p) {
 			error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
+
+			/*
+			 * Ignore expedite_reclaim return value, it is best
+			 * effort only.
+			 */
+			if (!error && expedite)
+				expedite_reclaim(p);
+		}
+
 		rcu_read_unlock();
 		if (likely(!p || error != -ESRCH))
 			return error;
@@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
 {
 	int error;
 	rcu_read_lock();
-	error = kill_pid_info(sig, info, find_vpid(pid));
+	error = kill_pid_info(sig, info, find_vpid(pid), false);
 	rcu_read_unlock();
 	return error;
 }
@@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
 
 	if (pid > 0) {
 		rcu_read_lock();
-		ret = kill_pid_info(sig, info, find_vpid(pid));
+		ret = kill_pid_info(sig, info, find_vpid(pid), false);
 		rcu_read_unlock();
 		return ret;
 	}
@@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp);
 
 int kill_pid(struct pid *pid, int sig, int priv)
 {
-	return kill_pid_info(sig, __si_special(priv), pid);
+	return kill_pid_info(sig, __si_special(priv), pid, false);
 }
 EXPORT_SYMBOL(kill_pid);
 
@@ -3577,10 +3588,20 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	struct pid *pid;
 	kernel_siginfo_t kinfo;
 
-	/* Enforce flags be set to 0 until we add an extension. */
-	if (flags)
+	/* Enforce no unknown flags. */
+	if (flags & ~SS_EXPEDITE)
 		return -EINVAL;
 
+	if (flags & SS_EXPEDITE) {
+		/* Enforce SS_EXPEDITE to be used with SIGKILL only. */
+		if (sig != SIGKILL)
+			return -EINVAL;
+
+		/* Limit expedited killing to privileged users only. */
+		if (!capable(CAP_SYS_NICE))
+			return -EPERM;
+	}
+
 	f = fdget_raw(pidfd);
 	if (!f.file)
 		return -EBADF;
@@ -3614,7 +3635,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 		prepare_kill_siginfo(sig, &kinfo);
 	}
 
-	ret = kill_pid_info(sig, &kinfo, pid);
+	ret = kill_pid_info(sig, &kinfo, pid, (flags & SS_EXPEDITE) != 0);
 
 err:
 	fdput(f);
diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
index 02068b2d5862..c926483cdb53 100644
--- a/kernel/time/itimer.c
+++ b/kernel/time/itimer.c
@@ -140,7 +140,7 @@ enum hrtimer_restart it_real_fn(struct hrtimer *timer)
 	struct pid *leader_pid = sig->pids[PIDTYPE_TGID];
 
 	trace_itimer_expire(ITIMER_REAL, leader_pid, 0);
-	kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid);
+	kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid, false);
 
 	return HRTIMER_NORESTART;
 }
-- 
2.21.0.392.gf8f6787159e-goog


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11  1:43 ` [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing Suren Baghdasaryan
@ 2019-04-11 10:30   ` Christian Brauner
  2019-04-11 10:34     ` Christian Brauner
  2019-04-11 15:18     ` Suren Baghdasaryan
  2019-04-11 15:33   ` Matthew Wilcox
  1 sibling, 2 replies; 43+ messages in thread
From: Christian Brauner @ 2019-04-11 10:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, mhocko, rientjes, willy, yuzhoujian, jrdr.linux, guro,
	hannes, penguin-kernel, ebiederm, shakeelb, minchan, timmurray,
	dancol, joel, jannh, linux-mm, lsf-pc, linux-kernel, kernel-team

On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> pidfd_send_signal() syscall to allow expedited memory reclaim of the
> victim process. The usage of this flag is currently limited to SIGKILL
> signal and only to privileged users.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  include/linux/sched/signal.h |  3 ++-
>  include/linux/signal.h       | 11 ++++++++++-
>  ipc/mqueue.c                 |  2 +-
>  kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
>  kernel/time/itimer.c         |  2 +-
>  5 files changed, 43 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index e412c092c1e8..8a227633a058 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
>  extern void force_sigsegv(int sig, struct task_struct *p);
>  extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
>  extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
> -extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
> +extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> +				bool expedite);
>  extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
>  				const struct cred *);
>  extern int kill_pgrp(struct pid *pid, int sig, int priv);
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 9702016734b1..34b7852aa4a0 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long);
>  } while (0);
>  
>  #ifdef CONFIG_PROC_FS
> +
> +/*
> + * SS_FLAGS values used in pidfd_send_signal:
> + *
> + * SS_EXPEDITE indicates desire to expedite the operation.
> + */
> +#define SS_EXPEDITE	0x00000001

Does this make sense as an SS_* flag?
How does this relate to the signal stack?
Is there any intention to ever use this flag with stack_t?

New flags should be PIDFD_SIGNAL_*. (E.g. the thread flag will be
PIDFD_SIGNAL_THREAD.)
And since this is exposed to userspace, in contrast to the mm-internal
naming, it should be something more easily understandable like
PIDFD_SIGNAL_MM_RECLAIM{_FASTER} or something.
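
E.g. something along these lines (the exact name is just a suggestion):

#define PIDFD_SIGNAL_MM_RECLAIM	0x00000001 /* expedite reclaim of the killed mm */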

> +
>  struct seq_file;
>  extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
> -#endif
> +
> +#endif /* CONFIG_PROC_FS */
>  
>  #endif /* _LINUX_SIGNAL_H */
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index aea30530c472..27c66296e08e 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info)
>  			rcu_read_unlock();
>  
>  			kill_pid_info(info->notify.sigev_signo,
> -				      &sig_i, info->notify_owner);
> +				      &sig_i, info->notify_owner, false);
>  			break;
>  		case SIGEV_THREAD:
>  			set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f98448cf2def..02ed4332d17c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -43,6 +43,7 @@
>  #include <linux/compiler.h>
>  #include <linux/posix-timers.h>
>  #include <linux/livepatch.h>
> +#include <linux/oom.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/signal.h>
> @@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
>  	return success ? 0 : retval;
>  }
>  
> -int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> +int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> +				  bool expedite)
>  {
>  	int error = -ESRCH;
>  	struct task_struct *p;
> @@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
>  	for (;;) {
>  		rcu_read_lock();
>  		p = pid_task(pid, PIDTYPE_PID);
> -		if (p)
> +		if (p) {
>  			error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
> +
> +			/*
> +			 * Ignore expedite_reclaim return value, it is best
> +			 * effort only.
> +			 */
> +			if (!error && expedite)
> +				expedite_reclaim(p);

SIGKILL will take the whole thread group down so the reclaim should make
sense here.

> +		}
> +
>  		rcu_read_unlock();
>  		if (likely(!p || error != -ESRCH))
>  			return error;
> @@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
>  {
>  	int error;
>  	rcu_read_lock();
> -	error = kill_pid_info(sig, info, find_vpid(pid));
> +	error = kill_pid_info(sig, info, find_vpid(pid), false);
>  	rcu_read_unlock();
>  	return error;
>  }
> @@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
>  
>  	if (pid > 0) {
>  		rcu_read_lock();
> -		ret = kill_pid_info(sig, info, find_vpid(pid));
> +		ret = kill_pid_info(sig, info, find_vpid(pid), false);
>  		rcu_read_unlock();
>  		return ret;
>  	}
> @@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp);
>  
>  int kill_pid(struct pid *pid, int sig, int priv)
>  {
> -	return kill_pid_info(sig, __si_special(priv), pid);
> +	return kill_pid_info(sig, __si_special(priv), pid, false);
>  }
>  EXPORT_SYMBOL(kill_pid);
>  
> @@ -3577,10 +3588,20 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
>  	struct pid *pid;
>  	kernel_siginfo_t kinfo;
>  
> -	/* Enforce flags be set to 0 until we add an extension. */
> -	if (flags)
> +	/* Enforce no unknown flags. */
> +	if (flags & ~SS_EXPEDITE)
>  		return -EINVAL;
>  
> +	if (flags & SS_EXPEDITE) {
> +		/* Enforce SS_EXPEDITE to be used with SIGKILL only. */
> +		if (sig != SIGKILL)
> +			return -EINVAL;

Not super fond of this being a SIGKILL specific flag but I get why.

> +
> +		/* Limit expedited killing to privileged users only. */
> +		if (!capable(CAP_SYS_NICE))
> +			return -EPERM;

Do you have a specific (DOS or other) attack vector in mind that renders
ns_capable unsuitable?

> +	}
> +
>  	f = fdget_raw(pidfd);
>  	if (!f.file)
>  		return -EBADF;
> @@ -3614,7 +3635,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
>  		prepare_kill_siginfo(sig, &kinfo);
>  	}
>  
> -	ret = kill_pid_info(sig, &kinfo, pid);
> +	ret = kill_pid_info(sig, &kinfo, pid, (flags & SS_EXPEDITE) != 0);
>  
>  err:
>  	fdput(f);
> diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
> index 02068b2d5862..c926483cdb53 100644
> --- a/kernel/time/itimer.c
> +++ b/kernel/time/itimer.c
> @@ -140,7 +140,7 @@ enum hrtimer_restart it_real_fn(struct hrtimer *timer)
>  	struct pid *leader_pid = sig->pids[PIDTYPE_TGID];
>  
>  	trace_itimer_expire(ITIMER_REAL, leader_pid, 0);
> -	kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid);
> +	kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid, false);
>  
>  	return HRTIMER_NORESTART;
>  }
> -- 
> 2.21.0.392.gf8f6787159e-goog
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 10:30   ` Christian Brauner
@ 2019-04-11 10:34     ` Christian Brauner
  2019-04-11 15:18     ` Suren Baghdasaryan
  1 sibling, 0 replies; 43+ messages in thread
From: Christian Brauner @ 2019-04-11 10:34 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, mhocko, rientjes, willy, yuzhoujian, jrdr.linux, guro,
	hannes, penguin-kernel, ebiederm, shakeelb, minchan, timmurray,
	dancol, joel, jannh, linux-mm, lsf-pc, linux-kernel, kernel-team,
	oleg

Cc: Oleg too

On Thu, Apr 11, 2019 at 12:30:18PM +0200, Christian Brauner wrote:
> On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > victim process. The usage of this flag is currently limited to SIGKILL
> > signal and only to privileged users.
> > 
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  include/linux/sched/signal.h |  3 ++-
> >  include/linux/signal.h       | 11 ++++++++++-
> >  ipc/mqueue.c                 |  2 +-
> >  kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
> >  kernel/time/itimer.c         |  2 +-
> >  5 files changed, 43 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > index e412c092c1e8..8a227633a058 100644
> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> >  extern void force_sigsegv(int sig, struct task_struct *p);
> >  extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> >  extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
> > -extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
> > +extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> > +				bool expedite);
> >  extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
> >  				const struct cred *);
> >  extern int kill_pgrp(struct pid *pid, int sig, int priv);
> > diff --git a/include/linux/signal.h b/include/linux/signal.h
> > index 9702016734b1..34b7852aa4a0 100644
> > --- a/include/linux/signal.h
> > +++ b/include/linux/signal.h
> > @@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long);
> >  } while (0);
> >  
> >  #ifdef CONFIG_PROC_FS
> > +
> > +/*
> > + * SS_FLAGS values used in pidfd_send_signal:
> > + *
> > + * SS_EXPEDITE indicates desire to expedite the operation.
> > + */
> > +#define SS_EXPEDITE	0x00000001
> 
> Does this make sense as an SS_* flag?
> How does this relate to the signal stack?
> Is there any intention to ever use this flag with stack_t?
> 
> New flags should be PIDFD_SIGNAL_*. (E.g. the thread flag will be
> PIDFD_SIGNAL_THREAD.)
> And since this is exposed to userspace, in contrast to the mm-internal
> naming, it should be something more easily understandable like
> PIDFD_SIGNAL_MM_RECLAIM{_FASTER} or something.
> 
> > +
> >  struct seq_file;
> >  extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
> > -#endif
> > +
> > +#endif /* CONFIG_PROC_FS */
> >  
> >  #endif /* _LINUX_SIGNAL_H */
> > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > index aea30530c472..27c66296e08e 100644
> > --- a/ipc/mqueue.c
> > +++ b/ipc/mqueue.c
> > @@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info)
> >  			rcu_read_unlock();
> >  
> >  			kill_pid_info(info->notify.sigev_signo,
> > -				      &sig_i, info->notify_owner);
> > +				      &sig_i, info->notify_owner, false);
> >  			break;
> >  		case SIGEV_THREAD:
> >  			set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index f98448cf2def..02ed4332d17c 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -43,6 +43,7 @@
> >  #include <linux/compiler.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/livepatch.h>
> > +#include <linux/oom.h>
> >  
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/signal.h>
> > @@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
> >  	return success ? 0 : retval;
> >  }
> >  
> > -int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> > +int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> > +				  bool expedite)
> >  {
> >  	int error = -ESRCH;
> >  	struct task_struct *p;
> > @@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> >  	for (;;) {
> >  		rcu_read_lock();
> >  		p = pid_task(pid, PIDTYPE_PID);
> > -		if (p)
> > +		if (p) {
> >  			error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
> > +
> > +			/*
> > +			 * Ignore expedite_reclaim return value, it is best
> > +			 * effort only.
> > +			 */
> > +			if (!error && expedite)
> > +				expedite_reclaim(p);
> 
> SIGKILL will take the whole thread group down so the reclaim should make
> sense here.
> 
> > +		}
> > +
> >  		rcu_read_unlock();
> >  		if (likely(!p || error != -ESRCH))
> >  			return error;
> > @@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
> >  {
> >  	int error;
> >  	rcu_read_lock();
> > -	error = kill_pid_info(sig, info, find_vpid(pid));
> > +	error = kill_pid_info(sig, info, find_vpid(pid), false);
> >  	rcu_read_unlock();
> >  	return error;
> >  }
> > @@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
> >  
> >  	if (pid > 0) {
> >  		rcu_read_lock();
> > -		ret = kill_pid_info(sig, info, find_vpid(pid));
> > +		ret = kill_pid_info(sig, info, find_vpid(pid), false);
> >  		rcu_read_unlock();
> >  		return ret;
> >  	}
> > @@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp);
> >  
> >  int kill_pid(struct pid *pid, int sig, int priv)
> >  {
> > -	return kill_pid_info(sig, __si_special(priv), pid);
> > +	return kill_pid_info(sig, __si_special(priv), pid, false);
> >  }
> >  EXPORT_SYMBOL(kill_pid);
> >  
> > @@ -3577,10 +3588,20 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> >  	struct pid *pid;
> >  	kernel_siginfo_t kinfo;
> >  
> > -	/* Enforce flags be set to 0 until we add an extension. */
> > -	if (flags)
> > +	/* Enforce no unknown flags. */
> > +	if (flags & ~SS_EXPEDITE)
> >  		return -EINVAL;
> >  
> > +	if (flags & SS_EXPEDITE) {
> > +		/* Enforce SS_EXPEDITE to be used with SIGKILL only. */
> > +		if (sig != SIGKILL)
> > +			return -EINVAL;
> 
> Not super fond of this being a SIGKILL specific flag but I get why.
> 
> > +
> > +		/* Limit expedited killing to privileged users only. */
> > +		if (!capable(CAP_SYS_NICE))
> > +			return -EPERM;
> 
> Do you have a specific (DOS or other) attack vector in mind that renders
> ns_capable unsuitable?
> 
> > +	}
> > +
> >  	f = fdget_raw(pidfd);
> >  	if (!f.file)
> >  		return -EBADF;
> > @@ -3614,7 +3635,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> >  		prepare_kill_siginfo(sig, &kinfo);
> >  	}
> >  
> > -	ret = kill_pid_info(sig, &kinfo, pid);
> > +	ret = kill_pid_info(sig, &kinfo, pid, (flags & SS_EXPEDITE) != 0);
> >  
> >  err:
> >  	fdput(f);
> > diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
> > index 02068b2d5862..c926483cdb53 100644
> > --- a/kernel/time/itimer.c
> > +++ b/kernel/time/itimer.c
> > @@ -140,7 +140,7 @@ enum hrtimer_restart it_real_fn(struct hrtimer *timer)
> >  	struct pid *leader_pid = sig->pids[PIDTYPE_TGID];
> >  
> >  	trace_itimer_expire(ITIMER_REAL, leader_pid, 0);
> > -	kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid);
> > +	kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid, false);
> >  
> >  	return HRTIMER_NORESTART;
> >  }
> > -- 
> > 2.21.0.392.gf8f6787159e-goog
> > 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11  1:43 [RFC 0/2] opportunistic memory reclaim of a killed process Suren Baghdasaryan
  2019-04-11  1:43 ` [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c Suren Baghdasaryan
  2019-04-11  1:43 ` [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing Suren Baghdasaryan
@ 2019-04-11 10:51 ` Michal Hocko
  2019-04-11 16:18   ` Joel Fernandes
                     ` (3 more replies)
  2019-04-11 11:51 ` [Lsf-pc] " Rik van Riel
  3 siblings, 4 replies; 43+ messages in thread
From: Michal Hocko @ 2019-04-11 10:51 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, rientjes, willy, yuzhoujian, jrdr.linux, guro, hannes,
	penguin-kernel, ebiederm, shakeelb, christian, minchan,
	timmurray, dancol, joel, jannh, linux-mm, lsf-pc, linux-kernel,
	kernel-team

On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
[...]
> The proposed solution uses the existing oom-reaper thread to increase
> the memory reclaim rate of a killed process and to make this rate more
> deterministic. By no means is the proposed solution considered the
> best; it was chosen because it was simple to implement and allowed for
> test data collection. The downside of this solution is that it requires
> an additional “expedite” hint for something which has to be fast in all
> cases. It would be great to find a way that does not require additional
> hints.

I have to say I do not like this much. It is abusing an implementation
detail of the OOM implementation and makes it an official API. Also
there are some non-trivial assumptions to be fulfilled to use the
current oom_reaper. First of all, all the process groups that share the
address space have to be killed. How do you want to guarantee/implement
that with a simple kill to a thread/process group?

> Other possible approaches include:
> - Implement a dedicated syscall to perform opportunistic reclaim in the
> context of the process waiting for the victim’s death. A natural boost
> bonus occurs if the waiting process has high or RT priority and is not
> limited by a cpuset cgroup in its CPU choices.
> - Implement a mechanism that performs opportunistic reclaim
> unconditionally whenever it is possible (similar to the checks in
> task_will_free_mem()).
> - Implement opportunistic reclaim that uses the shrinker interface, PSI
> or other memory-pressure indications as a hint to engage.

I would question whether we really need this at all. Relying on the exit
speed sounds like a fundamental design problem of anything that relies
on it. Sure, task exit might be slow, but async mm teardown is a mere
optimization that is not guaranteed to really help in speeding things
up. The OOM killer uses it as a guarantee of forward progress in a
finite time rather than as soon as possible.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Lsf-pc] [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11  1:43 [RFC 0/2] opportunistic memory reclaim of a killed process Suren Baghdasaryan
                   ` (2 preceding siblings ...)
  2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
@ 2019-04-11 11:51 ` Rik van Riel
  2019-04-11 12:16   ` Michal Hocko
  3 siblings, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2019-04-11 11:51 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: dancol, mhocko, jannh, minchan, penguin-kernel, kernel-team,
	rientjes, linux-kernel, willy, linux-mm, hannes, shakeelb,
	jrdr.linux, yuzhoujian, joel, timmurray, lsf-pc, guro, christian,
	ebiederm

On Wed, 2019-04-10 at 18:43 -0700, Suren Baghdasaryan via Lsf-pc wrote:
> The time to kill a process and free its memory can be critical when
> the kill is performed to prevent memory shortages from affecting
> system responsiveness.

The OOM killer is fickle, and often takes a fairly
long time to trigger. Speeding up what happens after
that seems like the wrong thing to optimize.

Have you considered using something like oomd to
proactively kill tasks when memory gets low, so
you do not have to wait for an OOM kill?

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Lsf-pc] [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 11:51 ` [Lsf-pc] " Rik van Riel
@ 2019-04-11 12:16   ` Michal Hocko
  2019-04-11 16:54     ` Suren Baghdasaryan
  0 siblings, 1 reply; 43+ messages in thread
From: Michal Hocko @ 2019-04-11 12:16 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Suren Baghdasaryan, akpm, dancol, jannh, minchan, penguin-kernel,
	kernel-team, rientjes, linux-kernel, willy, linux-mm, hannes,
	shakeelb, jrdr.linux, yuzhoujian, joel, timmurray, lsf-pc, guro,
	christian, ebiederm

On Thu 11-04-19 07:51:21, Rik van Riel wrote:
> On Wed, 2019-04-10 at 18:43 -0700, Suren Baghdasaryan via Lsf-pc wrote:
> > The time to kill a process and free its memory can be critical when
> > the kill is performed to prevent memory shortages from affecting
> > system responsiveness.
> 
> The OOM killer is fickle, and often takes a fairly
> long time to trigger. Speeding up what happens after
> that seems like the wrong thing to optimize.
> 
> Have you considered using something like oomd to
> proactively kill tasks when memory gets low, so
> you do not have to wait for an OOM kill?

AFAIU, this is the point here. They probably have a user space OOM
killer implementation and want the killing to be as swift as
possible.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 10:30   ` Christian Brauner
  2019-04-11 10:34     ` Christian Brauner
@ 2019-04-11 15:18     ` Suren Baghdasaryan
  2019-04-11 15:23       ` Suren Baghdasaryan
  1 sibling, 1 reply; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 15:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, mhocko, David Rientjes, Matthew Wilcox,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, ebiederm, Shakeel Butt, Minchan Kim, Tim Murray,
	Daniel Colascione, Joel Fernandes, Jann Horn, linux-mm, lsf-pc,
	LKML, kernel-team, Oleg Nesterov

Thanks for the feedback!
Just to be clear, this implementation is used in this RFC as a
reference to explain the intent. To be honest I don't think it will be
adopted as is even if the idea survives scrutiny.

On Thu, Apr 11, 2019 at 3:30 AM Christian Brauner <christian@brauner.io> wrote:
>
> On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > victim process. The usage of this flag is currently limited to SIGKILL
> > signal and only to privileged users.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  include/linux/sched/signal.h |  3 ++-
> >  include/linux/signal.h       | 11 ++++++++++-
> >  ipc/mqueue.c                 |  2 +-
> >  kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
> >  kernel/time/itimer.c         |  2 +-
> >  5 files changed, 43 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > index e412c092c1e8..8a227633a058 100644
> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> >  extern void force_sigsegv(int sig, struct task_struct *p);
> >  extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> >  extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
> > -extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
> > +extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> > +                             bool expedite);
> >  extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
> >                               const struct cred *);
> >  extern int kill_pgrp(struct pid *pid, int sig, int priv);
> > diff --git a/include/linux/signal.h b/include/linux/signal.h
> > index 9702016734b1..34b7852aa4a0 100644
> > --- a/include/linux/signal.h
> > +++ b/include/linux/signal.h
> > @@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long);
> >  } while (0);
> >
> >  #ifdef CONFIG_PROC_FS
> > +
> > +/*
> > + * SS_FLAGS values used in pidfd_send_signal:
> > + *
> > + * SS_EXPEDITE indicates desire to expedite the operation.
> > + */
> > +#define SS_EXPEDITE  0x00000001
>
> Does this make sense as an SS_* flag?
> How does this relate to the signal stack?

It doesn't, so I agree that the name should be changed.
PIDFD_SIGNAL_EXPEDITE_MM_RECLAIM would seem appropriate.

> Is there any intention to ever use this flag with stack_t?
>
> New flags should be PIDFD_SIGNAL_*. (E.g. the thread flag will be
> PIDFD_SIGNAL_THREAD.)
> And since this is exposed to userspace, in contrast to the mm-internal
> naming, it should be something more easily understandable like
> PIDFD_SIGNAL_MM_RECLAIM{_FASTER} or something.
>
> > +
> >  struct seq_file;
> >  extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
> > -#endif
> > +
> > +#endif /* CONFIG_PROC_FS */
> >
> >  #endif /* _LINUX_SIGNAL_H */
> > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > index aea30530c472..27c66296e08e 100644
> > --- a/ipc/mqueue.c
> > +++ b/ipc/mqueue.c
> > @@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info)
> >                       rcu_read_unlock();
> >
> >                       kill_pid_info(info->notify.sigev_signo,
> > -                                   &sig_i, info->notify_owner);
> > +                                   &sig_i, info->notify_owner, false);
> >                       break;
> >               case SIGEV_THREAD:
> >                       set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index f98448cf2def..02ed4332d17c 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -43,6 +43,7 @@
> >  #include <linux/compiler.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/livepatch.h>
> > +#include <linux/oom.h>
> >
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/signal.h>
> > @@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
> >       return success ? 0 : retval;
> >  }
> >
> > -int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> > +int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> > +                               bool expedite)
> >  {
> >       int error = -ESRCH;
> >       struct task_struct *p;
> > @@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> >       for (;;) {
> >               rcu_read_lock();
> >               p = pid_task(pid, PIDTYPE_PID);
> > -             if (p)
> > +             if (p) {
> >                       error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
> > +
> > +                     /*
> > +                      * Ignore expedite_reclaim return value, it is best
> > +                      * effort only.
> > +                      */
> > +                     if (!error && expedite)
> > +                             expedite_reclaim(p);
>
> SIGKILL will take the whole thread group down so the reclaim should make
> sense here.
>

This sounds like confirmation. I hope I'm not missing some flaw that
you are trying to point out.

> > +             }
> > +
> >               rcu_read_unlock();
> >               if (likely(!p || error != -ESRCH))
> >                       return error;
> > @@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
> >  {
> >       int error;
> >       rcu_read_lock();
> > -     error = kill_pid_info(sig, info, find_vpid(pid));
> > +     error = kill_pid_info(sig, info, find_vpid(pid), false);
> >       rcu_read_unlock();
> >       return error;
> >  }
> > @@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
> >
> >       if (pid > 0) {
> >               rcu_read_lock();
> > -             ret = kill_pid_info(sig, info, find_vpid(pid));
> > +             ret = kill_pid_info(sig, info, find_vpid(pid), false);
> >               rcu_read_unlock();
> >               return ret;
> >       }
> > @@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp);
> >
> >  int kill_pid(struct pid *pid, int sig, int priv)
> >  {
> > -     return kill_pid_info(sig, __si_special(priv), pid);
> > +     return kill_pid_info(sig, __si_special(priv), pid, false);
> >  }
> >  EXPORT_SYMBOL(kill_pid);
> >
> > @@ -3577,10 +3588,20 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> >       struct pid *pid;
> >       kernel_siginfo_t kinfo;
> >
> > -     /* Enforce flags be set to 0 until we add an extension. */
> > -     if (flags)
> > +     /* Enforce no unknown flags. */
> > +     if (flags & ~SS_EXPEDITE)
> >               return -EINVAL;
> >
> > +     if (flags & SS_EXPEDITE) {
> > +             /* Enforce SS_EXPEDITE to be used with SIGKILL only. */
> > +             if (sig != SIGKILL)
> > +                     return -EINVAL;
>
> Not super fond of this being a SIGKILL specific flag but I get why.

Understood. I was thinking that the EXPEDITE flag might make sense for
other signals in the future, but from internal feedback it sounds like,
if we go this way, the flag name should be more specific.

> > +
> > +             /* Limit expedited killing to privileged users only. */
> > +             if (!capable(CAP_SYS_NICE))
> > +                     return -EPERM;
>
> Do you have a specific (DOS or other) attack vector in mind that renders
> ns_capable unsuitable?
>
> > +     }
> > +
> >       f = fdget_raw(pidfd);
> >       if (!f.file)
> >               return -EBADF;
> > @@ -3614,7 +3635,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> >               prepare_kill_siginfo(sig, &kinfo);
> >       }
> >
> > -     ret = kill_pid_info(sig, &kinfo, pid);
> > +     ret = kill_pid_info(sig, &kinfo, pid, (flags & SS_EXPEDITE) != 0);
> >
> >  err:
> >       fdput(f);
> > diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
> > index 02068b2d5862..c926483cdb53 100644
> > --- a/kernel/time/itimer.c
> > +++ b/kernel/time/itimer.c
> > @@ -140,7 +140,7 @@ enum hrtimer_restart it_real_fn(struct hrtimer *timer)
> >       struct pid *leader_pid = sig->pids[PIDTYPE_TGID];
> >
> >       trace_itimer_expire(ITIMER_REAL, leader_pid, 0);
> > -     kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid);
> > +     kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid, false);
> >
> >       return HRTIMER_NORESTART;
> >  }
> > --
> > 2.21.0.392.gf8f6787159e-goog
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 15:18     ` Suren Baghdasaryan
@ 2019-04-11 15:23       ` Suren Baghdasaryan
  2019-04-11 16:25         ` Daniel Colascione
  0 siblings, 1 reply; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 15:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Morton, mhocko, David Rientjes, Matthew Wilcox,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, ebiederm, Shakeel Butt, Minchan Kim, Tim Murray,
	Daniel Colascione, Joel Fernandes, Jann Horn, linux-mm, lsf-pc,
	LKML, kernel-team, Oleg Nesterov

On Thu, Apr 11, 2019 at 8:18 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Thanks for the feedback!
> Just to be clear, this implementation is used in this RFC as a
> reference to explain the intent. To be honest I don't think it will be
> adopted as is even if the idea survives scrutiny.
>
> On Thu, Apr 11, 2019 at 3:30 AM Christian Brauner <christian@brauner.io> wrote:
> >
> > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > victim process. The usage of this flag is currently limited to SIGKILL
> > > signal and only to privileged users.
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > ---
> > >  include/linux/sched/signal.h |  3 ++-
> > >  include/linux/signal.h       | 11 ++++++++++-
> > >  ipc/mqueue.c                 |  2 +-
> > >  kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
> > >  kernel/time/itimer.c         |  2 +-
> > >  5 files changed, 43 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > > index e412c092c1e8..8a227633a058 100644
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -327,7 +327,8 @@ extern int send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> > >  extern void force_sigsegv(int sig, struct task_struct *p);
> > >  extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> > >  extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
> > > -extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
> > > +extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> > > +                             bool expedite);
> > >  extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
> > >                               const struct cred *);
> > >  extern int kill_pgrp(struct pid *pid, int sig, int priv);
> > > diff --git a/include/linux/signal.h b/include/linux/signal.h
> > > index 9702016734b1..34b7852aa4a0 100644
> > > --- a/include/linux/signal.h
> > > +++ b/include/linux/signal.h
> > > @@ -446,8 +446,17 @@ int __save_altstack(stack_t __user *, unsigned long);
> > >  } while (0);
> > >
> > >  #ifdef CONFIG_PROC_FS
> > > +
> > > +/*
> > > + * SS_FLAGS values used in pidfd_send_signal:
> > > + *
> > > + * SS_EXPEDITE indicates desire to expedite the operation.
> > > + */
> > > +#define SS_EXPEDITE  0x00000001
> >
> > Does this make sense as an SS_* flag?
> > How does this relate to the signal stack?
>
> It doesn't, so I agree that the name should be changed.
> PIDFD_SIGNAL_EXPEDITE_MM_RECLAIM would seem appropriate.
>
> > Is there any intention to ever use this flag with stack_t?
> >
> > New flags should be PIDFD_SIGNAL_*. (E.g. the thread flag will be
> > PIDFD_SIGNAL_THREAD.)
> > And since this is exposed to userspace, in contrast to the mm-internal
> > naming, it should be something more easily understandable like
> > PIDFD_SIGNAL_MM_RECLAIM{_FASTER} or something.
> >
> > > +
> > >  struct seq_file;
> > >  extern void render_sigset_t(struct seq_file *, const char *, sigset_t *);
> > > -#endif
> > > +
> > > +#endif /* CONFIG_PROC_FS */
> > >
> > >  #endif /* _LINUX_SIGNAL_H */
> > > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > > index aea30530c472..27c66296e08e 100644
> > > --- a/ipc/mqueue.c
> > > +++ b/ipc/mqueue.c
> > > @@ -720,7 +720,7 @@ static void __do_notify(struct mqueue_inode_info *info)
> > >                       rcu_read_unlock();
> > >
> > >                       kill_pid_info(info->notify.sigev_signo,
> > > -                                   &sig_i, info->notify_owner);
> > > +                                   &sig_i, info->notify_owner, false);
> > >                       break;
> > >               case SIGEV_THREAD:
> > >                       set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
> > > diff --git a/kernel/signal.c b/kernel/signal.c
> > > index f98448cf2def..02ed4332d17c 100644
> > > --- a/kernel/signal.c
> > > +++ b/kernel/signal.c
> > > @@ -43,6 +43,7 @@
> > >  #include <linux/compiler.h>
> > >  #include <linux/posix-timers.h>
> > >  #include <linux/livepatch.h>
> > > +#include <linux/oom.h>
> > >
> > >  #define CREATE_TRACE_POINTS
> > >  #include <trace/events/signal.h>
> > > @@ -1394,7 +1395,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
> > >       return success ? 0 : retval;
> > >  }
> > >
> > > -int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> > > +int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid,
> > > +                               bool expedite)
> > >  {
> > >       int error = -ESRCH;
> > >       struct task_struct *p;
> > > @@ -1402,8 +1404,17 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
> > >       for (;;) {
> > >               rcu_read_lock();
> > >               p = pid_task(pid, PIDTYPE_PID);
> > > -             if (p)
> > > +             if (p) {
> > >                       error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
> > > +
> > > +                     /*
> > > +                      * Ignore expedite_reclaim return value, it is best
> > > +                      * effort only.
> > > +                      */
> > > +                     if (!error && expedite)
> > > +                             expedite_reclaim(p);
> >
> > SIGKILL will take the whole thread group down so the reclaim should make
> > sense here.
> >
>
> This sounds like confirmation. I hope I'm not missing some flaw that
> you are trying to point out.
>
> > > +             }
> > > +
> > >               rcu_read_unlock();
> > >               if (likely(!p || error != -ESRCH))
> > >                       return error;
> > > @@ -1420,7 +1431,7 @@ static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
> > >  {
> > >       int error;
> > >       rcu_read_lock();
> > > -     error = kill_pid_info(sig, info, find_vpid(pid));
> > > +     error = kill_pid_info(sig, info, find_vpid(pid), false);
> > >       rcu_read_unlock();
> > >       return error;
> > >  }
> > > @@ -1487,7 +1498,7 @@ static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
> > >
> > >       if (pid > 0) {
> > >               rcu_read_lock();
> > > -             ret = kill_pid_info(sig, info, find_vpid(pid));
> > > +             ret = kill_pid_info(sig, info, find_vpid(pid), false);
> > >               rcu_read_unlock();
> > >               return ret;
> > >       }
> > > @@ -1704,7 +1715,7 @@ EXPORT_SYMBOL(kill_pgrp);
> > >
> > >  int kill_pid(struct pid *pid, int sig, int priv)
> > >  {
> > > -     return kill_pid_info(sig, __si_special(priv), pid);
> > > +     return kill_pid_info(sig, __si_special(priv), pid, false);
> > >  }
> > >  EXPORT_SYMBOL(kill_pid);
> > >
> > > @@ -3577,10 +3588,20 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> > >       struct pid *pid;
> > >       kernel_siginfo_t kinfo;
> > >
> > > -     /* Enforce flags be set to 0 until we add an extension. */
> > > -     if (flags)
> > > +     /* Enforce no unknown flags. */
> > > +     if (flags & ~SS_EXPEDITE)
> > >               return -EINVAL;
> > >
> > > +     if (flags & SS_EXPEDITE) {
> > > +             /* Enforce SS_EXPEDITE to be used with SIGKILL only. */
> > > +             if (sig != SIGKILL)
> > > +                     return -EINVAL;
> >
> > Not super fond of this being a SIGKILL specific flag but I get why.
>
> Understood. I was thinking that the EXPEDITE flag might make sense for
> other signals in the future, but from internal feedback it sounds like,
> if we go this way, the flag name should be more specific.
>
> > > +
> > > +             /* Limit expedited killing to privileged users only. */
> > > +             if (!capable(CAP_SYS_NICE))
> > > +                     return -EPERM;
> >
> > Do you have a specific (DOS or other) attack vector in mind that renders
> > ns_capable unsuitable?
> >

Missed this one. I was thinking of the oom-reaper thread as a limited
system resource (a single thread which maintains a kill list and reaps
process mms one at a time) which should therefore be protected from
abuse.

> > > +     }
> > > +
> > >       f = fdget_raw(pidfd);
> > >       if (!f.file)
> > >               return -EBADF;
> > > @@ -3614,7 +3635,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> > >               prepare_kill_siginfo(sig, &kinfo);
> > >       }
> > >
> > > -     ret = kill_pid_info(sig, &kinfo, pid);
> > > +     ret = kill_pid_info(sig, &kinfo, pid, (flags & SS_EXPEDITE) != 0);
> > >
> > >  err:
> > >       fdput(f);
> > > diff --git a/kernel/time/itimer.c b/kernel/time/itimer.c
> > > index 02068b2d5862..c926483cdb53 100644
> > > --- a/kernel/time/itimer.c
> > > +++ b/kernel/time/itimer.c
> > > @@ -140,7 +140,7 @@ enum hrtimer_restart it_real_fn(struct hrtimer *timer)
> > >       struct pid *leader_pid = sig->pids[PIDTYPE_TGID];
> > >
> > >       trace_itimer_expire(ITIMER_REAL, leader_pid, 0);
> > > -     kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid);
> > > +     kill_pid_info(SIGALRM, SEND_SIG_PRIV, leader_pid, false);
> > >
> > >       return HRTIMER_NORESTART;
> > >  }
> > > --
> > > 2.21.0.392.gf8f6787159e-goog
> > >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11  1:43 ` [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing Suren Baghdasaryan
  2019-04-11 10:30   ` Christian Brauner
@ 2019-04-11 15:33   ` Matthew Wilcox
  2019-04-11 17:05     ` Johannes Weiner
                       ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Matthew Wilcox @ 2019-04-11 15:33 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, mhocko, rientjes, yuzhoujian, jrdr.linux, guro, hannes,
	penguin-kernel, ebiederm, shakeelb, christian, minchan,
	timmurray, dancol, joel, jannh, linux-mm, lsf-pc, linux-kernel,
	kernel-team

On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> pidfd_send_signal() syscall to allow expedited memory reclaim of the
> victim process. The usage of this flag is currently limited to SIGKILL
> signal and only to privileged users.

What is the downside of doing expedited memory reclaim?  ie why not do it
every time a process is going to die?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
@ 2019-04-11 16:18   ` Joel Fernandes
  2019-04-11 18:12     ` Michal Hocko
  2019-04-11 16:20   ` Sandeep Patil
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Joel Fernandes @ 2019-04-11 16:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, Andrew Morton, David Rientjes,
	Matthew Wilcox, yuzhoujian, jrdr.linux, guro, Johannes Weiner,
	penguin-kernel, ebiederm, shakeelb, Christian Brauner,
	Minchan Kim, Tim Murray, Daniel Colascione,
	Joel Fernandes (Google),
	Jann Horn, open list:MEMORY MANAGEMENT, lsf-pc, LKML,
	Cc: Android Kernel

On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> [...]
> > Proposed solution uses existing oom-reaper thread to increase memory
> > reclaim rate of a killed process and to make this rate more deterministic.
> > By no means the proposed solution is considered the best and was chosen
> > because it was simple to implement and allowed for test data collection.
> > The downside of this solution is that it requires additional “expedite”
> > hint for something which has to be fast in all cases. Would be great to
> > find a way that does not require additional hints.
>
> I have to say I do not like this much. It is abusing an implementation
> detail of the OOM implementation and makes it an official API. Also
> there are some non-trivial assumptions to be fulfilled to use the
> current oom_reaper. First of all, all the process groups that share the
> address space have to be killed. How do you want to guarantee/implement
> that with a simple kill to a thread/process group?

Will task_will_free_mem() not bail out in such cases because of
process_shares_mm() returning true? AFAIU, Suren's patch calls that.
Also, if I understand correctly, this patch is opportunistic and accepts
that it may not be possible to reap in advance this way in all cases.
        /*
         * Make sure that all tasks which share the mm with the given tasks
         * are dying as well to make sure that a) nobody pins its mm and
         * b) the task is also reapable by the oom reaper.
         */
        rcu_read_lock();
        for_each_process(p) {
                if (!process_shares_mm(p, mm))
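                        continue;  /* rest of the loop, from mm/oom_kill.c,
                                    * added for completeness: */
                if (same_thread_group(task, p))
                        continue;
                ret = __task_will_free_mem(p);
                if (!ret)
                        break;
        }
        rcu_read_unlock();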

> > Other possible approaches include:
> > - Implementing a dedicated syscall to perform opportunistic reclaim in the
> > context of the process waiting for the victim’s death. A natural boost
> > bonus occurs if the waiting process has high or RT priority and is not
> > limited by cpuset cgroup in its CPU choices.
> > - Implement a mechanism that would perform opportunistic reclaim if it’s
> > possible unconditionally (similar to checks in task_will_free_mem()).
> > - Implement opportunistic reclaim that uses shrinker interface, PSI or
> > other memory pressure indications as a hint to engage.
>
> I would question whether we really need this at all? Relying on the exit
> speed sounds like a fundamental design problem of anything that relies
> on it. Sure, task exit might be slow, but async mm tear down is just a
> mere optimization that is not guaranteed to really help in speeding
> things up. The OOM killer uses it as a guarantee for forward progress in a
> finite time rather than as soon as possible.

Per the data collected by Suren, it does speed things up. It would be
nice if we can reuse this mechanism, or come up with a similar
mechanism.

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
  2019-04-11 16:18   ` Joel Fernandes
@ 2019-04-11 16:20   ` Sandeep Patil
  2019-04-11 16:47   ` Suren Baghdasaryan
  2019-04-11 17:19   ` Johannes Weiner
  3 siblings, 0 replies; 43+ messages in thread
From: Sandeep Patil @ 2019-04-11 16:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, akpm, rientjes, willy, yuzhoujian,
	jrdr.linux, guro, hannes, penguin-kernel, ebiederm, shakeelb,
	christian, minchan, timmurray, dancol, joel, jannh, linux-mm,
	lsf-pc, linux-kernel, kernel-team

On Thu, Apr 11, 2019 at 12:51:11PM +0200, Michal Hocko wrote:
> On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> [...]
> > Proposed solution uses existing oom-reaper thread to increase memory
> > reclaim rate of a killed process and to make this rate more deterministic.
> > By no means the proposed solution is considered the best and was chosen
> > because it was simple to implement and allowed for test data collection.
> > The downside of this solution is that it requires additional “expedite”
> > hint for something which has to be fast in all cases. Would be great to
> > find a way that does not require additional hints.
> 
> I have to say I do not like this much. It is abusing an implementation
> detail of the OOM implementation and makes it an official API. Also
> > there are some non-trivial assumptions to be fulfilled to use the
> > current oom_reaper. First of all, all the process groups that share the
> > address space have to be killed. How do you want to guarantee/implement
> > that with a simple kill to a thread/process group?
> 
> > Other possible approaches include:
> > - Implementing a dedicated syscall to perform opportunistic reclaim in the
> > context of the process waiting for the victim’s death. A natural boost
> > bonus occurs if the waiting process has high or RT priority and is not
> > limited by cpuset cgroup in its CPU choices.
> > - Implement a mechanism that would perform opportunistic reclaim if it’s
> > possible unconditionally (similar to checks in task_will_free_mem()).
> > - Implement opportunistic reclaim that uses shrinker interface, PSI or
> > other memory pressure indications as a hint to engage.
> 
> I would question whether we really need this at all? Relying on the exit
> speed sounds like a fundamental design problem of anything that relies
> on it.

OTOH, we want to keep as many processes around as possible for recency. In
that case, the exit path (particularly the memory reclaim) becomes critical
to maintaining interactivity on phones.

Android keeps processes around because cold starting applications is much
slower than simply bringing them up from the background. This obviously presents
a problem: when a background application _is_ killed, it is almost always to
address a sudden spike in memory needs by something else much more important
and user visible, e.g. a foreground application or a critical system process.

> Sure, task exit might be slow, but async mm tear down is just a
> mere optimization that is not guaranteed to really help in speeding
> things up. The OOM killer uses it as a guarantee for forward progress in a
> finite time rather than as soon as possible.

With OOM killer, things are already really bad. When lmkd[1] kills processes,
it is doing so to serve the immediate needs of the system while trying to
avoid the OOM killer.


- ssp

1] https://android.googlesource.com/platform/system/core/+/refs/heads/master/lmkd/

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 15:23       ` Suren Baghdasaryan
@ 2019-04-11 16:25         ` Daniel Colascione
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Colascione @ 2019-04-11 16:25 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Christian Brauner, Andrew Morton, Michal Hocko, David Rientjes,
	Matthew Wilcox, yuzhoujian, Souptick Joarder, Roman Gushchin,
	Johannes Weiner, Tetsuo Handa, Eric W. Biederman, Shakeel Butt,
	Minchan Kim, Tim Murray, Daniel Colascione, Joel Fernandes,
	Jann Horn, linux-mm, lsf-pc, LKML, kernel-team, Oleg Nesterov

On Thu, Apr 11, 2019 at 8:23 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.

FWIW, I like Suren's general idea, but I was thinking of a different
way of exposing the same general functionality to userspace. The way I
look at it, it's very useful for an auto-balancing memory system like
Android (or, I presume, something that uses oomd) to recover memory
*immediately* after a SIGKILL instead of waiting for the process to
kill itself: a process's death can be delayed for a long time due to
factors like scheduling and being blocked in various uninterruptible
kernel-side operations. Suren's proposal is basically about pulling
forward in time page reclamation that would happen anyway.

What if we let userspace control exactly when this reclamation
happens? I'm imagining a new* kernel facility that basically looks
like this. It lets lmkd determine for itself how much work the system
should expend on reclaiming memory from dying processes.

size_t try_reap_dying_process(
  int pidfd,
  int flags /* must be zero */,
  size_t maximum_bytes_to_reap);

Precondition: process is pending group-exit (someone already sent it SIGKILL)
Postcondition: some memory reclaimed from dying process
Invariant: doesn't sleep; stops reaping after MAXIMUM_BYTES_TO_REAP

-> success: return number of bytes reaped
-> failure: (size_t)-1

EBUSY: couldn't get mmap_sem
EINVAL: PIDFD isn't a pidfd or otherwise invalid arguments
EPERM: process hasn't been sent SIGKILL: try_reap_dying_process on a
process that isn't dying is illegal

Kernel-side, try_reap_dying_process would try-acquire mmap_sem and
just fail if it couldn't get it. Once acquired, it would release
"easy" pages (using the same check the oom reaper uses) until it
either ran out of pages or hit the MAXIMUM_BYTES_TO_REAP cap. The
purpose of MAXIMUM_BYTES_TO_REAP is to let userspace bound from above the
amount of time we spend reclaiming pages. It'd be up to userspace to
set policy on retries, the actual value of the reap cap, the priority
at which we run TRY_REAP_DYING_PROCESS, and so on. We return the
number of bytes we managed to free this way so that lmkd can make an
informed decision about what to do next, e.g., kill something else or
wait a little while.
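
To make the intended usage concrete, lmkd's side might look something
like this (hypothetical, of course, since the syscall doesn't exist;
MAX_ATTEMPTS and REAP_CHUNK_BYTES are made-up tuning knobs):

/* Kill, then pull the page reclamation forward ourselves. */
pidfd_send_signal(pidfd, SIGKILL, NULL, 0);

size_t total = 0;
for (int i = 0; i < MAX_ATTEMPTS; i++) {
        size_t n = try_reap_dying_process(pidfd, 0, REAP_CHUNK_BYTES);
        if (n == (size_t)-1) {
                if (errno == EBUSY)
                        continue;        /* mmap_sem contended; just retry */
                break;                   /* nothing (more) we can do here */
        }
        total += n;
        if (n < REAP_CHUNK_BYTES)
                break;                   /* ran out of easy pages */
}
/* 'total' feeds the next decision: kill something else or wait a bit. */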

Personally, I like this approach a bit more than recruiting the oom
reaper because it doesn't involve any kind of emergency memory
reserve permission and because it frees us from having to think about
whether the oom reaper's thread priority is right for this particular
job.

It also occurred to me that try_reap_dying_process might make a decent
shrinker callback. Shrinkers are there, AIUI, to reclaim memory that's
easy to free and that's not essential for correct kernel operation.
Usually, it's some kind of cache that meets these criteria. But the
private pages of a dying process also meet the criteria, don't they?
I'm imagining the shrinker just picking an arbitrary doomed (dying but
not yet dead) process and freeing some of its pages. I know there are
concerns about slow shrinkers causing havoc throughout the system, but
since this shrinker would be bounded above on CPU time and would never
block, I feel like it'd be pretty safe.

* insert standard missive about system calls being cheap, but we can
talk about the way in which we expose this functionality after we
agree that it's a good idea generally

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
  2019-04-11 16:18   ` Joel Fernandes
  2019-04-11 16:20   ` Sandeep Patil
@ 2019-04-11 16:47   ` Suren Baghdasaryan
  2019-04-11 18:19     ` Michal Hocko
  2019-04-11 17:19   ` Johannes Weiner
  3 siblings, 1 reply; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 16:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, David Rientjes, Matthew Wilcox, yuzhoujian,
	Souptick Joarder, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
	ebiederm, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

Thanks for the feedback!

On Thu, Apr 11, 2019 at 3:51 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> [...]
> > Proposed solution uses existing oom-reaper thread to increase memory
> > reclaim rate of a killed process and to make this rate more deterministic.
> > By no means the proposed solution is considered the best and was chosen
> > because it was simple to implement and allowed for test data collection.
> > The downside of this solution is that it requires additional “expedite”
> > hint for something which has to be fast in all cases. Would be great to
> > find a way that does not require additional hints.
>
> I have to say I do not like this much. It is abusing an implementation
> detail of the OOM implementation and makes it an official API.

I agree with you that this particular implementation is abusing oom
internal machinery and I don't think it is acceptable as is (hence
this is sent as an RFC). I would like to discuss the viability of the
idea of reaping kill victim's mm asynchronously. If we agree that this
is worth our time, only then I would love to get into more details on
how to implement it. The implementation in this RFC is a convenient
way to illustrate the idea and to collect test data.

> Also
> there are some non-trivial assumptions to be fulfilled to use the
> current oom_reaper. First of all, all the process groups that share the
> address space have to be killed. How do you want to guarantee/implement
> that with a simple kill to a thread/process group?
>

I'm not sure I understood this correctly, but if you are asking how we
know that the mm we are reaping is not shared with processes that
are not being killed, then I think your task_will_free_mem() checks for
that. Or have I misunderstood your question?

> > Other possible approaches include:
> > - Implementing a dedicated syscall to perform opportunistic reclaim in the
> > context of the process waiting for the victim’s death. A natural boost
> > bonus occurs if the waiting process has high or RT priority and is not
> > limited by cpuset cgroup in its CPU choices.
> > - Implement a mechanism that would perform opportunistic reclaim if it’s
> > possible unconditionally (similar to checks in task_will_free_mem()).
> > - Implement opportunistic reclaim that uses shrinker interface, PSI or
> > other memory pressure indications as a hint to engage.
>
> I would question whether we really need this at all? Relying on the exit
> speed sounds like a fundamental design problem of anything that relies
> on it.

Relying on it is wrong, I agree. There are protections like allocation
throttling that we can fall back on to stop memory depletion. However,
having a way to free up resources that are not needed by a dying
process quickly would help to avoid throttling, which hurts user
experience.
I agree that this is an optimization which is beneficial in a specific
case - when we kill to free up resources. However, this is an important
optimization for systems with low memory resources like embedded
systems, phones, etc. The only way to prevent being cornered into
throttling is to increase the free memory margin that the system needs to
maintain (I describe this in my cover letter). And with limited
overall memory resources, memory space is at a premium, so we try to
decrease that margin.
I think the other, and arguably even more important, issue than the
speed of memory reclaim is that this speed depends on what the victim
is doing at the time of the kill. This introduces non-determinism in how
fast we can free up resources, and at this point we don't even know how
much safety margin we need.

> Sure, task exit might be slow, but async mm tear down is just a
> mere optimization that is not guaranteed to really help in speeding
> things up. The OOM killer uses it as a guarantee for forward progress in a
> finite time rather than as soon as possible.
>
> --
> Michal Hocko
> SUSE Labs
>
> --
> You received this message because you are subscribed to the Google Groups "kernel-team" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Lsf-pc] [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 12:16   ` Michal Hocko
@ 2019-04-11 16:54     ` Suren Baghdasaryan
  0 siblings, 0 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 16:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rik van Riel, Suren Baghdasaryan, Andrew Morton,
	Daniel Colascione, Jann Horn, Minchan Kim, Tetsuo Handa,
	kernel-team, David Rientjes, LKML, Matthew Wilcox, linux-mm,
	Johannes Weiner, Shakeel Butt, Souptick Joarder, yuzhoujian,
	Joel Fernandes, Tim Murray, lsf-pc, Roman Gushchin,
	Christian Brauner, ebiederm

On Thu, Apr 11, 2019 at 5:16 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 11-04-19 07:51:21, Rik van Riel wrote:
> > On Wed, 2019-04-10 at 18:43 -0700, Suren Baghdasaryan via Lsf-pc wrote:
> > > The time to kill a process and free its memory can be critical when
> > > the
> > > killing was done to prevent memory shortages affecting system
> > > responsiveness.
> >
> > The OOM killer is fickle, and often takes a fairly
> > long time to trigger. Speeding up what happens after
> > that seems like the wrong thing to optimize.
> >
> > Have you considered using something like oomd to
> > proactively kill tasks when memory gets low, so
> > you do not have to wait for an OOM kill?
>
> AFAIU, this is the point here. They probably have a user space OOM
> killer implementation and want the killing to be as swift as
> possible.

That is correct. Android has a userspace daemon called lmkd (low
memory killer daemon) to respond to memory pressure before things get
bad enough for the kernel oom-killer to get involved. So this asynchronous
reclaim optimization would allow lmkd to do its job more efficiently.

> --
> Michal Hocko
> SUSE Labs
>
> --
> You received this message because you are subscribed to the Google Groups "kernel-team" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 15:33   ` Matthew Wilcox
@ 2019-04-11 17:05     ` Johannes Weiner
  2019-04-11 17:09     ` Suren Baghdasaryan
  2019-04-12  6:53     ` Michal Hocko
  2 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2019-04-11 17:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Suren Baghdasaryan, akpm, mhocko, rientjes, yuzhoujian,
	jrdr.linux, guro, penguin-kernel, ebiederm, shakeelb, christian,
	minchan, timmurray, dancol, joel, jannh, linux-mm, lsf-pc,
	linux-kernel, kernel-team

On Thu, Apr 11, 2019 at 08:33:13AM -0700, Matthew Wilcox wrote:
> On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > victim process. The usage of this flag is currently limited to SIGKILL
> > signal and only to privileged users.
> 
> What is the downside of doing expedited memory reclaim?  ie why not do it
> every time a process is going to die?

I agree with this. The oom reaper mostly does work the exiting task
would have to do anyway, so it shouldn't be measurably more
expensive to do by default. I wouldn't want to add a new interface
unless we really do need that type of control.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 15:33   ` Matthew Wilcox
  2019-04-11 17:05     ` Johannes Weiner
@ 2019-04-11 17:09     ` Suren Baghdasaryan
  2019-04-11 17:33       ` Daniel Colascione
  2019-04-11 21:45       ` Roman Gushchin
  2019-04-12  6:53     ` Michal Hocko
  2 siblings, 2 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 17:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, mhocko, David Rientjes, yuzhoujian,
	Souptick Joarder, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
	ebiederm, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > victim process. The usage of this flag is currently limited to SIGKILL
> > signal and only to privileged users.
>
> What is the downside of doing expedited memory reclaim?  ie why not do it
> every time a process is going to die?

I think with an implementation that does not use/abuse the oom-reaper
thread this could be done for any kill. As I mentioned, oom-reaper is a
limited resource which has access to memory reserves and should not be
abused in the way I do in this reference implementation.
While there might be downsides that I don't know of, I'm not sure it's
required to hurry every kill's memory reclaim. I think there are cases
when resource deallocation is critical, for example when we kill to
relieve a resource shortage, and there are kills when reclaim speed is
not essential. It would be great if we could identify urgent cases
without userspace hints, so I'm open to suggestions that do not
involve additional flags.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
                     ` (2 preceding siblings ...)
  2019-04-11 16:47   ` Suren Baghdasaryan
@ 2019-04-11 17:19   ` Johannes Weiner
  3 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2019-04-11 17:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, akpm, rientjes, willy, yuzhoujian,
	jrdr.linux, guro, penguin-kernel, ebiederm, shakeelb, christian,
	minchan, timmurray, dancol, joel, jannh, linux-mm, lsf-pc,
	linux-kernel, kernel-team

On Thu, Apr 11, 2019 at 12:51:11PM +0200, Michal Hocko wrote:
> I would question whether we really need this at all? Relying on the exit
> speed sounds like a fundamental design problem of anything that relies
> on it. Sure, task exit might be slow, but async mm tear down is just a
> mere optimization that is not guaranteed to really help in speeding
> things up. The OOM killer uses it as a guarantee for forward progress in a
> finite time rather than as soon as possible.

I don't think it's flawed, it's just optimizing the user experience as
best as it can. You don't want to kill things prematurely, but once
there is pressure you want to rectify it quickly. That's valid.

We have a tool that does this, side effect or not, so I think it's
fair to try to make use of it when oom killing from userspace (which
we explicitly support with oom_control in cgroup1 and memory.high in
cgroup2, and it's not just an Android thing).

The question is how explicit a contract we want to make with
userspace, and I would much prefer to not overpromise on a best-effort
thing like this, or even make the oom reaper ABI.

If unconditionally reaping killed tasks is too expensive, I'd much
prefer a simple kill hint over an explicit task reclaim interface.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:09     ` Suren Baghdasaryan
@ 2019-04-11 17:33       ` Daniel Colascione
  2019-04-11 17:36         ` Matthew Wilcox
  2019-04-11 21:45       ` Roman Gushchin
  1 sibling, 1 reply; 43+ messages in thread
From: Daniel Colascione @ 2019-04-11 17:33 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, Andrew Morton, Michal Hocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Daniel Colascione, Joel Fernandes,
	Jann Horn, linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > victim process. The usage of this flag is currently limited to SIGKILL
> > > signal and only to privileged users.
> >
> > What is the downside of doing expedited memory reclaim?  ie why not do it
> > every time a process is going to die?
>
> I think with an implementation that does not use/abuse the oom-reaper
> thread this could be done for any kill. As I mentioned, oom-reaper is a
> limited resource which has access to memory reserves and should not be
> abused in the way I do in this reference implementation.
> While there might be downsides that I don't know of, I'm not sure it's
> required to hurry every kill's memory reclaim. I think there are cases
> when resource deallocation is critical, for example when we kill to
> relieve a resource shortage, and there are kills when reclaim speed is
> not essential. It would be great if we could identify urgent cases
> without userspace hints, so I'm open to suggestions that do not
> involve additional flags.

I was imagining a PI-ish approach where we'd reap in case an RT
process was waiting on the death of some other process. I'd still
prefer the API I proposed in the other message because it gets the
kernel out of the business of deciding what the right signal is. I'm a
huge believer in "mechanism, not policy".

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:33       ` Daniel Colascione
@ 2019-04-11 17:36         ` Matthew Wilcox
  2019-04-11 17:47           ` Daniel Colascione
  2019-04-11 17:52           ` Suren Baghdasaryan
  0 siblings, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2019-04-11 17:36 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Suren Baghdasaryan, Andrew Morton, Michal Hocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Joel Fernandes, Jann Horn, linux-mm,
	lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.
> > >
> > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > every time a process is going to die?
> >
> > I think with an implementation that does not use/abuse the oom-reaper
> > thread this could be done for any kill. As I mentioned, oom-reaper is a
> > limited resource which has access to memory reserves and should not be
> > abused in the way I do in this reference implementation.
> > While there might be downsides that I don't know of, I'm not sure it's
> > required to hurry every kill's memory reclaim. I think there are cases
> > when resource deallocation is critical, for example when we kill to
> > relieve a resource shortage, and there are kills when reclaim speed is
> > not essential. It would be great if we could identify urgent cases
> > without userspace hints, so I'm open to suggestions that do not
> > involve additional flags.
> 
> I was imagining a PI-ish approach where we'd reap in case an RT
> process was waiting on the death of some other process. I'd still
> prefer the API I proposed in the other message because it gets the
> kernel out of the business of deciding what the right signal is. I'm a
> huge believer in "mechanism, not policy".

It's not a question of the kernel deciding what the right signal is.
The kernel knows whether a signal is fatal to a particular process or not.
The question is whether the killing process should do the work of reaping
the dying process's resources sometimes, always or never.  Currently,
that is never (the process reaps its own resources); Suren is suggesting
sometimes, and I'm asking "Why not always?"

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:36         ` Matthew Wilcox
@ 2019-04-11 17:47           ` Daniel Colascione
  2019-04-12  6:49             ` Michal Hocko
  2019-04-12 21:03             ` Matthew Wilcox
  2019-04-11 17:52           ` Suren Baghdasaryan
  1 sibling, 2 replies; 43+ messages in thread
From: Daniel Colascione @ 2019-04-11 17:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Suren Baghdasaryan, Andrew Morton, Michal Hocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Joel Fernandes, Jann Horn, linux-mm,
	lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> > On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > > signal and only to privileged users.
> > > >
> > > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > > every time a process is going to die?
> > >
> > > I think with an implementation that does not use/abuse the oom-reaper
> > > thread this could be done for any kill. As I mentioned, oom-reaper is a
> > > limited resource which has access to memory reserves and should not be
> > > abused in the way I do in this reference implementation.
> > > While there might be downsides that I don't know of, I'm not sure it's
> > > required to hurry every kill's memory reclaim. I think there are cases
> > > when resource deallocation is critical, for example when we kill to
> > > relieve a resource shortage, and there are kills when reclaim speed is
> > > not essential. It would be great if we could identify urgent cases
> > > without userspace hints, so I'm open to suggestions that do not
> > > involve additional flags.
> >
> > I was imagining a PI-ish approach where we'd reap in case an RT
> > process was waiting on the death of some other process. I'd still
> > prefer the API I proposed in the other message because it gets the
> > kernel out of the business of deciding what the right signal is. I'm a
> > huge believer in "mechanism, not policy".
>
> It's not a question of the kernel deciding what the right signal is.
> The kernel knows whether a signal is fatal to a particular process or not.
> The question is whether the killing process should do the work of reaping
> the dying process's resources sometimes, always or never.  Currently,
> that is never (the process reaps its own resources); Suren is suggesting
> sometimes, and I'm asking "Why not always?"

FWIW, Suren's initial proposal is that the oom_reaper kthread do the
reaping, not the process sending the kill. Are you suggesting that
sending SIGKILL should spend a while in signal delivery reaping pages
before returning? I thought about just doing it this way, but I didn't
like the idea: it'd slow down mass-killing programs like killall(1).
Programs expect sending SIGKILL to be a fast operation that returns
immediately.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:36         ` Matthew Wilcox
  2019-04-11 17:47           ` Daniel Colascione
@ 2019-04-11 17:52           ` Suren Baghdasaryan
  1 sibling, 0 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 17:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Colascione, Andrew Morton, Michal Hocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Joel Fernandes, Jann Horn, linux-mm,
	lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> > On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > > signal and only to privileged users.
> > > >
> > > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > > every time a process is going to die?
> > >
> > > I think with an implementation that does not use/abuse the oom-reaper
> > > thread this could be done for any kill. As I mentioned, oom-reaper is a
> > > limited resource which has access to memory reserves and should not be
> > > abused in the way I do in this reference implementation.
> > > While there might be downsides that I don't know of, I'm not sure it's
> > > required to hurry every kill's memory reclaim. I think there are cases
> > > when resource deallocation is critical, for example when we kill to
> > > relieve a resource shortage, and there are kills when reclaim speed is
> > > not essential. It would be great if we could identify urgent cases
> > > without userspace hints, so I'm open to suggestions that do not
> > > involve additional flags.
> >
> > I was imagining a PI-ish approach where we'd reap in case an RT
> > process was waiting on the death of some other process. I'd still
> > prefer the API I proposed in the other message because it gets the
> > kernel out of the business of deciding what the right signal is. I'm a
> > huge believer in "mechanism, not policy".
>
> It's not a question of the kernel deciding what the right signal is.
> The kernel knows whether a signal is fatal to a particular process or not.
> The question is whether the killing process should do the work of reaping
> the dying process's resources sometimes, always or never.  Currently,
> that is never (the process reaps its own resources); Suren is suggesting
> sometimes, and I'm asking "Why not always?"

If there are no downsides to doing this always (like using some
resources that could be utilized in a better way) then by all means,
let's do it unconditionally. My current implementation is not one of
such cases :)

I think with an implementation where the killing process does the reaping
of the victim's mm this can be done unconditionally, because we don't
use resources which might otherwise be used in a better way. Overall I
like Daniel's idea of the process that requested the kill and is
waiting for the victim to die helping to reap its memory. It kind
of naturally elevates the priority of the reaping if the priority of
the waiting process is higher than the victim's priority (a kind of
priority inheritance).

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 16:18   ` Joel Fernandes
@ 2019-04-11 18:12     ` Michal Hocko
  2019-04-11 19:14       ` Joel Fernandes
  0 siblings, 1 reply; 43+ messages in thread
From: Michal Hocko @ 2019-04-11 18:12 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Suren Baghdasaryan, Andrew Morton, David Rientjes,
	Matthew Wilcox, yuzhoujian, jrdr.linux, guro, Johannes Weiner,
	penguin-kernel, ebiederm, shakeelb, Christian Brauner,
	Minchan Kim, Tim Murray, Daniel Colascione,
	Joel Fernandes (Google),
	Jann Horn, open list:MEMORY MANAGEMENT, lsf-pc, LKML,
	Cc: Android Kernel

On Thu 11-04-19 12:18:33, Joel Fernandes wrote:
> On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> > [...]
> > > Proposed solution uses existing oom-reaper thread to increase memory
> > > reclaim rate of a killed process and to make this rate more deterministic.
> > > By no means the proposed solution is considered the best and was chosen
> > > because it was simple to implement and allowed for test data collection.
> > > The downside of this solution is that it requires additional “expedite”
> > > hint for something which has to be fast in all cases. Would be great to
> > > find a way that does not require additional hints.
> >
> > I have to say I do not like this much. It is abusing an implementation
> > detail of the OOM implementation and makes it an official API. Also
> > there are some non-trivial assumptions to be fulfilled to use the
> > current oom_reaper. First of all, all the process groups that share the
> > address space have to be killed. How do you want to guarantee/implement
> > that with a simple kill to a thread/process group?
> 
> Will task_will_free_mem() not bail out in such cases because of
> process_shares_mm() returning true?

I am not really sure I understand your question. task_will_free_mem is
just a shortcut to not kill anything if the current process or a victim
is already dying and likely to free memory without killing or spamming
the log. My concern is that this patch allows invoking the reaper
without guaranteeing the same. So it can only be an optimistic attempt
and then I am wondering how reasonable of an interface this really is.
Userspace sends the signal and has no way to find out whether the async
reaping has been scheduled or not.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 16:47   ` Suren Baghdasaryan
@ 2019-04-11 18:19     ` Michal Hocko
  2019-04-11 19:56       ` Suren Baghdasaryan
  0 siblings, 1 reply; 43+ messages in thread
From: Michal Hocko @ 2019-04-11 18:19 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Rientjes, Matthew Wilcox, yuzhoujian,
	Souptick Joarder, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
	ebiederm, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu 11-04-19 09:47:31, Suren Baghdasaryan wrote:
[...]
> > I would question whether we really need this at all? Relying on the exit
> > speed sounds like a fundamental design problem of anything that relies
> > on it.
> 
> Relying on it is wrong, I agree. There are protections like allocation
> throttling that we can fall back on to stop memory depletion. However,
> having a way to free up resources that are not needed by a dying
> process quickly would help to avoid throttling, which hurts user
> experience.

I am not opposing speeding up the exit time in general. That is a good
thing. Especially for very large processes (e.g. a DB). But I do not
really think we want to expose an API to control this specific aspect.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 18:12     ` Michal Hocko
@ 2019-04-11 19:14       ` Joel Fernandes
  2019-04-11 20:11         ` Michal Hocko
  0 siblings, 1 reply; 43+ messages in thread
From: Joel Fernandes @ 2019-04-11 19:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, Andrew Morton, David Rientjes,
	Matthew Wilcox, yuzhoujian, jrdr.linux, guro, Johannes Weiner,
	penguin-kernel, ebiederm, shakeelb, Christian Brauner,
	Minchan Kim, Tim Murray, Daniel Colascione, Jann Horn,
	open list:MEMORY MANAGEMENT, lsf-pc, LKML, Cc: Android Kernel

On Thu, Apr 11, 2019 at 08:12:43PM +0200, Michal Hocko wrote:
> On Thu 11-04-19 12:18:33, Joel Fernandes wrote:
> > On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> > > [...]
> > > > Proposed solution uses existing oom-reaper thread to increase memory
> > > > reclaim rate of a killed process and to make this rate more deterministic.
> > > > By no means the proposed solution is considered the best and was chosen
> > > > because it was simple to implement and allowed for test data collection.
> > > > The downside of this solution is that it requires additional “expedite”
> > > > hint for something which has to be fast in all cases. Would be great to
> > > > find a way that does not require additional hints.
> > >
> > > I have to say I do not like this much. It is abusing an implementation
> > > detail of the OOM implementation and makes it an official API. Also
> > > there are some non-trivial assumptions to be fulfilled to use the
> > > current oom_reaper. First of all, all the process groups that share the
> > > address space have to be killed. How do you want to guarantee/implement
> > > that with a simple kill to a thread/process group?
> > 
> > Will task_will_free_mem() not bail out in such cases because of
> > process_shares_mm() returning true?
> 
> I am not really sure I understand your question. task_will_free_mem is
> just a shortcut to not kill anything if the current process or a victim
> is already dying and likely to free memory without killing or spamming
> the log. My concern is that this patch allows invoking the reaper

Got it.

> without guaranteeing the same. So it can only be an optimistic attempt
> and then I am wondering how reasonable of an interface this really is.
> Userspace sends the signal and has no way to find out whether the async
> reaping has been scheduled or not.

Could you clarify more what you're asking to guarantee? I cannot picture it.
If you mean guaranteeing that "a task is dying anyway and will free its
memory on its own", we are calling task_will_free_mem() to check that before
invoking the oom reaper.

Could you clarify what the drawback is if the OOM reaper is invoked in parallel
to an exiting task which will free its memory soon? It looks like the OOM reaper
is taking all the necessary locks, mmap_sem in particular, and is unmapping
pages. It seemed safe to me, but I am missing what the main drawbacks
of this are - other than the interference with core dumps. One could
presumably be scalability, since the OOM reaper could be bottlenecked by
freeing memory on behalf of potentially several dying tasks.
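
For reference, the locking I am referring to looks roughly like this
(paraphrased from mm/oom_kill.c of this era, not the exact code):

static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
{
        bool ret;

        /* never sleep on mmap_sem; the caller retries later, up to a
         * limit, and then gives up and sets MMF_OOM_SKIP */
        if (!down_read_trylock(&mm->mmap_sem))
                return false;

        /* MADV_DONTNEED-style unmap of private, unlocked VMAs only */
        ret = __oom_reap_task_mm(mm);

        up_read(&mm->mmap_sem);
        return ret;
}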

IIRC this patch is just OK with being opportunistic and need not necessarily
be hidden behind an API or need any guarantees. It is just providing a hint
that the OOM reaper could be woken up to expedite things. If a task is going
to take a long time to be scheduled and free its memory, the oom reaper
gives it a head start. Many times, background tasks can be killed, but
they may not necessarily have sufficient scheduler priority / cpuset (being
in the background) and may be holding onto a lot of memory that needs to be
reclaimed.

I am not saying this is the right way to do it, but I also wanted us to
understand the drawbacks so that we can go back to the drawing board and come
up with something better.

Thanks!

 - Joel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 18:19     ` Michal Hocko
@ 2019-04-11 19:56       ` Suren Baghdasaryan
  2019-04-11 20:17         ` Michal Hocko
  0 siblings, 1 reply; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 19:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, David Rientjes, Matthew Wilcox, yuzhoujian,
	Souptick Joarder, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
	Eric W. Biederman, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 11:19 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 11-04-19 09:47:31, Suren Baghdasaryan wrote:
> [...]
> > > I would question whether we really need this at all? Relying on the exit
> > > speed sounds like a fundamental design problem of anything that relies
> > > on it.
> >
> > Relying on it is wrong, I agree. There are protections like allocation
> > throttling that we can fall back on to stop memory depletion. However,
> > having a way to free up resources that are not needed by a dying
> > process quickly would help to avoid throttling, which hurts user
> > experience.
>
> I am not opposing speeding up the exit time in general. That is a good
> thing. Especially for very large processes (e.g. a DB). But I do not
> really think we want to expose an API to control this specific aspect.

Great! Thanks for confirming that the intent is not worthless.
There were a number of ideas floating both internally and in the 2/2
of this patchset. I would like to get some input on which
implementation would be preferable. From your answer sounds like you
think it should be a generic feature, should not require any new APIs
or hints from the userspace and should be conducted for all kills
unconditionally (irrespective of memory pressure, who is waiting for
victim's death, etc.). Do I understand correctly that this would be
the preferred solution?

> --
> Michal Hocko
> SUSE Labs
>
> --
> You received this message because you are subscribed to the Google Groups "kernel-team" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 19:14       ` Joel Fernandes
@ 2019-04-11 20:11         ` Michal Hocko
  2019-04-11 21:11           ` Joel Fernandes
  0 siblings, 1 reply; 43+ messages in thread
From: Michal Hocko @ 2019-04-11 20:11 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Suren Baghdasaryan, Andrew Morton, David Rientjes,
	Matthew Wilcox, yuzhoujian, jrdr.linux, guro, Johannes Weiner,
	penguin-kernel, ebiederm, shakeelb, Christian Brauner,
	Minchan Kim, Tim Murray, Daniel Colascione, Jann Horn,
	open list:MEMORY MANAGEMENT, lsf-pc, LKML, Cc: Android Kernel

On Thu 11-04-19 15:14:30, Joel Fernandes wrote:
> On Thu, Apr 11, 2019 at 08:12:43PM +0200, Michal Hocko wrote:
> > On Thu 11-04-19 12:18:33, Joel Fernandes wrote:
> > > On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> > > > [...]
> > > > > Proposed solution uses existing oom-reaper thread to increase memory
> > > > > reclaim rate of a killed process and to make this rate more deterministic.
> > > > > By no means the proposed solution is considered the best and was chosen
> > > > > because it was simple to implement and allowed for test data collection.
> > > > > The downside of this solution is that it requires additional “expedite”
> > > > > hint for something which has to be fast in all cases. Would be great to
> > > > > find a way that does not require additional hints.
> > > >
> > > > I have to say I do not like this much. It is abusing an implementation
> > > > detail of the OOM implementation and makes it an official API. Also
> > > > there are some non-trivial assumptions to be fulfilled to use the
> > > > current oom_reaper. First of all, all the process groups that share the
> > > > address space have to be killed. How do you want to guarantee/implement
> > > > that with a simple kill to a thread/process group?
> > > 
> > > Will task_will_free_mem() not bail out in such cases because of
> > > process_shares_mm() returning true?
> > 
> > I am not really sure I understand your question. task_will_free_mem is
> > just a shortcut to not kill anything if the current process or a victim
> > is already dying and likely to free memory without killing or spamming
> > the log. My concern is that this patch allows invoking the reaper
> 
> Got it.
> 
> > without guaranteeing the same. So it can only be an optimistic attempt
> > and then I am wondering how reasonable of an interface this really is.
> > Userspace sends the signal and has no way to find out whether the async
> > reaping has been scheduled or not.
> 
> Could you clarify more what you're asking to guarantee? I cannot picture it.
> If you mean guaranteeing that "a task is dying anyway and will free its
> memory on its own", we are calling task_will_free_mem() to check that before
> invoking the oom reaper.

No, I am talking about the API aspect. Say you call kill with the flag
to request the async address space tear down. Now you cannot really
guarantee that this is safe to do because the target task might
clone(CLONE_VM) at any time. So this will be known only once the signal
is sent, but the calling process has no way to find out. So the caller
has no way to know what is the actual result of the requested operation.
That is a poor API in my book.
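
To illustrate the race (a hypothetical sequence, using the SS_EXPEDITE
flag from this RFC):

/* killer */
pidfd_send_signal(pidfd, SIGKILL, NULL, SS_EXPEDITE);  /* returns 0 */

/* meanwhile, before the signal is delivered, the victim does: */
syscall(SYS_clone, CLONE_VM | SIGCHLD, NULL, NULL, NULL, 0);

/* The mm is now shared with a live process that is not dying, so the
 * kernel has to skip the async reaping - yet the caller already got 0
 * back and cannot learn that the "expedite" part of the request was
 * silently dropped.
 */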

> Could you clarify what the drawback is if the OOM reaper is invoked in parallel
> to an exiting task which will free its memory soon? It looks like the OOM reaper
> is taking all the necessary locks, mmap_sem in particular, and is unmapping
> pages. It seemed safe to me, but I am missing what the main drawbacks
> of this are - other than the interference with core dumps. One could
> presumably be scalability, since the OOM reaper could be bottlenecked by
> freeing memory on behalf of potentially several dying tasks.

oom_reaper or any other kernel thread doing the same is a mere
implementation detail I think. The oom killer doesn't really need the
oom_reaper to act swiftly because it is there to act as a last resort if
the oom victim cannot terminate on its own. If you want to offer a
user space API then you can assume users will like to use it and expect
a certain behavior, but what is that behavior? E.g. what if there are thousands of
tasks killed this way? Do we care that some of them will not get the
async treatment? If yes, why do we need an API to control that at all?

Am I more clear now?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 19:56       ` Suren Baghdasaryan
@ 2019-04-11 20:17         ` Michal Hocko
  0 siblings, 0 replies; 43+ messages in thread
From: Michal Hocko @ 2019-04-11 20:17 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Rientjes, Matthew Wilcox, yuzhoujian,
	Souptick Joarder, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
	Eric W. Biederman, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu 11-04-19 12:56:32, Suren Baghdasaryan wrote:
> On Thu, Apr 11, 2019 at 11:19 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 11-04-19 09:47:31, Suren Baghdasaryan wrote:
> > [...]
> > > > I would question whether we really need this at all? Relying on the exit
> > > > speed sounds like a fundamental design problem of anything that relies
> > > > on it.
> > >
> > > Relying on it is wrong, I agree. There are protections like allocation
> > > throttling that we can fall back on to stop memory depletion. However,
> > > having a way to free up resources that are not needed by a dying
> > > process quickly would help to avoid throttling, which hurts user
> > > experience.
> >
> > I am not opposing speeding up the exit time in general. That is a good
> > thing. Especially for very large processes (e.g. a DB). But I do not
> > really think we want to expose an API to control this specific aspect.
> 
> Great! Thanks for confirming that the intent is not worthless.
> There were a number of ideas floating both internally and in the 2/2
> of this patchset. I would like to get some input on which
> implementation would be preferable. From your answer it sounds like you
> think it should be a generic feature, should not require any new APIs
> or hints from userspace, and should be conducted for all kills
> unconditionally (irrespective of memory pressure, who is waiting for
> victim's death, etc.). Do I understand correctly that this would be
> the preferred solution?

Yes, I think the general tear down solution is much preferable to
a questionable API. What that solution should look like is an open
question. I am not sure myself, to be honest.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC 0/2] opportunistic memory reclaim of a killed process
  2019-04-11 20:11         ` Michal Hocko
@ 2019-04-11 21:11           ` Joel Fernandes
  0 siblings, 0 replies; 43+ messages in thread
From: Joel Fernandes @ 2019-04-11 21:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, Andrew Morton, David Rientjes,
	Matthew Wilcox, yuzhoujian, jrdr.linux, guro, Johannes Weiner,
	penguin-kernel, ebiederm, shakeelb, Christian Brauner,
	Minchan Kim, Tim Murray, Daniel Colascione, Jann Horn,
	open list:MEMORY MANAGEMENT, lsf-pc, LKML, Cc: Android Kernel

On Thu, Apr 11, 2019 at 10:11:51PM +0200, Michal Hocko wrote:
> On Thu 11-04-19 15:14:30, Joel Fernandes wrote:
> > On Thu, Apr 11, 2019 at 08:12:43PM +0200, Michal Hocko wrote:
> > > On Thu 11-04-19 12:18:33, Joel Fernandes wrote:
> > > > On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > > Proposed solution uses existing oom-reaper thread to increase memory
> > > > > > reclaim rate of a killed process and to make this rate more deterministic.
> > > > > > By no means the proposed solution is considered the best and was chosen
> > > > > > because it was simple to implement and allowed for test data collection.
> > > > > > The downside of this solution is that it requires additional “expedite”
> > > > > > hint for something which has to be fast in all cases. Would be great to
> > > > > > find a way that does not require additional hints.
> > > > >
> > > > > I have to say I do not like this much. It is abusing an implementation
> > > > > detail of the OOM implementation and makes it an official API. Also
> > > > > there are some non-trivial assumptions to be fulfilled to use the
> > > > > current oom_reaper. First of all, all the process groups that share the
> > > > > address space have to be killed. How do you want to guarantee/implement
> > > > > that with a simple kill to a thread/process group?
> > > > 
> > > > Will task_will_free_mem() not bail out in such cases because of
> > > > process_shares_mm() returning true?
> > > 
> > > I am not really sure I understand your question. task_will_free_mem is
> > > just a shortcut to not kill anything if the current process or a victim
> > > is already dying and likely to free memory without killing or spamming
> > > the log. My concern is that this patch allows invoking the reaper
> > 
> > Got it.
> > 
> > > without guaranteeing the same. So it can only be an optimistic attempt
> > > and then I am wondering how reasonable of an interface this really is.
> > > Userspace sends the signal and has no way to find out whether the async
> > > reaping has been scheduled or not.
> > 
> > Could you clarify more what you're asking to guarantee? I cannot picture it.
> > If you mean guaranteeing that "a task is dying anyway and will free its
> > memory on its own", we are calling task_will_free_mem() to check that before
> > invoking the oom reaper.
> 
> No, I am talking about the API aspect. Say you call kill with the flag
> to request the async address space tear down. Now you cannot really
> guarantee that this is safe to do because the target task might
> clone(CLONE_VM) at any time. So this will be known only once the signal
> is sent, but the calling process has no way to find out. So the caller
> has no way to know what is the actual result of the requested operation.
> That is a poor API in my book.
> 
> > Could you clarify what the drawback is if the OOM reaper is invoked in
> > parallel to an exiting task which will free its memory soon? It looks like
> > the OOM reaper is taking all the necessary locks (mmap_sem in particular)
> > and is unmapping pages. It seemed safe to me, but I am missing what the
> > main drawbacks of this are - other than the interference with core dumps.
> > One could presumably be scalability, since the OOM reaper could be
> > bottlenecked by freeing memory on behalf of potentially several dying tasks.
> 
> oom_reaper or any other kernel thread doing the same is a mere
> implementation detail I think. The oom killer doesn't really need the
> oom_reaper to act swiftly because it is there to act as a last resort if
> the oom victim cannot terminate on its own. If you want to offer a
> userspace API then you can assume users will want to use it and expect
> a certain behavior - but what should that be? E.g. what if there are
> thousands of tasks killed this way? Do we care that some of them will not
> get the async treatment? If yes, why do we need an API to control that at all?
> 
> Am I more clear now?

Yes, your concerns are more clear now. We will think more about this and your
other responses, thanks a lot.

 - Joel
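
For reference, the bail-out check both messages above allude to (refusing the
optimistic reap unless every process sharing the victim's mm is also exiting)
looks roughly like this. This is a condensed sketch of task_will_free_mem()
from mm/oom_kill.c, with some checks and locking elided; as Michal notes, a
concurrent clone(CLONE_VM) can invalidate the result right after it is
computed:

static bool task_will_free_mem(struct task_struct *task)
{
	struct mm_struct *mm = task->mm;
	struct task_struct *p;
	bool ret = true;

	/* the task must be exiting with its mm still attached */
	if (!mm || !__task_will_free_mem(task))
		return false;

	/* single user: nobody else can fault into this mm */
	if (atomic_read(&mm->mm_users) <= 1)
		return true;

	/* otherwise, every process sharing the mm must be dying too */
	rcu_read_lock();
	for_each_process(p) {
		if (!process_shares_mm(p, mm))
			continue;
		if (same_thread_group(task, p))
			continue;
		ret = __task_will_free_mem(p);
		if (!ret)
			break;
	}
	rcu_read_unlock();

	return ret;
}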


* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:09     ` Suren Baghdasaryan
  2019-04-11 17:33       ` Daniel Colascione
@ 2019-04-11 21:45       ` Roman Gushchin
  2019-04-11 21:59         ` Suren Baghdasaryan
  1 sibling, 1 reply; 43+ messages in thread
From: Roman Gushchin @ 2019-04-11 21:45 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, Andrew Morton, mhocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Johannes Weiner, Tetsuo Handa,
	ebiederm, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 10:09:06AM -0700, Suren Baghdasaryan wrote:
> On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > victim process. The usage of this flag is currently limited to SIGKILL
> > > signal and only to privileged users.
> >
> > What is the downside of doing expedited memory reclaim?  ie why not do it
> > every time a process is going to die?

Hello, Suren!

I also like the idea of always reaping.

> I think with an implementation that does not use/abuse the oom-reaper
> thread this could be done for any kill. As I mentioned, the oom-reaper is
> a limited resource which has access to memory reserves and should not be
> abused in the way I do in this reference implementation.

In most OOM cases it doesn't matter that much which task to reap,
so I don't think that reusing the oom-reaper thread is bad.
It should be relatively easy to tweak it in a way that it won't
wait for mmap_sem if there are other tasks waiting to be reaped.
Also, the oom code could add to the head of the list, and the expedited
killing to the end, or something like this.

The only thing is, if we're going to reap all tasks, we probably
want to have a per-node oom_reaper thread.

Thanks!
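
For illustration, the ordering tweak Roman describes might look like the
sketch below. This is hypothetical: the current mm/oom_kill.c queue is a
simple LIFO push onto oom_reaper_list, and the tail pointer plus the
"expedited" flag are invented for this example (oom_reaper_lock and
oom_reaper_wait are the existing symbols):

/*
 * Hypothetical queueing: genuine OOM victims go to the head so they
 * are reaped first; expedited (userspace-requested) victims go to the
 * tail so they never delay real OOM handling.
 */
static struct task_struct *reap_list_head;
static struct task_struct *reap_list_tail;

static void queue_for_reaping(struct task_struct *tsk, bool expedited)
{
	spin_lock(&oom_reaper_lock);
	tsk->oom_reaper_list = NULL;
	if (expedited && reap_list_tail) {
		/* append behind everything already queued */
		reap_list_tail->oom_reaper_list = tsk;
		reap_list_tail = tsk;
	} else {
		/* prepend: OOM victims must not wait for expedited kills */
		tsk->oom_reaper_list = reap_list_head;
		reap_list_head = tsk;
		if (!reap_list_tail)
			reap_list_tail = tsk;
	}
	spin_unlock(&oom_reaper_lock);
	wake_up(&oom_reaper_wait);
}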

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 21:45       ` Roman Gushchin
@ 2019-04-11 21:59         ` Suren Baghdasaryan
  0 siblings, 0 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-11 21:59 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Matthew Wilcox, Andrew Morton, mhocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Johannes Weiner, Tetsuo Handa,
	ebiederm, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 2:45 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Apr 11, 2019 at 10:09:06AM -0700, Suren Baghdasaryan wrote:
> > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.
> > >
> > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > every time a process is going to die?
>
> Hello, Suren!
>
> I also like the idea of always reaping.
>
> > I think with an implementation that does not use/abuse the oom-reaper
> > thread this could be done for any kill. As I mentioned, the oom-reaper is
> > a limited resource which has access to memory reserves and should not be
> > abused in the way I do in this reference implementation.
>
> In most OOM cases it doesn't matter that much which task to reap,
> so I don't think that reusing the oom-reaper thread is bad.
> It should be relatively easy to tweak it in a way that it won't
> wait for mmap_sem if there are other tasks waiting to be reaped.
> Also, the oom code could add to the head of the list, and the expedited
> killing to the end, or something like this.
>
> The only thing is, if we're going to reap all tasks, we probably
> want to have a per-node oom_reaper thread.

Thanks for the ideas, Roman. I'll take some time to digest the input
from everybody. What I heard from everyone is that we want this to be
part of the generic kill functionality, without requiring a change
to the userspace API.

> Thanks!
>

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:47           ` Daniel Colascione
@ 2019-04-12  6:49             ` Michal Hocko
  2019-04-12 14:15               ` Suren Baghdasaryan
  2019-04-12 21:03             ` Matthew Wilcox
  1 sibling, 1 reply; 43+ messages in thread
From: Michal Hocko @ 2019-04-12  6:49 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Matthew Wilcox, Suren Baghdasaryan, Andrew Morton,
	David Rientjes, yuzhoujian, Souptick Joarder, Roman Gushchin,
	Johannes Weiner, Tetsuo Handa, Eric W. Biederman, Shakeel Butt,
	Christian Brauner, Minchan Kim, Tim Murray, Joel Fernandes,
	Jann Horn, linux-mm, lsf-pc, LKML, kernel-team

On Thu 11-04-19 10:47:50, Daniel Colascione wrote:
> On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > > > signal and only to privileged users.
> > > > >
> > > > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > > > every time a process is going to die?
> > > >
> > > > I think with an implementation that does not use/abuse the oom-reaper
> > > > thread this could be done for any kill. As I mentioned, the oom-reaper is
> > > > a limited resource which has access to memory reserves and should not be
> > > > abused in the way I do in this reference implementation.
> > > > While there might be downsides that I don't know of, I'm not sure it's
> > > > required to hurry every kill's memory reclaim. I think there are cases
> > > > when resource deallocation is critical, for example when we kill to
> > > > relieve a resource shortage, and there are kills when reclaim speed is
> > > > not essential. It would be great if we could identify urgent cases
> > > > without userspace hints, so I'm open to suggestions that do not
> > > > involve additional flags.
> > >
> > > I was imagining a PI-ish approach where we'd reap in case an RT
> > > process was waiting on the death of some other process. I'd still
> > > prefer the API I proposed in the other message because it gets the
> > > kernel out of the business of deciding what the right signal is. I'm a
> > > huge believer in "mechanism, not policy".
> >
> > It's not a question of the kernel deciding what the right signal is.
> > The kernel knows whether a signal is fatal to a particular process or not.
> > The question is whether the killing process should do the work of reaping
> > the dying process's resources sometimes, always or never.  Currently,
> > that is never (the process reaps its own resources); Suren is suggesting
> > sometimes, and I'm asking "Why not always?"
> 
> FWIW, Suren's initial proposal is that the oom_reaper kthread do the
> reaping, not the process sending the kill. Are you suggesting that
> sending SIGKILL should spend a while in signal delivery reaping pages
> before returning? I thought about just doing it this way, but I didn't
> like the idea: it'd slow down mass-killing programs like killall(1).
> Programs expect sending SIGKILL to be a fast operation that returns
> immediately.

I was thinking about this as well. And SYNC_SIGKILL would work around the
current expectations of how quick the current implementation is. The
harder part would be the actual semantics. Does the kill wait until
the target task is TASK_DEAD, or is there an intermediate step at which
we could call it a day and still have reasonable semantics
(e.g. the original pid is really not alive anymore)?
-- 
Michal Hocko
SUSE Labs

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 15:33   ` Matthew Wilcox
  2019-04-11 17:05     ` Johannes Weiner
  2019-04-11 17:09     ` Suren Baghdasaryan
@ 2019-04-12  6:53     ` Michal Hocko
  2019-04-12 14:10       ` Suren Baghdasaryan
  2019-04-12 14:14       ` Daniel Colascione
  2 siblings, 2 replies; 43+ messages in thread
From: Michal Hocko @ 2019-04-12  6:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Suren Baghdasaryan, akpm, rientjes, yuzhoujian, jrdr.linux, guro,
	hannes, penguin-kernel, ebiederm, shakeelb, christian, minchan,
	timmurray, dancol, joel, jannh, linux-mm, lsf-pc, linux-kernel,
	kernel-team

On Thu 11-04-19 08:33:13, Matthew Wilcox wrote:
> On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > victim process. The usage of this flag is currently limited to SIGKILL
> > signal and only to privileged users.
> 
> What is the downside of doing expedited memory reclaim?  ie why not do it
> every time a process is going to die?
 
Well, you are tearing down an address space which might still be in use
because the task is not fully dead yet. So there are two downsides AFAICS.
Core dumping will not see the reaped memory, so the resulting
coredump might be incomplete. And an unexpected #PF/gup on the reaped
memory will result in SIGBUS. These are things that we have closed our
eyes to in the oom context because they likely do not matter there. If we
want to use the same technique for other use cases then we have to think
how much they matter again.

-- 
Michal Hocko
SUSE Labs
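
For context, the SIGBUS Michal mentions comes from the reaper marking the
address space before unmapping it. Very roughly (condensed from
__oom_reap_task_mm() in mm/oom_kill.c and handle_mm_fault() in mm/memory.c;
not verbatim):

/* reaper side: declare the mm unstable, then unmap anonymous memory */
set_bit(MMF_UNSTABLE, &mm->flags);
for (vma = mm->mmap; vma; vma = vma->vm_next)
	if (vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED))
		unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL);

/* fault side: a refault into reaped memory cannot be trusted. The
 * real check fires only for kthreads (PF_KTHREAD) using the mm via
 * use_mm(), since a regular task has SIGKILL pending and dies before
 * the fault result becomes visible to userspace. */
if (unlikely(test_bit(MMF_UNSTABLE, &vma->vm_mm->flags)))
	ret = VM_FAULT_SIGBUS;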

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-12  6:53     ` Michal Hocko
@ 2019-04-12 14:10       ` Suren Baghdasaryan
  2019-04-12 14:14       ` Daniel Colascione
  1 sibling, 0 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-12 14:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Andrew Morton, David Rientjes, yuzhoujian,
	Souptick Joarder, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
	Eric W. Biederman, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 11:53 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 11-04-19 08:33:13, Matthew Wilcox wrote:
> > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > victim process. The usage of this flag is currently limited to SIGKILL
> > > signal and only to privileged users.
> >
> > What is the downside of doing expedited memory reclaim?  ie why not do it
> > every time a process is going to die?
>
> Well, you are tearing down an address space which might still be in use
> because the task is not fully dead yet. So there are two downsides AFAICS.
> Core dumping will not see the reaped memory, so the resulting
> coredump might be incomplete. And an unexpected #PF/gup on the reaped
> memory will result in SIGBUS. These are things that we have closed our
> eyes to in the oom context because they likely do not matter there. If we
> want to use the same technique for other use cases then we have to think
> how much they matter again.
>

Sorry, resending with corrected settings...
After some internal discussions we realized that there is one
additional downside of doing the reaping unconditionally. If this async
reaping is done on a more performant but more power-hungry CPU
irrespective of the urgency of the kill, it could end up costing more
power overall (I’m referring here to asymmetric architectures like ARM
big.LITTLE). Obviously quantifying that cost is not easy as it depends
on the use case and the particular system, but it won’t be zero. So I
think we will need some gating condition after all.

> --
> Michal Hocko
> SUSE Labs
>

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-12  6:53     ` Michal Hocko
  2019-04-12 14:10       ` Suren Baghdasaryan
@ 2019-04-12 14:14       ` Daniel Colascione
  2019-04-12 15:30         ` Daniel Colascione
  2019-04-25 16:09         ` Suren Baghdasaryan
  1 sibling, 2 replies; 43+ messages in thread
From: Daniel Colascione @ 2019-04-12 14:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Suren Baghdasaryan, Andrew Morton,
	David Rientjes, yuzhoujian, Souptick Joarder, Roman Gushchin,
	Johannes Weiner, Tetsuo Handa, Eric W. Biederman, Shakeel Butt,
	Christian Brauner, Minchan Kim, Tim Murray, Joel Fernandes,
	Jann Horn, linux-mm, lsf-pc, linux-kernel, Android Kernel Team

On Thu, Apr 11, 2019 at 11:53 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 11-04-19 08:33:13, Matthew Wilcox wrote:
> > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > victim process. The usage of this flag is currently limited to SIGKILL
> > > signal and only to privileged users.
> >
> > What is the downside of doing expedited memory reclaim?  ie why not do it
> > every time a process is going to die?
>
> Well, you are tearing down an address space which might be still in use
> because the task not fully dead yeat. So there are two downsides AFAICS.
> Core dumping which will not see the reaped memory so the resulting

Test for SIGNAL_GROUP_COREDUMP before doing any of this then. If you
try to start a core dump after reaping begins, too bad: you could have
raced with process death anyway.

> coredump might be incomplete. And unexpected #PF/gup on the reaped
> memory will result in SIGBUS.

It's a dying process. Why even bother returning from the fault
handler? Just treat that situation as a thread exit. There's no need
to make this observable to userspace at all.

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-12  6:49             ` Michal Hocko
@ 2019-04-12 14:15               ` Suren Baghdasaryan
  2019-04-12 14:20                 ` Daniel Colascione
  0 siblings, 1 reply; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-12 14:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Colascione, Matthew Wilcox, Andrew Morton, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Joel Fernandes, Jann Horn, linux-mm,
	lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 11:49 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 11-04-19 10:47:50, Daniel Colascione wrote:
> > On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> > > > On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > > > > signal and only to privileged users.
> > > > > >
> > > > > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > > > > every time a process is going to die?
> > > > >
> > > > > I think with an implementation that does not use/abuse the oom-reaper
> > > > > thread this could be done for any kill. As I mentioned, the oom-reaper is
> > > > > a limited resource which has access to memory reserves and should not be
> > > > > abused in the way I do in this reference implementation.
> > > > > While there might be downsides that I don't know of, I'm not sure it's
> > > > > required to hurry every kill's memory reclaim. I think there are cases
> > > > > when resource deallocation is critical, for example when we kill to
> > > > > relieve a resource shortage, and there are kills when reclaim speed is
> > > > > not essential. It would be great if we could identify urgent cases
> > > > > without userspace hints, so I'm open to suggestions that do not
> > > > > involve additional flags.
> > > >
> > > > I was imagining a PI-ish approach where we'd reap in case an RT
> > > > process was waiting on the death of some other process. I'd still
> > > > prefer the API I proposed in the other message because it gets the
> > > > kernel out of the business of deciding what the right signal is. I'm a
> > > > huge believer in "mechanism, not policy".
> > >
> > > It's not a question of the kernel deciding what the right signal is.
> > > The kernel knows whether a signal is fatal to a particular process or not.
> > > The question is whether the killing process should do the work of reaping
> > > the dying process's resources sometimes, always or never.  Currently,
> > > that is never (the process reaps its own resources); Suren is suggesting
> > > sometimes, and I'm asking "Why not always?"
> >
> > FWIW, Suren's initial proposal is that the oom_reaper kthread do the
> > reaping, not the process sending the kill. Are you suggesting that
> > sending SIGKILL should spend a while in signal delivery reaping pages
> > before returning? I thought about just doing it this way, but I didn't
> > like the idea: it'd slow down mass-killing programs like killall(1).
> > Programs expect sending SIGKILL to be a fast operation that returns
> > immediately.
>
> I was thinking about this as well. And SYNC_SIGKILL would work around the
> current expectations of how quick the current implementation is. The
> harder part would be the actual semantics. Does the kill wait until
> the target task is TASK_DEAD, or is there an intermediate step at which
> we could call it a day and still have reasonable semantics
> (e.g. the original pid is really not alive anymore)?

I think Daniel's proposal was trying to address that. With an input of
how many pages the user wants to reclaim asynchronously and a return value
of how much was actually reclaimed, it encodes both the condition for when
to stop and a report of how successfully we accomplished that. Since it
returns the number of pages reclaimed, I assume the call does not
return until it has reaped enough pages.

> --
> Michal Hocko
> SUSE Labs
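
For concreteness, the interface shape being described above might look
something like this (entirely hypothetical; neither the name nor the
signature comes from the posted patches):

/*
 * Hypothetical syscall: synchronously reap up to @max_pages from a
 * process that already has a fatal signal pending, identified by
 * @pidfd. Returns the number of pages actually reclaimed, 0 when
 * nothing is left to reap, or -1 with errno set (e.g. EINVAL if the
 * target is not dying).
 */
long process_reap_mem(int pidfd, unsigned long max_pages, unsigned int flags);

The caller can keep invoking it until the return value drops to zero, which
gives userspace both the stop condition and the success report in one place.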

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-12 14:15               ` Suren Baghdasaryan
@ 2019-04-12 14:20                 ` Daniel Colascione
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Colascione @ 2019-04-12 14:20 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Michal Hocko, Matthew Wilcox, Andrew Morton, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Joel Fernandes, Jann Horn, linux-mm,
	lsf-pc, LKML, kernel-team

On Fri, Apr 12, 2019 at 7:15 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Apr 11, 2019 at 11:49 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 11-04-19 10:47:50, Daniel Colascione wrote:
> > > On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> > > > > On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > > > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > > > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > > > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > > > > > signal and only to privileged users.
> > > > > > >
> > > > > > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > > > > > every time a process is going to die?
> > > > > >
> > > > > > I think with an implementation that does not use/abuse the oom-reaper
> > > > > > thread this could be done for any kill. As I mentioned, the oom-reaper is
> > > > > > a limited resource which has access to memory reserves and should not be
> > > > > > abused in the way I do in this reference implementation.
> > > > > > While there might be downsides that I don't know of, I'm not sure it's
> > > > > > required to hurry every kill's memory reclaim. I think there are cases
> > > > > > when resource deallocation is critical, for example when we kill to
> > > > > > relieve a resource shortage, and there are kills when reclaim speed is
> > > > > > not essential. It would be great if we could identify urgent cases
> > > > > > without userspace hints, so I'm open to suggestions that do not
> > > > > > involve additional flags.
> > > > >
> > > > > I was imagining a PI-ish approach where we'd reap in case an RT
> > > > > process was waiting on the death of some other process. I'd still
> > > > > prefer the API I proposed in the other message because it gets the
> > > > > kernel out of the business of deciding what the right signal is. I'm a
> > > > > huge believer in "mechanism, not policy".
> > > >
> > > > It's not a question of the kernel deciding what the right signal is.
> > > > The kernel knows whether a signal is fatal to a particular process or not.
> > > > The question is whether the killing process should do the work of reaping
> > > > the dying process's resources sometimes, always or never.  Currently,
> > > > that is never (the process reaps its own resources); Suren is suggesting
> > > > sometimes, and I'm asking "Why not always?"
> > >
> > > FWIW, Suren's initial proposal is that the oom_reaper kthread do the
> > > reaping, not the process sending the kill. Are you suggesting that
> > > sending SIGKILL should spend a while in signal delivery reaping pages
> > > before returning? I thought about just doing it this way, but I didn't
> > > like the idea: it'd slow down mass-killing programs like killall(1).
> > > Programs expect sending SIGKILL to be a fast operation that returns
> > > immediately.
> >
> > I was thinking about this as well. And SYNC_SIGKILL would work around the

SYNC_SIGKILL (which, I presume, blocks in kill(2)) was proposed on
many occasions while we discussed pidfd waits over the past six months
or so. We've decided to just make pidfds pollable instead. The kernel
already has several ways to express the idea that a task should wait
for another task to die, and I don't think we need another. If you
want a process that's waiting for a task to exit to help reap that
task, great --- that's an option we've talked about --- but we don't
need a new interface to do it, since the kernel already has all the
information it needs.
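
As a sketch of that flow, assuming the pollable-pidfd support that was being
merged around this time (the pidfd becomes readable when the target exits;
there were no libc wrappers yet, hence the raw syscall):

#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static void kill_and_wait(int pidfd)
{
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };

	/* no SYNC_SIGKILL: sending the signal returns immediately */
	syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, 0);

	/* ...and death is awaited via poll() instead of inside kill() */
	while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
		;
	/* pfd.revents & POLLIN: the original pid is really gone */
}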

> > current expectations of how quick the current implementation is. The
> > harder part would be the actual semantics. Does the kill wait until
> > the target task is TASK_DEAD, or is there an intermediate step at which
> > we could call it a day and still have reasonable semantics
> > (e.g. the original pid is really not alive anymore)?
>
> I think Daniel's proposal was trying to address that. With an input of
> how many pages the user wants to reclaim asynchronously and a return value
> of how much was actually reclaimed, it encodes both the condition for when
> to stop and a report of how successfully we accomplished that. Since it
> returns the number of pages reclaimed, I assume the call does not
> return until it has reaped enough pages.

Right. I want to punt as much "policy" as possible to userspace. Just
using a user thread to do the reaping not only solves the policy
problem (since it's userspace that controls priority, affinity,
retries, and so on), but also simplifies the implementation
kernel-side. I can imagine situations where, depending on device
energy state, or even charger or screen state, we might want to reap
more or less aggressively, or not at all. I wouldn't want to burden
the kernel with having to get that right when userspace could make the
decision.
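
A sketch of that split, reusing the hypothetical process_reap_mem() from
earlier in the thread (sched_setaffinity(2) and pidfd_send_signal(2) are
real; the helper, the batch size, and the affinity policy are invented, and
the includes from the previous sketch plus <sched.h> under _GNU_SOURCE are
assumed):

/* Userspace reaper thread: the policy (which CPU, what priority, how
 * much to reap per iteration) lives entirely in userspace. */
static void reap_with_policy(int pidfd, const cpu_set_t *little_cores)
{
	long n;

	/* e.g. keep the reaping off the big cores when it is not urgent */
	sched_setaffinity(0, sizeof(*little_cores), little_cores);

	syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, 0);

	/* 512 pages per call: a batch size picked for this example */
	while ((n = process_reap_mem(pidfd, 512, 0)) > 0)
		;
}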

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-12 14:14       ` Daniel Colascione
@ 2019-04-12 15:30         ` Daniel Colascione
  2019-04-25 16:09         ` Suren Baghdasaryan
  1 sibling, 0 replies; 43+ messages in thread
From: Daniel Colascione @ 2019-04-12 15:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Suren Baghdasaryan, Andrew Morton,
	David Rientjes, yuzhoujian, Souptick Joarder, Roman Gushchin,
	Johannes Weiner, Tetsuo Handa, Eric W. Biederman, Shakeel Butt,
	Christian Brauner, Minchan Kim, Tim Murray, Joel Fernandes,
	Jann Horn, linux-mm, lsf-pc, linux-kernel, Android Kernel Team

On Fri, Apr 12, 2019 at 7:14 AM Daniel Colascione <dancol@google.com> wrote:
>
> On Thu, Apr 11, 2019 at 11:53 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 11-04-19 08:33:13, Matthew Wilcox wrote:
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.
> > >
> > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > every time a process is going to die?
> >
> > Well, you are tearing down an address space which might still be in use
> > because the task is not fully dead yet. So there are two downsides AFAICS.
> > Core dumping will not see the reaped memory, so the resulting
>
> Test for SIGNAL_GROUP_COREDUMP before doing any of this then. If you
> try to start a core dump after reaping begins, too bad: you could have
> raced with process death anyway.
>
> > coredump might be incomplete. And an unexpected #PF/gup on the reaped
> > memory will result in SIGBUS.
>
> It's a dying process. Why even bother returning from the fault
> handler? Just treat that situation as a thread exit. There's no need
> to make this observable to userspace at all.

Just for clarity, checking the code, I think we already do this.
zap_other_threads sets SIGKILL pending on every thread in the group,
and we'll handle SIGKILL in the process of taking any page fault or
doing any system call, so I don't think it's actually possible for a
thread in a dying process to observe the SIGBUS that reaping in theory
can generate.
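
That ordering is visible in get_signal() in kernel/signal.c; heavily
condensed (the helper names are real, everything around them is elided):

/* group exit is checked first: a pending SIGKILL wins... */
if (signal_group_exit(current->signal))
	do_group_exit(current->signal->group_exit_code); /* does not return */

/* ...so only a still-running task ever dequeues the synchronous
 * SIGBUS that reaping may have queued */
signr = dequeue_synchronous_signal(&ksig->info);
if (!signr)
	signr = dequeue_signal(current, &current->blocked, &ksig->info);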

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-11 17:47           ` Daniel Colascione
  2019-04-12  6:49             ` Michal Hocko
@ 2019-04-12 21:03             ` Matthew Wilcox
  1 sibling, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2019-04-12 21:03 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Suren Baghdasaryan, Andrew Morton, Michal Hocko, David Rientjes,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Tetsuo Handa, Eric W. Biederman, Shakeel Butt, Christian Brauner,
	Minchan Kim, Tim Murray, Joel Fernandes, Jann Horn, linux-mm,
	lsf-pc, LKML, kernel-team

On Thu, Apr 11, 2019 at 10:47:50AM -0700, Daniel Colascione wrote:
> On Thu, Apr 11, 2019 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> > It's not a question of the kernel deciding what the right signal is.
> > The kernel knows whether a signal is fatal to a particular process or not.
> > The question is whether the killing process should do the work of reaping
> > the dying process's resources sometimes, always or never.  Currently,
> > that is never (the process reaps its own resources); Suren is suggesting
> > sometimes, and I'm asking "Why not always?"
> 
> FWIW, Suren's initial proposal is that the oom_reaper kthread do the
> reaping, not the process sending the kill. Are you suggesting that
> sending SIGKILL should spend a while in signal delivery reaping pages
> before returning? I thought about just doing it this way, but I didn't
> like the idea: it'd slow down mass-killing programs like killall(1).
> Programs expect sending SIGKILL to be a fast operation that returns
> immediately.

Suren said that the implementation in this proposal wasn't to be taken
literally.

You've raised some good points here though.  Do mass-killing programs
like kill with a pgid or killall expect the signal-sending process to
be fast, or do they not really care?

From the kernel's point of view, the same work has to be done, no matter
whether the killer or the victim does it.  Should the work be accounted
to the killer or the victim?  It probably doesn't matter.  The victim
doing the work allows for parallelisation, but I'm not convinced that's
important either.

I see another advantage for the killer doing the work -- if the task
is currently blocking on I/O (eg to an NFS mount that has gone away),
the killer can get rid of the task's page tables.  If we have to wait
for the I/O to complete before the victim reaps its own page tables,
we may be waiting forever.

* Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
  2019-04-12 14:14       ` Daniel Colascione
  2019-04-12 15:30         ` Daniel Colascione
@ 2019-04-25 16:09         ` Suren Baghdasaryan
  1 sibling, 0 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-25 16:09 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Michal Hocko, Matthew Wilcox, Suren Baghdasaryan, Andrew Morton,
	David Rientjes, yuzhoujian, Souptick Joarder, Roman Gushchin,
	Johannes Weiner, Tetsuo Handa, Eric W. Biederman, Shakeel Butt,
	Christian Brauner, Minchan Kim, Tim Murray, Joel Fernandes,
	Jann Horn, linux-mm, lsf-pc, linux-kernel, Android Kernel Team,
	Oleg Nesterov

On Fri, Apr 12, 2019 at 7:14 AM Daniel Colascione <dancol@google.com> wrote:
>
> On Thu, Apr 11, 2019 at 11:53 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 11-04-19 08:33:13, Matthew Wilcox wrote:
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.
> > >
> > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > every time a process is going to die?
> >
> > Well, you are tearing down an address space which might still be in use
> > because the task is not fully dead yet. So there are two downsides AFAICS.
> > Core dumping will not see the reaped memory, so the resulting
>
> Test for SIGNAL_GROUP_COREDUMP before doing any of this then. If you
> try to start a core dump after reaping begins, too bad: you could have
> raced with process death anyway.
>
> > coredump might be incomplete. And an unexpected #PF/gup on the reaped
> > memory will result in SIGBUS.
>
> It's a dying process. Why even bother returning from the fault
> handler? Just treat that situation as a thread exit. There's no need
> to make this observable to userspace at all.

I've spent some more time investigating the possible effects of reaping
on coredumps and asked Oleg Nesterov, who worked on patchsets that
prioritize SIGKILLs over coredump activity
(https://lkml.org/lkml/2013/2/17/118). The current do_coredump
implementation seems to handle the case of SIGKILL interruption by
bailing out whenever dump_interrupted() returns true, which would be
the case with a pending SIGKILL. So in the race where the coredump
happens first and SIGKILL comes next, interrupting the coredump seems
to result in no change in behavior, and reaping memory proactively
seems to have no side effects.
The opposite race, where SIGKILL gets posted and then the coredump happens,
seems impossible because do_coredump won't be called from get_signal
due to the signal_group_exit check (get_signal checks signal_group_exit
while holding sighand->siglock and complete_signal sets
SIGNAL_GROUP_EXIT while holding the same lock). There is a path from
__seccomp_filter calling do_coredump while processing
SECCOMP_RET_KILL_xxx but even then it should bail out when
coredump_wait()->zap_threads(current) checks signal_group_exit().
(Thanks Oleg for clarifying this for me).

If we are really concerned about a possible increase in failed coredumps
because of the proactive reaping, I could make it conditional on
whether coredumping the mm is possible by placing this feature behind a
!get_dumpable(mm) condition. Another possibility is to check
RLIMIT_CORE to decide if coredumps are possible (although if a pipe is
used for the coredump that limit seems to be ignored, so that check would
have to take this into consideration).
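
A sketch of that first gating option (the placement is hypothetical;
get_dumpable() is the existing helper from linux/sched/coredump.h and
expedite_reclaim() is from the posted patch):

/* reap proactively only when no coredump can be cut short */
if (!get_dumpable(mm))
	expedite_reclaim(task);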

On the issue of SIGBUS happening when the accessed memory was already
reaped, my understanding is that SIGBUS, being a synchronous signal, will
still have to be fetched using dequeue_synchronous_signal from
get_signal, but not before signal_group_exit is checked. So again, if
SIGKILL is pending I think the SIGBUS will be ignored (please correct me
if that's not correct).

One additional question I would like to clarify is whether per-node
reapers as Roman suggested would make a big difference (all the CPUs
that I've seen used for Android are single-node ones, so I'm looking for
more feedback here). If it's important then reaping the victim's memory by
the killer is probably not an option.

* Re: [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c
  2019-04-11  1:43 ` [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c Suren Baghdasaryan
@ 2019-04-25 21:12   ` Tetsuo Handa
  2019-04-25 21:56     ` Suren Baghdasaryan
  0 siblings, 1 reply; 43+ messages in thread
From: Tetsuo Handa @ 2019-04-25 21:12 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, mhocko, rientjes, willy, yuzhoujian, jrdr.linux, guro,
	hannes, ebiederm, shakeelb, christian, minchan, timmurray,
	dancol, joel, jannh, linux-mm, lsf-pc, linux-kernel, kernel-team

On 2019/04/11 10:43, Suren Baghdasaryan wrote:
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 3a2484884cfd..6449710c8a06 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1102,6 +1102,21 @@ bool out_of_memory(struct oom_control *oc)
>  	return !!oc->chosen;
>  }
>  
> +bool expedite_reclaim(struct task_struct *task)
> +{
> +	bool res = false;
> +
> +	task_lock(task);
> +	if (task_will_free_mem(task)) {

mark_oom_victim() needs to be called under the oom_lock mutex after
checking that oom_killer_disabled == false. Since you are trying
to trigger this function from the signal handling path, you might want
to defer it until e.g. WQ context.

> +		mark_oom_victim(task);
> +		wake_oom_reaper(task);
> +		res = true;
> +	}
> +	task_unlock(task);
> +
> +	return res;
> +}
> +
>  /*
>   * The pagefault handler calls here because it is out of memory, so kill a
>   * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
> 
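
A minimal sketch of the suggested deferral (hypothetical: expedite_work and
its queueing are invented here; oom_lock, oom_killer_disabled,
mark_oom_victim() and wake_oom_reaper() are the existing mm/oom_kill.c
symbols, and a task reference is assumed to be taken at queueing time):

struct expedite_work {
	struct work_struct work;
	struct task_struct *task;
};

static void expedite_reclaim_workfn(struct work_struct *work)
{
	struct expedite_work *ew = container_of(work, struct expedite_work, work);
	struct task_struct *task = ew->task;

	/* WQ context: safe to sleep on oom_lock, unlike the signal path */
	mutex_lock(&oom_lock);
	if (!oom_killer_disabled) {
		task_lock(task);
		if (task_will_free_mem(task)) {
			mark_oom_victim(task);
			wake_oom_reaper(task);
		}
		task_unlock(task);
	}
	mutex_unlock(&oom_lock);

	put_task_struct(task);
	kfree(ew);
}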

* Re: [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c
  2019-04-25 21:12   ` Tetsuo Handa
@ 2019-04-25 21:56     ` Suren Baghdasaryan
  0 siblings, 0 replies; 43+ messages in thread
From: Suren Baghdasaryan @ 2019-04-25 21:56 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Michal Hocko, David Rientjes, Matthew Wilcox,
	yuzhoujian, Souptick Joarder, Roman Gushchin, Johannes Weiner,
	Eric W. Biederman, Shakeel Butt, Christian Brauner, Minchan Kim,
	Tim Murray, Daniel Colascione, Joel Fernandes, Jann Horn,
	linux-mm, lsf-pc, LKML, kernel-team

On Thu, Apr 25, 2019 at 2:13 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/04/11 10:43, Suren Baghdasaryan wrote:
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 3a2484884cfd..6449710c8a06 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -1102,6 +1102,21 @@ bool out_of_memory(struct oom_control *oc)
> >       return !!oc->chosen;
> >  }
> >
> > +bool expedite_reclaim(struct task_struct *task)
> > +{
> > +     bool res = false;
> > +
> > +     task_lock(task);
> > +     if (task_will_free_mem(task)) {
>
> mark_oom_victim() needs to be called under the oom_lock mutex after
> checking that oom_killer_disabled == false. Since you are trying
> to trigger this function from the signal handling path, you might want
> to defer it until e.g. WQ context.

Thanks for the tip! I'll take this into account in the new design.
Just thinking out loud... AFAIU oom_lock is there to protect against
multiple concurrent out_of_memory calls from different contexts and
prevent overly-aggressive process killing. For my purposes, when
reaping the memory of a killed process, we don't have this concern (we did
not initiate the killing, SIGKILL was explicitly requested). I'll
probably need some synchronization there but not for purposes of
preventing multiple concurrent reapers. In any case, thank you for the
feedback!

>
> > +             mark_oom_victim(task);
> > +             wake_oom_reaper(task);
> > +             res = true;
> > +     }
> > +     task_unlock(task);
> > +
> > +     return res;
> > +}
> > +
> >  /*
> >   * The pagefault handler calls here because it is out of memory, so kill a
> >   * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
> >
