linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Handle oom bypass more gracefully
@ 2016-05-26 12:40 Michal Hocko
  2016-05-26 12:40 ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external tasks sharing mm Michal Hocko
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Michal Hocko @ 2016-05-26 12:40 UTC (permalink / raw)
  To: linux-mm
  Cc: Tetsuo Handa, David Rientjes, Oleg Nesterov, Vladimir Davydov,
	Andrew Morton, LKML

Hi,
the following 6 patches should put some order to very rare cases of
mm shared between processes and make the paths which bypass the oom
killer oom reapable and so much more reliable finally.  Even though mm
shared outside of threadgroup is rare (either use_mm by kernel threads
or exotic clone(CLONE_VM) without CLONE_THREAD resp. CLONE_SIGHAND) it
makes the current oom killer logic quite hard to follow and evaluate. It
is possible to select an oom victim which shares the mm with unkillable
process or bypass the oom killer even when other processes sharing the
mm are still alive and other weird cases.

Patch 1 optimizes oom_kill_task to skip the costly process
iteration when the current oom victim is not sharing mm with other
processes. Patch 2 is a clean up of oom_score_adj handling and a
preparatory work. Patch 3 enforces oom_adj_score to be consistent
between processes sharing the mm to behave consistently with the regular
thread groups. Patch 4 tries to handle vforked tasks better in the oom
path, patch 5 ensures that all tasks sharing the mm are killed and
finally patch 6 should guarantee that task_will_free_mem will always
imply reapable bypass of the oom killer.

The patchset is based on the current mmotm tree (mmotm-2016-05-23-16-51).
I would really appreciate a deep review as this area is full of land
mines but I hope I've made the code much cleaner with less kludges.

I am CCing Oleg (sorry I know you hate this code) but I would feel much
better if you double checked my assumptions about locking and vfork
behavior.

Michal Hocko (6):
      mm, oom: do not loop over all tasks if there are no external tasks sharing mm
      proc, oom_adj: extract oom_score_adj setting into a helper
      mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj
      mm, oom: skip over vforked tasks
      mm, oom: kill all tasks sharing the mm
      mm, oom: fortify task_will_free_mem

 fs/proc/base.c      | 168 +++++++++++++++++++++++++++++-----------------------
 include/linux/mm.h  |   2 +
 include/linux/oom.h |  72 ++++++++++++++++++++--
 mm/memcontrol.c     |   4 +-
 mm/oom_kill.c       |  96 ++++++++++--------------------
 5 files changed, 196 insertions(+), 146 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread
* [PATCH 0/6 -v2] Handle oom bypass more gracefully
@ 2016-05-30 13:05 Michal Hocko
  2016-05-30 13:05 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2016-05-30 13:05 UTC (permalink / raw)
  To: linux-mm
  Cc: Tetsuo Handa, David Rientjes, Oleg Nesterov, Vladimir Davydov,
	Andrew Morton, LKML

Hi,
based on the feedback from Tetsuo and Vladimir (thanks to you both) I
had to change some of my assumptions and rework some patches. I planned
to resend later this week but I guess it would help to argue about the
code after those changes if I resubmit earlier. The previous version was
posted here http://lkml.kernel.org/r/1464266415-15558-1-git-send-email-mhocko@kernel.org

The following 6 patches should put some order to very rare cases of
mm shared between processes and make the paths which bypass the oom
killer oom reapable and so much more reliable finally. Even though mm
shared outside of threadgroup is rare (either vforked tasks for a
short period, use_mm by kernel threads or exotic thread model of
clone(CLONE_VM) without CLONE_THREAD resp. CLONE_SIGHAND). Not only it
makes the current oom killer logic quite hard to follow and evaluate it
can lead to weird corner cases. E.g. it is possible to select an oom
victim which shares the mm with unkillable process or bypass the oom
killer even when other processes sharing the mm are still alive and
other weird cases.

Patch 1 drops a bogus task_lock and mm check from oom_adj_write. This
can be considered a bug fix with a low impact as nobody has noticed
for years.

Patch 2 is a clean up of oom_score_adj handling and a preparatory
work. Patch 3 enforces oom_adj_score to be consistent between processes
sharing the mm to behave consistently with the regular thread
groups. This can be considered a user visible behavior change because
one thread group oom_score_adj update will affect others which share
the same mm via clone(CLONE_VM). I argue that this should be acceptable
because we already have the same behavior for threads in the same thread
group and sharing the mm without signal struct is just a different model
of threading. This is probably the most controversial part of the series,
I would like to find some consensus here though. There were some
suggestions to hook some counter/oom_score_adj into the mm_struct
but I feel that this is not necessary right now and we can rely on
proc handler + oom_kill_process to DTRT. I can be convinced otherwise
but I strongly think that whatever we do the userspace has to have
a way to see the current oom priority as consistently as possible.

Patch 4 makes sure that no vforked task is selected if it is sharing
the mm with oom unkillable task.

Patch 5 ensures that all tasks sharing the mm are killed which in turn
makes sure that all oom victims are oom reapable.

Patch 6 guarantees that task_will_free_mem will always imply reapable
bypass of the oom killer.

The patchset is based on the current mmotm tree (mmotm-2016-05-27-15-19).
I would really appreciate a deep review as this area is full of land
mines but I hope I've made the code much cleaner with less kludges.

I am CCing Oleg (sorry I know you hate this code) but I would feel much
better if you double checked my assumptions about locking and vfork
behavior.

Michal Hocko (6):
      proc, oom: drop bogus task_lock and mm check
      proc, oom_adj: extract oom_score_adj setting into a helper
      mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj
      mm, oom: skip vforked tasks from being selected
      mm, oom: kill all tasks sharing the mm
      mm, oom: fortify task_will_free_mem

 fs/proc/base.c      | 172 ++++++++++++++++++++++++++++++----------------------
 include/linux/mm.h  |   2 +
 include/linux/oom.h |  63 +++++++++++++++++--
 mm/memcontrol.c     |   4 +-
 mm/oom_kill.c       |  82 +++++--------------------
 5 files changed, 176 insertions(+), 147 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2016-06-01  7:03 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26 12:40 [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
2016-05-26 12:40 ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external tasks sharing mm Michal Hocko
2016-05-26 14:30   ` Tetsuo Handa
2016-05-26 14:59     ` Michal Hocko
2016-05-26 15:25       ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are noexternal " Tetsuo Handa
2016-05-26 15:35         ` Michal Hocko
2016-05-26 16:14           ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external " Tetsuo Handa
2016-05-27  6:45             ` Michal Hocko
2016-05-27  7:15               ` Michal Hocko
2016-05-27  8:03                 ` Michal Hocko
2016-05-26 12:40 ` [PATCH 2/6] proc, oom_adj: extract oom_score_adj setting into a helper Michal Hocko
2016-05-26 12:40 ` [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-05-27 11:18   ` Michal Hocko
2016-05-27 16:18     ` Vladimir Davydov
2016-05-30  7:07       ` Michal Hocko
2016-05-30  8:47         ` Vladimir Davydov
2016-05-30  9:39           ` Michal Hocko
2016-05-30 10:26             ` Vladimir Davydov
2016-05-30 11:11               ` Michal Hocko
2016-05-30 12:19                 ` Vladimir Davydov
2016-05-30 12:28                   ` Michal Hocko
2016-05-26 12:40 ` [PATCH 4/6] mm, oom: skip over vforked tasks Michal Hocko
2016-05-27 16:48   ` Vladimir Davydov
2016-05-30  7:13     ` Michal Hocko
2016-05-30  9:52       ` Michal Hocko
2016-05-30 10:40         ` Vladimir Davydov
2016-05-30 10:53           ` Michal Hocko
2016-05-30 12:03   ` Michal Hocko
2016-05-26 12:40 ` [PATCH 5/6] mm, oom: kill all tasks sharing the mm Michal Hocko
2016-05-26 12:40 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
2016-05-26 14:11   ` Tetsuo Handa
2016-05-26 14:23     ` Michal Hocko
2016-05-26 14:41       ` Tetsuo Handa
2016-05-26 14:56         ` Michal Hocko
2016-05-27 11:07   ` Michal Hocko
2016-05-27 16:00 ` [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
2016-05-30 13:05 [PATCH 0/6 -v2] " Michal Hocko
2016-05-30 13:05 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
2016-05-30 17:35   ` Oleg Nesterov
2016-05-31  7:46     ` Michal Hocko
2016-05-31 22:29       ` Oleg Nesterov
2016-06-01  7:03         ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).