All of lore.kernel.org
 help / color / mirror / Atom feed
* Patch "mm: memcontrol: do not recurse in direct reclaim" has been added to the 4.4-stable tree
@ 2016-11-07 16:18 gregkh
  0 siblings, 0 replies; only message in thread
From: gregkh @ 2016-11-07 16:18 UTC (permalink / raw)
  To: hannes, akpm, gregkh, mhocko, tj, torvalds, vdavydov.dev
  Cc: stable, stable-commits


This is a note to let you know that I've just added the patch titled

    mm: memcontrol: do not recurse in direct reclaim

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-memcontrol-do-not-recurse-in-direct-reclaim.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From 89a2848381b5fcd9c4d9c0cd97680e3b28730e31 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 27 Oct 2016 17:46:56 -0700
Subject: mm: memcontrol: do not recurse in direct reclaim

From: Johannes Weiner <hannes@cmpxchg.org>

commit 89a2848381b5fcd9c4d9c0cd97680e3b28730e31 upstream.

On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:

  [...]
  btrfs_releasepage+0x2c/0x30
  try_to_release_page+0x32/0x50
  shrink_page_list+0x6da/0x7a0
  shrink_inactive_list+0x1e5/0x510
  shrink_lruvec+0x605/0x7f0
  shrink_zone+0xee/0x320
  do_try_to_free_pages+0x174/0x440
  try_to_free_mem_cgroup_pages+0xa7/0x130
  try_charge+0x17b/0x830
  memcg_charge_kmem+0x40/0x80
  new_slab+0x2d9/0x5a0
  __slab_alloc+0x2fd/0x44f
  kmem_cache_alloc+0x193/0x1e0
  alloc_extent_state+0x21/0xc0
  __clear_extent_bit+0x2b5/0x400
  try_release_extent_mapping+0x1a3/0x220
  __btrfs_releasepage+0x31/0x70
  btrfs_releasepage+0x2c/0x30
  try_to_release_page+0x32/0x50
  shrink_page_list+0x6da/0x7a0
  shrink_inactive_list+0x1e5/0x510
  shrink_lruvec+0x605/0x7f0
  shrink_zone+0xee/0x320
  do_try_to_free_pages+0x174/0x440
  try_to_free_mem_cgroup_pages+0xa7/0x130
  try_charge+0x17b/0x830
  mem_cgroup_try_charge+0x65/0x1c0
  handle_mm_fault+0x117f/0x1510
  __do_page_fault+0x177/0x420
  do_page_fault+0xc/0x10
  page_fault+0x22/0x30

On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.

But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.

Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim.  Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.

Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memcontrol.c |    9 +++++++++
 mm/vmscan.c     |    2 ++
 2 files changed, 11 insertions(+)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2055,6 +2055,15 @@ retry:
 		     current->flags & PF_EXITING))
 		goto force;
 
+	/*
+	 * Prevent unbounded recursion when reclaim operations need to
+	 * allocate memory. This might exceed the limits temporarily,
+	 * but we prefer facilitating memory reclaim and getting back
+	 * under the limit over triggering OOM kills in these cases.
+	 */
+	if (unlikely(current->flags & PF_MEMALLOC))
+		goto force;
+
 	if (unlikely(task_in_memcg_oom(current)))
 		goto nomem;
 
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2910,7 +2910,9 @@ unsigned long try_to_free_mem_cgroup_pag
 					    sc.may_writepage,
 					    sc.gfp_mask);
 
+	current->flags |= PF_MEMALLOC;
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+	current->flags &= ~PF_MEMALLOC;
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 


Patches currently in stable-queue which might be from hannes@cmpxchg.org are

queue-4.4/mm-memcontrol-do-not-recurse-in-direct-reclaim.patch

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2016-11-07 16:18 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-07 16:18 Patch "mm: memcontrol: do not recurse in direct reclaim" has been added to the 4.4-stable tree gregkh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.