linux-kernel.vger.kernel.org archive mirror
* [RFC v1 1/2] mm/page_alloc: Prevent OOM killer from triggering if requested
@ 2017-07-09 22:49 Joel Fernandes
  2017-07-09 22:49 ` [RFC v1 2/2] tracing/ring_buffer: Try harder to allocate Joel Fernandes
  2017-07-10 16:05 ` [RFC v1 1/2] mm/page_alloc: Prevent OOM killer from triggering if requested Vlastimil Babka
  0 siblings, 2 replies; 3+ messages in thread
From: Joel Fernandes @ 2017-07-09 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Alexander Duyck, Mel Gorman, Hao Lee,
	Vladimir Davydov, Johannes Weiner, Joonsoo Kim, Steven Rostedt,
	linux-mm

Certain allocation paths, such as the ftrace ring buffer allocator,
want to try hard to allocate without triggering the OOM killer and
destabilizing the system. Currently the ring buffer uses __GFP_NORETRY
to keep the OOM killer from triggering, but this has a drawback.
It is possible the system is in a state where:
a) retrying can make the allocation succeed.
b) there is plenty of reclaimable memory in the page cache to satisfy
   the request, and retrying is all that is needed. Even though direct
   reclaim makes progress, the allocator still cannot find a free page
   on the free list.

This patch adds a new GFP flag (__GFP_DONTOOM) for the situation
where we want the retry behavior but still want to bail out before
invoking the OOM killer if the retries could not satisfy the allocation.

Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hao Lee <haolee.swjtu@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 include/linux/gfp.h | 6 +++++-
 mm/page_alloc.c     | 7 +++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6656f1fee7..beaabd110008 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,7 @@ struct vm_area_struct;
 #define ___GFP_DIRECT_RECLAIM	0x400000u
 #define ___GFP_WRITE		0x800000u
 #define ___GFP_KSWAPD_RECLAIM	0x1000000u
+#define ___GFP_DONTOOM		0x2000000u
 #ifdef CONFIG_LOCKDEP
 #define ___GFP_NOLOCKDEP	0x2000000u
 #else
@@ -149,6 +150,8 @@ struct vm_area_struct;
  *   return NULL when direct reclaim and memory compaction have failed to allow
  *   the allocation to succeed.  The OOM killer is not called with the current
  *   implementation.
+ *
+ * __GFP_DONTOOM: The VM implementation must not OOM if retries have exhausted.
  */
 #define __GFP_IO	((__force gfp_t)___GFP_IO)
 #define __GFP_FS	((__force gfp_t)___GFP_FS)
@@ -158,6 +161,7 @@ struct vm_area_struct;
 #define __GFP_REPEAT	((__force gfp_t)___GFP_REPEAT)
 #define __GFP_NOFAIL	((__force gfp_t)___GFP_NOFAIL)
 #define __GFP_NORETRY	((__force gfp_t)___GFP_NORETRY)
+#define __GFP_DONTOOM	((__force gfp_t)___GFP_DONTOOM)
 
 /*
  * Action modifiers
@@ -188,7 +192,7 @@ struct vm_area_struct;
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bd65b60939b6..970a5c380bb6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3908,6 +3908,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (check_retry_cpuset(cpuset_mems_cookie, ac))
 		goto retry_cpuset;
 
+	/*
+	 * Its possible that retries failed but we still don't want OOM
+	 * killer to trigger and can just try again later.
+	 */
+	if (gfp_mask & __GFP_DONTOOM)
+		goto nopage;
+
 	/* Reclaim has failed us, start killing things */
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)
-- 
2.13.2.725.g09c95d1e9-goog


* [RFC v1 2/2] tracing/ring_buffer: Try harder to allocate
  2017-07-09 22:49 [RFC v1 1/2] mm/page_alloc: Prevent OOM killer from triggering if requested Joel Fernandes
@ 2017-07-09 22:49 ` Joel Fernandes
  2017-07-10 16:05 ` [RFC v1 1/2] mm/page_alloc: Prevent OOM killer from triggering if requested Vlastimil Babka
  1 sibling, 0 replies; 3+ messages in thread
From: Joel Fernandes @ 2017-07-09 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Alexander Duyck, Mel Gorman, Hao Lee,
	Vladimir Davydov, Johannes Weiner, Joonsoo Kim, Steven Rostedt,
	linux-mm

ftrace can fail to allocate the per-CPU ring buffer on systems with a
large number of CPUs when large amounts of memory are tied up in the
page cache. Currently the ring buffer allocation does not retry in the
VM implementation even if direct reclaim made some progress but still
could not find a free page. On retrying, I see that the allocations
almost always succeed. The retry does not happen because __GFP_NORETRY
is used in the tracer to prevent the case where we might OOM; however,
if we simply drop __GFP_NORETRY, we risk destabilizing the system if
the OOM killer is triggered. To prevent this, use the __GFP_DONTOOM
flag introduced in the previous patch while dropping __GFP_NORETRY.

With this, the following succeeds without destabilizing a system with 8
CPU cores and 4GB of memory:
echo 100000 > /sys/kernel/debug/tracing/buffer_size_kb
Since buffer_size_kb is a per-CPU size, on an 8-core system this
allocates ~800MB.

Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hao Lee <haolee.swjtu@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/trace/ring_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4ae268e687fe..b1cdcac6ca89 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1141,7 +1141,7 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 		 * not destabilized.
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
-				    GFP_KERNEL | __GFP_NORETRY,
+				    GFP_KERNEL | __GFP_DONTOOM,
 				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
@@ -1149,7 +1149,7 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 		list_add(&bpage->list, pages);
 
 		page = alloc_pages_node(cpu_to_node(cpu),
-					GFP_KERNEL | __GFP_NORETRY, 0);
+					GFP_KERNEL | __GFP_DONTOOM, 0);
 		if (!page)
 			goto free_pages;
 		bpage->page = page_address(page);
-- 
2.13.2.725.g09c95d1e9-goog


* Re: [RFC v1 1/2] mm/page_alloc: Prevent OOM killer from triggering if requested
  2017-07-09 22:49 [RFC v1 1/2] mm/page_alloc: Prevent OOM killer from triggering if requested Joel Fernandes
  2017-07-09 22:49 ` [RFC v1 2/2] tracing/ring_buffer: Try harder to allocate Joel Fernandes
@ 2017-07-10 16:05 ` Vlastimil Babka
  1 sibling, 0 replies; 3+ messages in thread
From: Vlastimil Babka @ 2017-07-10 16:05 UTC (permalink / raw)
  To: Joel Fernandes, linux-kernel
  Cc: Alexander Duyck, Mel Gorman, Hao Lee, Vladimir Davydov,
	Johannes Weiner, Joonsoo Kim, Steven Rostedt, linux-mm,
	Michal Hocko

[+CC Michal Hocko]

On 07/10/2017 12:49 AM, Joel Fernandes wrote:
> Certain allocation paths, such as the ftrace ring buffer allocator,
> want to try hard to allocate without triggering the OOM killer and
> destabilizing the system. Currently the ring buffer uses __GFP_NORETRY
> to keep the OOM killer from triggering, but this has a drawback.
> It is possible the system is in a state where:
> a) retrying can make the allocation succeed.
> b) there is plenty of reclaimable memory in the page cache to satisfy
>    the request, and retrying is all that is needed. Even though direct
>    reclaim makes progress, the allocator still cannot find a free page
>    on the free list.
> 
> This patch adds a new GFP flag (__GFP_DONTOOM) for the situation
> where we want the retry behavior but still want to bail out before
> invoking the OOM killer if the retries could not satisfy the allocation.

Michal recently turned __GFP_REPEAT into __GFP_RETRY_MAYFAIL [1][2]
which I think does exactly what you want. Try hard as long as
reclaim/compaction makes progress, but fail the allocation instead of
triggering OOM killer. Can you check it out? It's in mmotm/linux-next.

[1]
http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-tree-wide-replace-__gfp_repeat-by-__gfp_retry_mayfail-with-more-useful-semantic.patch
[2] http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org

> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Hao Lee <haolee.swjtu@gmail.com>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  include/linux/gfp.h | 6 +++++-
>  mm/page_alloc.c     | 7 +++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 4c6656f1fee7..beaabd110008 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -40,6 +40,7 @@ struct vm_area_struct;
>  #define ___GFP_DIRECT_RECLAIM	0x400000u
>  #define ___GFP_WRITE		0x800000u
>  #define ___GFP_KSWAPD_RECLAIM	0x1000000u
> +#define ___GFP_DONTOOM		0x2000000u
>  #ifdef CONFIG_LOCKDEP
>  #define ___GFP_NOLOCKDEP	0x2000000u
>  #else
> @@ -149,6 +150,8 @@ struct vm_area_struct;
>   *   return NULL when direct reclaim and memory compaction have failed to allow
>   *   the allocation to succeed.  The OOM killer is not called with the current
>   *   implementation.
> + *
> + * __GFP_DONTOOM: The VM implementation must not OOM if retries have exhausted.
>   */
>  #define __GFP_IO	((__force gfp_t)___GFP_IO)
>  #define __GFP_FS	((__force gfp_t)___GFP_FS)
> @@ -158,6 +161,7 @@ struct vm_area_struct;
>  #define __GFP_REPEAT	((__force gfp_t)___GFP_REPEAT)
>  #define __GFP_NOFAIL	((__force gfp_t)___GFP_NOFAIL)
>  #define __GFP_NORETRY	((__force gfp_t)___GFP_NORETRY)
> +#define __GFP_DONTOOM	((__force gfp_t)___GFP_DONTOOM)
>  
>  /*
>   * Action modifiers
> @@ -188,7 +192,7 @@ struct vm_area_struct;
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>  
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bd65b60939b6..970a5c380bb6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3908,6 +3908,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (check_retry_cpuset(cpuset_mems_cookie, ac))
>  		goto retry_cpuset;
>  
> +	/*
> +	 * It's possible that retries failed but we still don't want the
> +	 * OOM killer to trigger and can just try again later.
> +	 */
> +	if (gfp_mask & __GFP_DONTOOM)
> +		goto nopage;
> +
>  	/* Reclaim has failed us, start killing things */
>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>  	if (page)
> 

