LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Zhaoyang Huang <huangzhaoyang@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-patch-test@lists.linaro.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Joel Fernandes <joelaf@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v1] kernel/trace:check the val against the available mem
Date: Fri, 30 Mar 2018 10:20:38 -0400
Message-ID: <20180330102038.2378925b@gandalf.local.home> (raw)
In-Reply-To: <1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com>


[ Adding memory management folks to discuss the issue ]

On Thu, 29 Mar 2018 18:41:44 +0800
Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:

> It is reported that some user app would like to echo a huge
> number to "/sys/kernel/debug/tracing/buffer_size_kb" regardless
>  of the available memory, which will cause the coinstantaneous
> page allocation failed and introduce OOM. The commit checking the
> val against the available mem first to avoid the consequence allocation.
> 
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@spreadtrum.com>
> ---
>  kernel/trace/trace.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2d0ffcc..a4a4237 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -43,6 +43,8 @@
>  #include <linux/trace.h>
>  #include <linux/sched/rt.h>
>  
> +#include <linux/mm.h>
> +#include <linux/swap.h>
>  #include "trace.h"
>  #include "trace_output.h"
>  
> @@ -5967,6 +5969,39 @@ static ssize_t tracing_splice_read_pipe(struct file *filp,
>  	return ret;
>  }
>  
> +static long get_available_mem(void)
> +{
> +	struct sysinfo i;
> +	long available;
> +	unsigned long pagecache;
> +	unsigned long wmark_low = 0;
> +	unsigned long pages[NR_LRU_LISTS];
> +	struct zone *zone;
> +	int lru;
> +
> +	si_meminfo(&i);
> +	si_swapinfo(&i);
> +
> +	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
> +		pages[lru] = global_page_state(NR_LRU_BASE + lru);
> +
> +	for_each_zone(zone)
> +		wmark_low += zone->watermark[WMARK_LOW];
> +
> +	available = i.freeram - wmark_low;
> +
> +	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
> +	pagecache -= min(pagecache / 2, wmark_low);
> +	available += pagecache;
> +
> +	available += global_page_state(NR_SLAB_RECLAIMABLE) -
> +		min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);
> +
> +	if (available < 0)
> +		available = 0;
> +	return available;
> +}
> +

As I stated in my other reply, the above function does not belong in
tracing.

That said, it appears you are having issues that were caused by the
change by commit 848618857d2 ("tracing/ring_buffer: Try harder to
allocate"), where we replaced NORETRY with RETRY_MAYFAIL. The point of
NORETRY was to keep allocations of the tracing ring-buffer from causing
OOMs. But the RETRY was too strong in that case, because there were
those that wanted to allocate large ring buffers but it would fail due
to memory being used that could be reclaimed. Supposedly, RETRY_MAYFAIL
is to allocate with reclaim but still allow to fail, and isn't suppose
to trigger an OOM. From my own tests, this is obviously not the case.

Perhaps this is because the ring buffer allocates one page at a time,
and by doing so, it can get every last available page, and if anything
in the mean time does an allocation without MAYFAIL, it will cause an
OOM. For example, when I stressed this I triggered this:

 pool invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
 pool cpuset=/ mems_allowed=0
 CPU: 7 PID: 1040 Comm: pool Not tainted 4.16.0-rc4-test+ #663
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  dump_stack+0x8e/0xce
  dump_header.isra.30+0x6e/0x28f
  ? _raw_spin_unlock_irqrestore+0x30/0x60
  oom_kill_process+0x218/0x400
  ? has_capability_noaudit+0x17/0x20
  out_of_memory+0xe3/0x5c0
  __alloc_pages_slowpath+0xa8e/0xe50
  __alloc_pages_nodemask+0x206/0x220
  alloc_pages_current+0x6a/0xe0
  __page_cache_alloc+0x6a/0xa0
  filemap_fault+0x208/0x5f0
  ? __might_sleep+0x4a/0x80
  ext4_filemap_fault+0x31/0x44
  __do_fault+0x20/0xd0
  __handle_mm_fault+0xc08/0x1160
  handle_mm_fault+0x76/0x110
  __do_page_fault+0x299/0x580
  do_page_fault+0x2d/0x110
  ? page_fault+0x2f/0x50
  page_fault+0x45/0x50

I wonder if I should have the ring buffer allocate groups of pages, to
avoid this. Or try to allocate with NORETRY, one page at a time, and
when that fails, allocate groups of pages with RETRY_MAYFAIL, and that
may keep it from causing an OOM?

Thoughts?

-- Steve



>  static ssize_t
>  tracing_entries_write(struct file *filp, const char __user *ubuf,
>  		      size_t cnt, loff_t *ppos)
> @@ -5975,13 +6010,15 @@ static ssize_t tracing_splice_read_pipe(struct file *filp,
>  	struct trace_array *tr = inode->i_private;
>  	unsigned long val;
>  	int ret;
> +	long available;
>  
> +	available = get_available_mem();
>  	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
>  	if (ret)
>  		return ret;
>  
>  	/* must have at least 1 entry */
> -	if (!val)
> +	if (!val || (val > available))
>  		return -EINVAL;
>  
>  	/* value is in KB */

  parent reply index

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-29 10:41 Zhaoyang Huang
2018-03-29 16:05 ` Steven Rostedt
2018-03-30  3:32   ` Zhaoyang Huang
2018-03-30 14:07     ` Steven Rostedt
2018-03-30  6:53 ` [Kernel-patch-test] " kbuild test robot
2018-03-30  6:54 ` kbuild test robot
2018-03-30 14:20 ` Steven Rostedt [this message]
2018-03-30 16:37   ` Joel Fernandes
2018-03-30 19:10     ` Steven Rostedt
2018-03-30 20:37       ` Joel Fernandes
2018-03-30 20:53   ` Matthew Wilcox
2018-03-30 21:30     ` Steven Rostedt
2018-03-30 21:42       ` Steven Rostedt
2018-03-30 23:38         ` Joel Fernandes
2018-03-31  1:41           ` Steven Rostedt
2018-03-31  2:18             ` Matthew Wilcox
2018-03-31  3:07               ` Steven Rostedt
2018-03-31  5:44                 ` Joel Fernandes
2018-04-02  0:52         ` Zhaoyang Huang
2018-04-03 11:06   ` Michal Hocko
2018-04-03 11:51     ` Steven Rostedt
2018-04-03 12:16       ` Michal Hocko
2018-04-03 12:23         ` Steven Rostedt
2018-04-03 12:35           ` Michal Hocko
2018-04-03 13:32             ` Steven Rostedt
2018-04-03 13:56               ` Michal Hocko
2018-04-03 14:17                 ` Steven Rostedt
2018-04-03 16:11                   ` Michal Hocko
2018-04-03 16:59                     ` Steven Rostedt
2018-04-03 22:56                     ` Steven Rostedt
2018-04-04  6:20                       ` Michal Hocko
2018-04-04 12:21                         ` Joel Fernandes
2018-04-04 12:59                         ` Steven Rostedt
2018-04-04 14:10                           ` Michal Hocko
2018-04-04 14:25                             ` Steven Rostedt
2018-04-04 14:42                               ` Michal Hocko
2018-04-04 15:04                                 ` Steven Rostedt
2018-04-04 15:27                                   ` Michal Hocko
2018-04-04 15:38                                     ` Steven Rostedt
2018-04-04  2:58                 ` Zhaoyang Huang
2018-04-04  6:23                   ` Michal Hocko
2018-04-04  9:29                     ` Zhaoyang Huang
2018-04-04 14:11                     ` Steven Rostedt
2018-04-04 14:23                       ` Michal Hocko
2018-04-04 14:31                         ` Steven Rostedt
2018-04-04 14:47                           ` Michal Hocko
2018-04-04 15:47                         ` Steven Rostedt
2018-04-05  2:58                           ` Matthew Wilcox
2018-04-05  4:12                             ` Joel Fernandes
2018-04-05 14:22                               ` Matthew Wilcox
2018-04-05 14:27                                 ` Michal Hocko
2018-04-05 14:34                                   ` Steven Rostedt
2018-04-05 15:13                                   ` Matthew Wilcox
2018-04-05 15:32                                     ` Michal Hocko
2018-04-05 16:15                                       ` Matthew Wilcox
2018-04-05 18:54                                         ` Michal Hocko
2018-04-05 20:15                                           ` __GFP_LOW Matthew Wilcox
2018-04-06  6:09                                             ` __GFP_LOW Michal Hocko
2018-04-08  4:27                                               ` __GFP_LOW Matthew Wilcox
2018-04-09  7:34                                                 ` __GFP_LOW Michal Hocko
2018-04-09 15:51                                                   ` __GFP_LOW Matthew Wilcox
2018-04-09 18:14                                                     ` __GFP_LOW Michal Hocko
     [not found]                                                       ` <CA+JonM0HG9kWb6-0iyDQ8UMxTeR-f=+ZL89t5DvvDULDC8Sfyw@mail.gmail.com>
2018-04-10 12:19                                                         ` __GFP_LOW Matthew Wilcox
2018-04-05 14:30                                 ` [PATCH v1] kernel/trace:check the val against the available mem Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180330102038.2378925b@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=huangzhaoyang@gmail.com \
    --cc=joelaf@google.com \
    --cc=kernel-patch-test@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git