linux-mm.kvack.org archive mirror
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: rientjes@google.com, oleg@redhat.com,
	torvalds@linux-foundation.org, kwalker@redhat.com, cl@linux.com,
	akpm@linux-foundation.org, hannes@cmpxchg.org,
	vdavydov@parallels.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, skozina@redhat.com
Subject: Re: Can't we use timeout based OOM warning/killing?
Date: Sat, 10 Oct 2015 21:50:58 +0900
Message-ID: <201510102150.CHH51580.QSHOFOtFLVOJFM@I-love.SAKURA.ne.jp>
In-Reply-To: <201510031502.BJD59536.HFJMtQOOLFFVSO@I-love.SAKURA.ne.jp>

Tetsuo Handa wrote:
> Without means to find out what was happening, we will "overlook real bugs"
> before "paper over real bugs". The means are expected to work without
> knowledge to use trace points functionality, are expected to run without
> memory allocation, are expected to dump output without administrator's
> operation, are expected to work before power reset by watchdog timers.

I would like to use something like this patch (putting it under a
CONFIG_DEBUG_something option is fine).
The complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20151010.txt.xz
----------------------------------------
>From 0f749ddbc2bd9ce57ba56787e77595c3f13e9cc3 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 10 Oct 2015 20:48:09 +0900
Subject: [PATCH] Memory allocation watchdog kernel thread.

This patch adds a kernel thread which periodically reports the number of
memory allocating tasks, dying tasks and OOM victim tasks.
This kernel thread helps report whether we are failing to resolve OOM
conditions after the OOM killer is invoked, in addition to reporting stalls
before the OOM killer is invoked (e.g. all __GFP_FS allocating tasks are
blocked by locks or throttling whereas all !__GFP_FS allocating tasks
are unable to invoke the OOM killer).

$ grep MemAlloc serial.txt | grep -A 5 MemAlloc-Info:
[  101.937548] MemAlloc-Info: 4 stalling task, 32 dying task, 1 victim task.
[  101.939460] MemAlloc: sync4(10598) gfp=0x24280ca order=0 delay=17338
[  101.975433] MemAlloc: sync4(10602) gfp=0x24280ca order=0 delay=17115
[  102.015519] MemAlloc: sync4(10599) gfp=0x24280ca order=0 delay=17097
[  102.053884] MemAlloc: sync4(10607) gfp=0x24280ca order=0 delay=15970
[  112.094349] MemAlloc-Info: 176 stalling task, 32 dying task, 1 victim task.
[  112.098411] MemAlloc: sync4(10598) gfp=0x24280ca order=0 delay=27494
[  112.138381] MemAlloc: sync4(10602) gfp=0x24280ca order=0 delay=27271
[  112.178710] MemAlloc: sync4(10599) gfp=0x24280ca order=0 delay=27253
[  112.218674] MemAlloc: sync4(10607) gfp=0x24280ca order=0 delay=26126
[  112.257749] MemAlloc: sync4(10608) gfp=0x24280ca order=0 delay=14083
--
[  128.952137] MemAlloc-Info: 176 stalling task, 32 dying task, 1 victim task.
[  128.954056] MemAlloc: sync4(10598) gfp=0x24280ca order=0 delay=44352
[  128.992231] MemAlloc: sync4(10602) gfp=0x24280ca order=0 delay=44129
[  129.034180] MemAlloc: sync4(10599) gfp=0x24280ca order=0 delay=44111
[  129.071755] MemAlloc: sync4(10607) gfp=0x24280ca order=0 delay=42984
[  129.109851] MemAlloc: sync4(10608) gfp=0x24280ca order=0 delay=30941
--
[  145.683171] MemAlloc-Info: 175 stalling task, 32 dying task, 1 victim task.
[  145.685344] MemAlloc: sync4(10598) gfp=0x24280ca order=0 delay=61084
[  145.736475] MemAlloc: sync4(10599) gfp=0x24280ca order=0 delay=60843
[  145.778084] MemAlloc: sync4(10607) gfp=0x24280ca order=0 delay=59716
[  145.815363] MemAlloc: sync4(10608) gfp=0x24280ca order=0 delay=47673
[  145.853610] MemAlloc: sync4(10601) gfp=0x24280ca order=0 delay=47673
--
[  158.030038] MemAlloc-Info: 178 stalling task, 32 dying task, 1 victim task.
[  158.031945] MemAlloc: sync4(10598) gfp=0x24280ca order=0 delay=73430
[  158.071066] MemAlloc: sync4(10599) gfp=0x24280ca order=0 delay=73189
[  158.108835] MemAlloc: sync4(10607) gfp=0x24280ca order=0 delay=72062
[  158.146500] MemAlloc: sync4(10608) gfp=0x24280ca order=0 delay=60019
[  158.184146] MemAlloc: sync4(10601) gfp=0x24280ca order=0 delay=60019
--
[  174.851184] MemAlloc-Info: 178 stalling task, 32 dying task, 1 victim task.
[  174.853106] MemAlloc: sync4(10598) gfp=0x24280ca order=0 delay=90252
[  174.896592] MemAlloc: sync4(10599) gfp=0x24280ca order=0 delay=90011
[  174.935838] MemAlloc: sync4(10607) gfp=0x24280ca order=0 delay=88884
[  174.978799] MemAlloc: sync4(10608) gfp=0x24280ca order=0 delay=76841
[  175.022003] MemAlloc: sync4(10601) gfp=0x24280ca order=0 delay=76841
--

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/page_alloc.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 145 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0d6f540..0473eec 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2972,6 +2972,147 @@ static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
 	return (gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)) == GFP_TRANSHUGE;
 }
 
+#if 1
+
+static u8 memalloc_counter_active_index; /* Either 0 or 1. */
+static int memalloc_counter[2]; /* Number of tasks doing memory allocation. */
+
+struct memalloc {
+	struct list_head list; /* Connected to memalloc_list. */
+	struct task_struct *task; /* Initialized to current. */
+	unsigned long start; /* Initialized to jiffies. */
+	unsigned int order;
+	gfp_t gfp;
+	u8 index; /* Initialized to memalloc_counter_active_index. */
+};
+
+static LIST_HEAD(memalloc_list); /* List of "struct memalloc".*/
+static DEFINE_SPINLOCK(memalloc_list_lock); /* Lock for memalloc_list. */
+
+/*
+ * malloc_watchdog - A kernel thread for monitoring memory allocation stalls.
+ *
+ * @unused: Not used.
+ *
+ * This kernel thread does not terminate.
+ */
+static int malloc_watchdog(void *unused)
+{
+	static const unsigned long timeout = 10 * HZ;
+	struct memalloc *m;
+	struct task_struct *g, *p;
+	unsigned long now;
+	unsigned long spent;
+	unsigned int sigkill_pending;
+	unsigned int memdie_pending;
+	unsigned int stalling_tasks;
+	u8 index;
+
+ not_stalling: /* Healthy case. */
+	/*
+	 * Switch active counter and wait for timeout duration.
+	 * This is a kind of open coded implementation of synchronize_srcu()
+	 * because synchronize_srcu_timeout() is missing.
+	 */
+	spin_lock(&memalloc_list_lock);
+	index = memalloc_counter_active_index;
+	memalloc_counter_active_index ^= 1;
+	spin_unlock(&memalloc_list_lock);
+	schedule_timeout_interruptible(timeout);
+	/*
+	 * If memory allocations are working, the counter should remain 0
+	 * because tasks will be able to call both start_memalloc_timer()
+	 * and stop_memalloc_timer() within timeout duration.
+	 */
+	if (likely(!memalloc_counter[index]))
+		goto not_stalling;
+ maybe_stalling: /* Maybe something is wrong. Let's check. */
+	/* First, report whether there are SIGKILL tasks and/or OOM victims. */
+	sigkill_pending = 0;
+	memdie_pending = 0;
+	stalling_tasks = 0;
+	preempt_disable();
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		if (test_tsk_thread_flag(p, TIF_MEMDIE))
+			memdie_pending++;
+		if (fatal_signal_pending(p))
+			sigkill_pending++;
+	}
+	rcu_read_unlock();
+	preempt_enable();
+	spin_lock(&memalloc_list_lock);
+	now = jiffies;
+	list_for_each_entry(m, &memalloc_list, list) {
+		spent = now - m->start;
+		if (time_before(spent, timeout))
+			continue;
+		stalling_tasks++;
+	}
+	pr_warn("MemAlloc-Info: %u stalling task, %u dying task, %u victim task.\n",
+		stalling_tasks, sigkill_pending, memdie_pending);
+	/* Next, report tasks stalled at memory allocation. */
+	list_for_each_entry(m, &memalloc_list, list) {
+		spent = now - m->start;
+		if (time_before(spent, timeout))
+			continue;
+		p = m->task;
+		pr_warn("MemAlloc%s: %s(%u) gfp=0x%x order=%u delay=%lu\n",
+			test_tsk_thread_flag(p, TIF_MEMDIE) ? "-victim" :
+			(fatal_signal_pending(p) ? "-dying" : ""),
+			p->comm, p->pid, m->gfp, m->order, spent);
+		show_stack(p, NULL);
+	}
+	spin_unlock(&memalloc_list_lock);
+	/* Wait until next timeout duration. */
+	schedule_timeout_interruptible(timeout);
+	if (memalloc_counter[index])
+		goto maybe_stalling;
+	goto not_stalling;
+	return 0;
+}
+
+static int __init start_malloc_watchdog(void)
+{
+	struct task_struct *task = kthread_run(malloc_watchdog, NULL,
+					       "MallocWatchdog");
+	BUG_ON(IS_ERR(task));
+	return 0;
+}
+late_initcall(start_malloc_watchdog);
+
+#define DEFINE_MEMALLOC_TIMER(m) struct memalloc m = { .task = NULL }
+
+static void start_memalloc_timer(struct memalloc *m, gfp_t gfp_mask, int order)
+{
+	if (m->task)
+		return;
+	m->task = current;
+	m->start = jiffies;
+	m->gfp = gfp_mask;
+	m->order = order;
+	spin_lock(&memalloc_list_lock);
+	m->index = memalloc_counter_active_index;
+	memalloc_counter[m->index]++;
+	list_add_tail(&m->list, &memalloc_list);
+	spin_unlock(&memalloc_list_lock);
+}
+
+static void stop_memalloc_timer(struct memalloc *m)
+{
+	if (!m->task)
+		return;
+	spin_lock(&memalloc_list_lock);
+	memalloc_counter[m->index]--;
+	list_del(&m->list);
+	spin_unlock(&memalloc_list_lock);
+}
+#else
+#define DEFINE_MEMALLOC_TIMER(m)
+#define start_memalloc_timer(m, gfp_mask, order)
+#define stop_memalloc_timer(m)
+#endif
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
@@ -2984,6 +3125,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum migrate_mode migration_mode = MIGRATE_ASYNC;
 	bool deferred_compaction = false;
 	int contended_compaction = COMPACT_CONTENDED_NONE;
+	DEFINE_MEMALLOC_TIMER(m);
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3075,6 +3217,8 @@ retry:
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
 		goto nopage;
 
+	start_memalloc_timer(&m, gfp_mask, order);
+
 	/*
 	 * Try direct compaction. The first pass is asynchronous. Subsequent
 	 * attempts after direct reclaim are synchronous
@@ -3168,6 +3312,7 @@ noretry:
 nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 got_pg:
+	stop_memalloc_timer(&m);
 	return page;
 }
 
-- 
1.8.3.1

