LKML Archive on lore.kernel.org
* tty crash due to auto-failing vmalloc
@ 2017-10-03 22:55 Johannes Weiner
  2017-10-03 23:51 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Johannes Weiner @ 2017-10-03 22:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alan Cox, Christoph Hellwig, Andrew Morton, linux-mm,
	linux-kernel, kernel-team

On some of our machines, we see this warning:

	/* switch the line discipline */
	tty->ldisc = ld;
	tty_set_termios_ldisc(tty, disc);
	retval = tty_ldisc_open(tty, tty->ldisc);
	if (retval) {
->		if (!WARN_ON(disc == N_TTY)) {
			tty_ldisc_put(tty->ldisc);
			tty->ldisc = NULL;
		}
	}

where the stack is

tty_ldisc_reinit
tty_ldisc_hangup
__tty_hangup
do_exit
do_signal
syscall

This is followed by a NULL pointer deref crash in n_tty_set_termios,
presumably when it tries to deref that unallocated tty->disc_data.
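The failure mode can be illustrated with a small userspace sketch (hypothetical names, loosely modeled on the tty code; ldisc_open and set_termios stand in for n_tty_open() and n_tty_set_termios(), and none of this is the real implementation):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of the crash pattern: an open that fails leaves
 * disc_data NULL, and a later call that trusts it dereferences NULL. */

struct ldisc {
	int *disc_data;		/* n_tty keeps its per-tty state here */
};

/* stands in for n_tty_open(): fails when its allocation fails */
static int ldisc_open(struct ldisc *ld, int fail_alloc)
{
	ld->disc_data = fail_alloc ? NULL : calloc(1, sizeof(int));
	return ld->disc_data ? 0 : -1;
}

/* stands in for n_tty_set_termios(): trusts disc_data to be valid,
 * so calling it after a failed open dereferences NULL */
static int set_termios(struct ldisc *ld)
{
	return *ld->disc_data;
}
```

If the open failure is swallowed and the discipline is still used, the next set_termios() is the NULL deref seen in the crash.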

The only way n_tty_open() can fail is if the vmalloc in there fails.
struct n_tty_data isn't terribly big, but ever since the following
patch it doesn't even *try* the allocation:

commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
Author: Michal Hocko <mhocko@suse.com>
Date:   Fri Feb 24 14:58:53 2017 -0800

    vmalloc: back off when the current task is killed
    
    __vmalloc_area_node() allocates pages to cover the requested vmalloc
    size.  This can be a lot of memory.  If the current task is killed by
    the OOM killer, and thus has an unlimited access to memory reserves, it
    can consume all the memory theoretically.  Fix this by checking for
    fatal_signal_pending and back off early.
    
    Link: http://lkml.kernel.org/r/20170201092706.9966-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This talks about the oom killer and memory exhaustion, but most fatal
signals don't happen due to the OOM killer.

I think this patch should be reverted. If somebody is vmallocing crazy
amounts of memory in the exit path we should probably track them down
individually; the patch doesn't reference any real instances of that.
But we cannot start failing allocations that have never failed before.

That said, maybe we want Alan's N_NULL failover in the hangup path too?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: tty crash due to auto-failing vmalloc
  2017-10-03 22:55 tty crash due to auto-failing vmalloc Johannes Weiner
@ 2017-10-03 23:51 ` Alan Cox
  2017-10-04  8:33 ` Michal Hocko
  2017-10-04 18:58 ` Johannes Weiner
  2 siblings, 0 replies; 22+ messages in thread
From: Alan Cox @ 2017-10-03 23:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Christoph Hellwig, Andrew Morton, linux-mm,
	linux-kernel, kernel-team

> I think this patch should be reverted. If somebody is vmallocing crazy
> amounts of memory in the exit path we should probably track them down
> individually; the patch doesn't reference any real instances of that.
> But we cannot start failing allocations that have never failed before.
> 
> That said, maybe we want Alan's N_NULL failover in the hangup path too?

I think that would be best. There's always going to be a failure case
even if the vmalloc change makes it rarer. Dropping back to N_NULL fixes
all of the cases.

Alan

* Re: tty crash due to auto-failing vmalloc
  2017-10-03 22:55 tty crash due to auto-failing vmalloc Johannes Weiner
  2017-10-03 23:51 ` Alan Cox
@ 2017-10-04  8:33 ` Michal Hocko
  2017-10-04 18:58 ` Johannes Weiner
  2 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2017-10-04  8:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Alan Cox, Christoph Hellwig, Andrew Morton, linux-mm,
	linux-kernel, kernel-team

On Tue 03-10-17 18:55:04, Johannes Weiner wrote:
[...]
> commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Fri Feb 24 14:58:53 2017 -0800
> 
>     vmalloc: back off when the current task is killed
>     
>     __vmalloc_area_node() allocates pages to cover the requested vmalloc
>     size.  This can be a lot of memory.  If the current task is killed by
>     the OOM killer, and thus has an unlimited access to memory reserves, it
>     can consume all the memory theoretically.  Fix this by checking for
>     fatal_signal_pending and back off early.
>     
>     Link: http://lkml.kernel.org/r/20170201092706.9966-4-mhocko@kernel.org
>     Signed-off-by: Michal Hocko <mhocko@suse.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>     Cc: Al Viro <viro@zeniv.linux.org.uk>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> This talks about the oom killer and memory exhaustion, but most fatal
> signals don't happen due to the OOM killer.

Now that we have cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for
memory reserves access") the risk of the memory depletion is much
smaller so reverting the above commit should be acceptable. On the other
hand, the failure is still possible and the caller should be prepared for
that.
-- 
Michal Hocko
SUSE Labs

* Re: tty crash due to auto-failing vmalloc
  2017-10-03 22:55 tty crash due to auto-failing vmalloc Johannes Weiner
  2017-10-03 23:51 ` Alan Cox
  2017-10-04  8:33 ` Michal Hocko
@ 2017-10-04 18:58 ` Johannes Weiner
  2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
  2017-10-04 18:59   ` [PATCH 2/2] tty: fall back to N_NULL if switching to N_TTY fails during hangup Johannes Weiner
  2 siblings, 2 replies; 22+ messages in thread
From: Johannes Weiner @ 2017-10-04 18:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

Okay, how about the following two patches?

* [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 18:58 ` Johannes Weiner
@ 2017-10-04 18:59   ` Johannes Weiner
  2017-10-04 20:49     ` Tetsuo Handa
                       ` (3 more replies)
  2017-10-04 18:59   ` [PATCH 2/2] tty: fall back to N_NULL if switching to N_TTY fails during hangup Johannes Weiner
  1 sibling, 4 replies; 22+ messages in thread
From: Johannes Weiner @ 2017-10-04 18:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
commit 171012f561274784160f666f8398af8b42216e1f.

5d17a73a2ebe ("vmalloc: back off when the current task is killed")
made all vmalloc allocations from a signal-killed task fail. We have
seen crashes in the tty driver from this, where a killed task exiting
tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
failing, and later crashes when dereferencing tty->disc_data.

Arguably, relying on a vmalloc() call to succeed in order to properly
exit a task is not the most robust way of doing things. There will be
a follow-up patch to the tty code to fall back to the N_NULL ldisc.

But the justification to make that vmalloc() call fail like this isn't
convincing, either. The patch mentions an OOM victim exhausting the
memory reserves and thus deadlocking the machine. But the OOM killer
is only one, improbable source of fatal signals. It doesn't make sense
to fail allocations preemptively with plenty of memory in most cases.

The patch doesn't mention real-life instances where vmalloc sites
would exhaust memory, which makes it sound more like a theoretical
issue to begin with. But just in case, the OOM access to memory
reserves has been restricted on the allocator side in cd04ae1e2dc8
("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
which should take care of any theoretical concerns on that front.

Revert this patch, and the follow-up that suppresses the allocation
warnings when we fail the allocations due to a signal.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmalloc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8a43db6284eb..673942094328 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1695,11 +1695,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
 
-		if (fatal_signal_pending(current)) {
-			area->nr_pages = i;
-			goto fail_no_warn;
-		}
-
 		if (node == NUMA_NO_NODE)
 			page = alloc_page(alloc_mask|highmem_mask);
 		else
@@ -1723,7 +1718,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	warn_alloc(gfp_mask, NULL,
 			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
 			  (area->nr_pages*PAGE_SIZE), area->size);
-fail_no_warn:
 	vfree(area->addr);
 	return NULL;
 }
-- 
2.14.1

* [PATCH 2/2] tty: fall back to N_NULL if switching to N_TTY fails during hangup
  2017-10-04 18:58 ` Johannes Weiner
  2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
@ 2017-10-04 18:59   ` Johannes Weiner
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2017-10-04 18:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

We have seen NULL-pointer dereference crashes in tty->disc_data when
the N_TTY fallback driver failed to open during hangup. The immediate
cause of this open to fail has been addressed in the preceding patch
to vmalloc(), but this code could be more robust.

As Alan pointed out in 8a8dabf2dd68 ("tty: handle the case where we
cannot restore a line discipline"), the N_TTY driver, historically the
safe fallback that could never fail, can indeed fail, but the
surrounding code is not prepared to handle this. To avoid crashes he
added a new N_NULL driver to take N_TTY's place as the last resort.

Hook that fallback up to the hangup path. Update tty_ldisc_reinit() to
reflect the reality that n_tty_open can indeed fail.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 drivers/tty/tty_ldisc.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 2fe216b276e2..84a8ac2a779f 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -694,10 +694,8 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
 	tty_set_termios_ldisc(tty, disc);
 	retval = tty_ldisc_open(tty, tty->ldisc);
 	if (retval) {
-		if (!WARN_ON(disc == N_TTY)) {
-			tty_ldisc_put(tty->ldisc);
-			tty->ldisc = NULL;
-		}
+		tty_ldisc_put(tty->ldisc);
+		tty->ldisc = NULL;
 	}
 	return retval;
 }
@@ -752,8 +750,9 @@ void tty_ldisc_hangup(struct tty_struct *tty, bool reinit)
 
 	if (tty->ldisc) {
 		if (reinit) {
-			if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
-				tty_ldisc_reinit(tty, N_TTY);
+			if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0 &&
+			    tty_ldisc_reinit(tty, N_TTY) < 0)
+				WARN_ON(tty_ldisc_reinit(tty, N_NULL) < 0);
 		} else
 			tty_ldisc_kill(tty);
 	}
-- 
2.14.1
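The fallback order the tty_ldisc_hangup() hunk above installs can be sketched as follows (illustrative userspace code; the discipline numbers and the working_mask failure knob are invented for the example, and reinit stands in for tty_ldisc_reinit()):

```c
/* Sketch of the three-tier fallback: try the configured discipline,
 * then N_TTY, then N_NULL as the last resort. */

enum { N_TTY = 0, N_NULL = 27, N_CONFIGURED = 2 };	/* illustrative */

/* stand-in for tty_ldisc_reinit(); a set bit in working_mask means
 * that discipline can currently allocate its state */
static int reinit(int disc, int working_mask)
{
	return (working_mask & (1 << disc)) ? 0 : -12;	/* -ENOMEM */
}

/* returns the discipline we ended up on, or -1 if even N_NULL failed
 * (the case the patch wraps in WARN_ON) */
static int hangup_fallback(int configured, int working_mask)
{
	if (reinit(configured, working_mask) == 0)
		return configured;
	if (reinit(N_TTY, working_mask) == 0)
		return N_TTY;
	if (reinit(N_NULL, working_mask) == 0)
		return N_NULL;
	return -1;
}
```

The short-circuit `&&` chain in the patch encodes exactly this order: each reinit is attempted only if the previous one failed.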

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
@ 2017-10-04 20:49     ` Tetsuo Handa
  2017-10-04 21:00       ` Johannes Weiner
  2017-10-04 22:32     ` Andrew Morton
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2017-10-04 20:49 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

On 2017/10/05 3:59, Johannes Weiner wrote:
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.

By the time the current thread reaches do_exit(), fatal_signal_pending(current)
should become false. As far as I can guess, the source of fatal signal will be
tty_signal_session_leader(tty, exit_session) which is called just before
tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
know whether it is possible to make fatal_signal_pending(current) true inside
do_exit() though...

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 20:49     ` Tetsuo Handa
@ 2017-10-04 21:00       ` Johannes Weiner
  2017-10-04 21:42         ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2017-10-04 21:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Alan Cox, Christoph Hellwig, Michal Hocko,
	linux-mm, linux-kernel, kernel-team

On Thu, Oct 05, 2017 at 05:49:43AM +0900, Tetsuo Handa wrote:
> On 2017/10/05 3:59, Johannes Weiner wrote:
> > But the justification to make that vmalloc() call fail like this isn't
> > convincing, either. The patch mentions an OOM victim exhausting the
> > memory reserves and thus deadlocking the machine. But the OOM killer
> > is only one, improbable source of fatal signals. It doesn't make sense
> > to fail allocations preemptively with plenty of memory in most cases.
> 
> By the time the current thread reaches do_exit(), fatal_signal_pending(current)
> should become false. As far as I can guess, the source of fatal signal will be
> tty_signal_session_leader(tty, exit_session) which is called just before
> tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
> know whether it is possible to make fatal_signal_pending(current) true inside
> do_exit() though...

It's definitely not the OOM killer; the memory situation looks fine
when this happens. I didn't look closer at where the signal comes from.

That said, we trigger this issue fairly easily. We tested the revert
overnight on a couple thousand machines, and it fixed the issue
(whereas the control group still saw the crashes).

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 21:00       ` Johannes Weiner
@ 2017-10-04 21:42         ` Tetsuo Handa
  2017-10-04 23:21           ` Johannes Weiner
  0 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2017-10-04 21:42 UTC (permalink / raw)
  To: hannes; +Cc: akpm, alan, hch, mhocko, linux-mm, linux-kernel, kernel-team

Johannes Weiner wrote:
> On Thu, Oct 05, 2017 at 05:49:43AM +0900, Tetsuo Handa wrote:
> > On 2017/10/05 3:59, Johannes Weiner wrote:
> > > But the justification to make that vmalloc() call fail like this isn't
> > > convincing, either. The patch mentions an OOM victim exhausting the
> > > memory reserves and thus deadlocking the machine. But the OOM killer
> > > is only one, improbable source of fatal signals. It doesn't make sense
> > > to fail allocations preemptively with plenty of memory in most cases.
> > 
> > By the time the current thread reaches do_exit(), fatal_signal_pending(current)
> > should become false. As far as I can guess, the source of fatal signal will be
> > tty_signal_session_leader(tty, exit_session) which is called just before
> > tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
> > know whether it is possible to make fatal_signal_pending(current) true inside
> > do_exit() though...
> 
> It's definitely not the OOM killer; the memory situation looks fine
> when this happens. I didn't look closer at where the signal comes from.
> 

Then, we could check tsk_is_oom_victim() instead of fatal_signal_pending().

> That said, we trigger this issue fairly easily. We tested the revert
> overnight on a couple thousand machines, and it fixed the issue
> (whereas the control group still saw the crashes).
> 


* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
  2017-10-04 20:49     ` Tetsuo Handa
@ 2017-10-04 22:32     ` Andrew Morton
  2017-10-04 23:18       ` Johannes Weiner
  2017-10-05  6:49     ` Vlastimil Babka
  2017-10-05  7:54     ` Michal Hocko
  3 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2017-10-04 22:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

On Wed, 4 Oct 2017 14:59:06 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> commit 171012f561274784160f666f8398af8b42216e1f.
> 
> 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> made all vmalloc allocations from a signal-killed task fail. We have
> seen crashes in the tty driver from this, where a killed task exiting
> tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> failing, and later crashes when dereferencing tty->disc_data.
> 
> Arguably, relying on a vmalloc() call to succeed in order to properly
> exit a task is not the most robust way of doing things. There will be
> a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> 
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.
> 
> The patch doesn't mention real-life instances where vmalloc sites
> would exhaust memory, which makes it sound more like a theoretical
> issue to begin with. But just in case, the OOM access to memory
> reserves has been restricted on the allocator side in cd04ae1e2dc8
> ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> which should take care of any theoretical concerns on that front.
> 
> Revert this patch, and the follow-up that suppresses the allocation
> warnings when we fail the allocations due to a signal.

You don't think they should be backported into -stables?

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 22:32     ` Andrew Morton
@ 2017-10-04 23:18       ` Johannes Weiner
  2017-10-05  7:57         ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2017-10-04 23:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
> On Wed, 4 Oct 2017 14:59:06 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> > commit 171012f561274784160f666f8398af8b42216e1f.
> > 
> > 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> > made all vmalloc allocations from a signal-killed task fail. We have
> > seen crashes in the tty driver from this, where a killed task exiting
> > tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> > failing, and later crashes when dereferencing tty->disc_data.
> > 
> > Arguably, relying on a vmalloc() call to succeed in order to properly
> > exit a task is not the most robust way of doing things. There will be
> > a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> > 
> > But the justification to make that vmalloc() call fail like this isn't
> > convincing, either. The patch mentions an OOM victim exhausting the
> > memory reserves and thus deadlocking the machine. But the OOM killer
> > is only one, improbable source of fatal signals. It doesn't make sense
> > to fail allocations preemptively with plenty of memory in most cases.
> > 
> > The patch doesn't mention real-life instances where vmalloc sites
> > would exhaust memory, which makes it sound more like a theoretical
> > issue to begin with. But just in case, the OOM access to memory
> > reserves has been restricted on the allocator side in cd04ae1e2dc8
> > ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> > which should take care of any theoretical concerns on that front.
> > 
> > Revert this patch, and the follow-up that suppresses the allocation
> > warnings when we fail the allocations due to a signal.
> 
> You don't think they should be backported into -stables?

Good point. For this one, it makes sense to CC stable, for 4.11 and
up. The second patch is more of a fortification against potential
future issues, and probably shouldn't go into stable.

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 21:42         ` Tetsuo Handa
@ 2017-10-04 23:21           ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2017-10-04 23:21 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, alan, hch, mhocko, linux-mm, linux-kernel, kernel-team

On Thu, Oct 05, 2017 at 06:42:38AM +0900, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > On Thu, Oct 05, 2017 at 05:49:43AM +0900, Tetsuo Handa wrote:
> > > On 2017/10/05 3:59, Johannes Weiner wrote:
> > > > But the justification to make that vmalloc() call fail like this isn't
> > > > convincing, either. The patch mentions an OOM victim exhausting the
> > > > memory reserves and thus deadlocking the machine. But the OOM killer
> > > > is only one, improbable source of fatal signals. It doesn't make sense
> > > > to fail allocations preemptively with plenty of memory in most cases.
> > > 
> > > By the time the current thread reaches do_exit(), fatal_signal_pending(current)
> > > should become false. As far as I can guess, the source of fatal signal will be
> > > tty_signal_session_leader(tty, exit_session) which is called just before
> > > tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
> > > know whether it is possible to make fatal_signal_pending(current) true inside
> > > do_exit() though...
> > 
> > It's definitely not the OOM killer; the memory situation looks fine
> > when this happens. I didn't look closer at where the signal comes from.
> > 
> 
> Then, we could check tsk_is_oom_victim() instead of fatal_signal_pending().

The case for this patch didn't seem very strong to begin with, and
since it's causing problems, a simple revert makes more sense than an
attempt to fine-tune it.

Generally, we should leave it to the page allocator to handle memory
reserves, not annotate random alloc_page() callsites.

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
  2017-10-04 20:49     ` Tetsuo Handa
  2017-10-04 22:32     ` Andrew Morton
@ 2017-10-05  6:49     ` Vlastimil Babka
  2017-10-05  7:54     ` Michal Hocko
  3 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2017-10-05  6:49 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Alan Cox, Christoph Hellwig, Michal Hocko, linux-mm,
	linux-kernel, kernel-team

On 10/04/2017 08:59 PM, Johannes Weiner wrote:
> This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> commit 171012f561274784160f666f8398af8b42216e1f.
> 
> 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> made all vmalloc allocations from a signal-killed task fail. We have
> seen crashes in the tty driver from this, where a killed task exiting
> tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> failing, and later crashes when dereferencing tty->disc_data.
> 
> Arguably, relying on a vmalloc() call to succeed in order to properly
> exit a task is not the most robust way of doing things. There will be
> a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> 
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.
> 
> The patch doesn't mention real-life instances where vmalloc sites
> would exhaust memory, which makes it sound more like a theoretical
> issue to begin with. But just in case, the OOM access to memory
> reserves has been restricted on the allocator side in cd04ae1e2dc8
> ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> which should take care of any theoretical concerns on that front.
> 
> Revert this patch, and the follow-up that suppresses the allocation
> warnings when we fail the allocations due to a signal.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/vmalloc.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 8a43db6284eb..673942094328 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1695,11 +1695,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
>  
> -		if (fatal_signal_pending(current)) {
> -			area->nr_pages = i;
> -			goto fail_no_warn;
> -		}
> -
>  		if (node == NUMA_NO_NODE)
>  			page = alloc_page(alloc_mask|highmem_mask);
>  		else
> @@ -1723,7 +1718,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	warn_alloc(gfp_mask, NULL,
>  			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
>  			  (area->nr_pages*PAGE_SIZE), area->size);
> -fail_no_warn:
>  	vfree(area->addr);
>  	return NULL;
>  }
> 

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
                       ` (2 preceding siblings ...)
  2017-10-05  6:49     ` Vlastimil Babka
@ 2017-10-05  7:54     ` Michal Hocko
  3 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2017-10-05  7:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Alan Cox, Christoph Hellwig, linux-mm,
	linux-kernel, kernel-team

On Wed 04-10-17 14:59:06, Johannes Weiner wrote:
> This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> commit 171012f561274784160f666f8398af8b42216e1f.
> 
> 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> made all vmalloc allocations from a signal-killed task fail. We have
> seen crashes in the tty driver from this, where a killed task exiting
> tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> failing, and later crashes when dereferencing tty->disc_data.
> 
> Arguably, relying on a vmalloc() call to succeed in order to properly
> exit a task is not the most robust way of doing things. There will be
> a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> 
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.
> 
> The patch doesn't mention real-life instances where vmalloc sites
> would exhaust memory, which makes it sound more like a theoretical
> issue to begin with. But just in case, the OOM access to memory
> reserves has been restricted on the allocator side in cd04ae1e2dc8
> ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> which should take care of any theoretical concerns on that front.
> 
> Revert this patch, and the follow-up that suppresses the allocation
> warnings when we fail the allocations due to a signal.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmalloc.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 8a43db6284eb..673942094328 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1695,11 +1695,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
>  
> -		if (fatal_signal_pending(current)) {
> -			area->nr_pages = i;
> -			goto fail_no_warn;
> -		}
> -
>  		if (node == NUMA_NO_NODE)
>  			page = alloc_page(alloc_mask|highmem_mask);
>  		else
> @@ -1723,7 +1718,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	warn_alloc(gfp_mask, NULL,
>  			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
>  			  (area->nr_pages*PAGE_SIZE), area->size);
> -fail_no_warn:
>  	vfree(area->addr);
>  	return NULL;
>  }
> -- 
> 2.14.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-04 23:18       ` Johannes Weiner
@ 2017-10-05  7:57         ` Michal Hocko
  2017-10-05 10:36           ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2017-10-05  7:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Alan Cox, Christoph Hellwig, linux-mm,
	linux-kernel, kernel-team

On Wed 04-10-17 19:18:21, Johannes Weiner wrote:
> On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
[...]
> > You don't think they should be backported into -stables?
> 
> Good point. For this one, it makes sense to CC stable, for 4.11 and
> up. The second patch is more of a fortification against potential
> future issues, and probably shouldn't go into stable.

I am not against. It is true that the memory reserves depletion fix was
theoretical because I haven't seen any real-life bug. I would argue that
the more robust allocation failure behavior is a stable candidate as
well, though, because the allocation can fail regardless of the vmalloc
revert. It is less likely but still possible.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-05  7:57         ` Michal Hocko
@ 2017-10-05 10:36           ` Tetsuo Handa
  2017-10-05 10:49             ` Michal Hocko
  2017-10-07  2:21             ` Tetsuo Handa
  0 siblings, 2 replies; 22+ messages in thread
From: Tetsuo Handa @ 2017-10-05 10:36 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner
  Cc: Andrew Morton, Alan Cox, Christoph Hellwig, linux-mm,
	linux-kernel, kernel-team

On 2017/10/05 16:57, Michal Hocko wrote:
> On Wed 04-10-17 19:18:21, Johannes Weiner wrote:
>> On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
> [...]
>>> You don't think they should be backported into -stables?
>>
>> Good point. For this one, it makes sense to CC stable, for 4.11 and
>> up. The second patch is more of a fortification against potential
>> future issues, and probably shouldn't go into stable.
> 
> I am not against. It is true that the memory reserves depletion fix was
> theoretical because I haven't seen any real life bug. I would argue that
> the more robust allocation failure behavior is a stable candidate as
> well, though, because the allocation can fail regardless of the vmalloc
> revert. It is less likely but still possible.
> 

I don't want this patch backported. If you want to backport,
"s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.

On 2017/10/04 17:33, Michal Hocko wrote:
> Now that we have cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for
> memory reserves access") the risk of the memory depletion is much
> smaller so reverting the above commit should be acceptable. 

Are you aware that stable kernels do not have cd04ae1e2dc8 ?

We added the fatal_signal_pending() check inside the read()/write() loop
because a single read()/write() request could consume 2GB of kernel memory.

What if there is a kernel module which uses vmalloc(1GB) from some
ioctl() for a legitimate reason? You are going to allow such vmalloc()
calls to deplete the memory reserves completely.

On 2017/10/05 8:21, Johannes Weiner wrote:
> Generally, we should leave it to the page allocator to handle memory
> reserves, not annotate random alloc_page() callsites.

I disagree. Interrupting the loop as soon as possible is preferable.

Since we don't have __GFP_KILLABLE, we had to do the fatal_signal_pending()
check inside the read()/write() loop. Since vmalloc() resembles read()/write()
in the sense that it can consume gigabytes of memory, it is pointless to expect
the caller of vmalloc() to check tsk_is_oom_victim().

Again, checking tsk_is_oom_victim() inside the vmalloc() loop is the better
approach.

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-05 10:36           ` Tetsuo Handa
@ 2017-10-05 10:49             ` Michal Hocko
  2017-10-07  2:21             ` Tetsuo Handa
  1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2017-10-05 10:49 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Johannes Weiner, Andrew Morton, Alan Cox, Christoph Hellwig,
	linux-mm, linux-kernel, kernel-team

On Thu 05-10-17 19:36:17, Tetsuo Handa wrote:
> On 2017/10/05 16:57, Michal Hocko wrote:
> > On Wed 04-10-17 19:18:21, Johannes Weiner wrote:
> >> On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
> > [...]
> >>> You don't think they should be backported into -stables?
> >>
> >> Good point. For this one, it makes sense to CC stable, for 4.11 and
> >> up. The second patch is more of a fortification against potential
> >> future issues, and probably shouldn't go into stable.
> > 
> > I am not against. It is true that the memory reserves depletion fix was
> > theoretical because I haven't seen any real life bug. I would argue that
> > the more robust allocation failure behavior is a stable candidate as
> > well, though, because the allocation can fail regardless of the vmalloc
> > revert. It is less likely but still possible.
> > 
> 
> I don't want this patch backported. If you want to backport,
> "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> 
> On 2017/10/04 17:33, Michal Hocko wrote:
> > Now that we have cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for
> > memory reserves access") the risk of the memory depletion is much
> > smaller so reverting the above commit should be acceptable. 
> 
> Are you aware that stable kernels do not have cd04ae1e2dc8 ?

yes

> We added the fatal_signal_pending() check inside the read()/write() loop
> because a single read()/write() request could consume 2GB of kernel memory.

yes, because this is easily triggerable by userspace.

> What if there is a kernel module which uses vmalloc(1GB) from some
> ioctl() for a legitimate reason? You are going to allow such vmalloc()
> calls to deplete the memory reserves completely.

Do you have any specific example in mind? If yes we can handle it.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-05 10:36           ` Tetsuo Handa
  2017-10-05 10:49             ` Michal Hocko
@ 2017-10-07  2:21             ` Tetsuo Handa
  2017-10-07  2:51               ` Johannes Weiner
  1 sibling, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2017-10-07  2:21 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Michal Hocko, Alan Cox, Christoph Hellwig, linux-mm,
	linux-kernel, kernel-team

On 2017/10/05 19:36, Tetsuo Handa wrote:
> I don't want this patch backported. If you want to backport,
> "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.

If you backport this patch, you will see "complete depletion of memory reserves"
and "extra OOM kills due to depletion of memory reserves" with the reproducer below.

----------
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/oom.h>

static char *buffer;

static int __init test_init(void)
{
	set_current_oom_origin();
	buffer = vmalloc((1UL << 32) - 480 * 1048576);
	clear_current_oom_origin();
	return buffer ? 0 : -ENOMEM;
}

static void test_exit(void)
{
	vfree(buffer);
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
----------

----------
CentOS Linux 7 (Core)
Kernel 4.13.5+ on an x86_64

ccsecurity login: [   53.637666] test: loading out-of-tree module taints kernel.
[   53.856166] insmod invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[   53.858754] insmod cpuset=/ mems_allowed=0
[   53.859713] CPU: 1 PID: 2763 Comm: insmod Tainted: G           O    4.13.5+ #10
[   53.861134] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   53.863072] Call Trace:
[   53.863548]  dump_stack+0x4d/0x6f
[   53.864172]  dump_header+0x92/0x22a
[   53.864869]  ? has_ns_capability_noaudit+0x30/0x40
[   53.865887]  oom_kill_process+0x250/0x440
[   53.866644]  out_of_memory+0x10d/0x480
[   53.867343]  __alloc_pages_nodemask+0x1087/0x1140
[   53.868216]  alloc_pages_current+0x65/0xd0
[   53.869086]  __vmalloc_node_range+0x129/0x230
[   53.869895]  vmalloc+0x39/0x40
[   53.870472]  ? test_init+0x26/0x1000 [test]
[   53.871248]  test_init+0x26/0x1000 [test]
[   53.871993]  ? 0xffffffffa00fa000
[   53.872609]  do_one_initcall+0x4d/0x190
[   53.873301]  do_init_module+0x5a/0x1f7
[   53.873999]  load_module+0x2022/0x2960
[   53.874678]  ? vfs_read+0x116/0x130
[   53.875312]  SyS_finit_module+0xe1/0xf0
[   53.876074]  ? SyS_finit_module+0xe1/0xf0
[   53.876806]  do_syscall_64+0x5c/0x140
[   53.877488]  entry_SYSCALL64_slow_path+0x25/0x25
[   53.878316] RIP: 0033:0x7f1b27c877f9
[   53.878964] RSP: 002b:00007ffff552e718 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   53.880620] RAX: ffffffffffffffda RBX: 0000000000a2d210 RCX: 00007f1b27c877f9
[   53.881883] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   53.883167] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007ffff552e8b8
[   53.884685] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   53.885949] R13: 0000000000a2d1e0 R14: 0000000000000000 R15: 0000000000000000
[   53.887392] Mem-Info:
[   53.887909] active_anon:14248 inactive_anon:2088 isolated_anon:0
[   53.887909]  active_file:4 inactive_file:2 isolated_file:2
[   53.887909]  unevictable:0 dirty:3 writeback:2 unstable:0
[   53.887909]  slab_reclaimable:2818 slab_unreclaimable:4420
[   53.887909]  mapped:453 shmem:2162 pagetables:1676 bounce:0
[   53.887909]  free:21418 free_pcp:0 free_cma:0
[   53.895172] Node 0 active_anon:56992kB inactive_anon:8352kB active_file:12kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):8kB mapped:1812kB dirty:12kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   53.901844] Node 0 DMA free:14932kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   53.907765] lowmem_reserve[]: 0 2703 3662 3662
[   53.909333] Node 0 DMA32 free:53424kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   53.915597] lowmem_reserve[]: 0 0 958 958
[   53.916992] Node 0 Normal free:17192kB min:17608kB low:22008kB high:26408kB active_anon:56992kB inactive_anon:8352kB active_file:12kB inactive_file:12kB unevictable:0kB writepending:20kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3648kB pagetables:6704kB bounce:0kB free_pcp:112kB local_pcp:0kB free_cma:0kB
[   53.924610] lowmem_reserve[]: 0 0 0 0
[   53.926131] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 1*64kB (U) 0*128kB 0*256kB 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14932kB
[   53.929273] Node 0 DMA32: 4*4kB (UM) 2*8kB (UM) 5*16kB (UM) 4*32kB (M) 3*64kB (M) 4*128kB (M) 5*256kB (UM) 4*512kB (M) 4*1024kB (UM) 2*2048kB (UM) 10*4096kB (M) = 53424kB
[   53.934010] Node 0 Normal: 896*4kB (ME) 466*8kB (UME) 288*16kB (UME) 128*32kB (UME) 23*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17488kB
[   53.937833] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   53.940769] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   53.943250] 2166 total pagecache pages
[   53.944788] 0 pages in swap cache
[   53.946249] Swap cache stats: add 0, delete 0, find 0/0
[   53.948075] Free swap  = 0kB
[   53.949419] Total swap = 0kB
[   53.950873] 1048445 pages RAM
[   53.952238] 0 pages HighMem/MovableOnly
[   53.953768] 101550 pages reserved
[   53.955555] 0 pages hwpoisoned
[   53.956923] Out of memory: Kill process 2763 (insmod) score 3621739297 or sacrifice child
[   53.959298] Killed process 2763 (insmod) total-vm:13084kB, anon-rss:132kB, file-rss:0kB, shmem-rss:0kB
[   53.962059] oom_reaper: reaped process 2763 (insmod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   53.968054] insmod invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[   53.971406] insmod cpuset=/ mems_allowed=0
[   53.973066] CPU: 1 PID: 2763 Comm: insmod Tainted: G           O    4.13.5+ #10
[   53.975339] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   53.978388] Call Trace:
[   53.979714]  dump_stack+0x4d/0x6f
[   53.981176]  dump_header+0x92/0x22a
[   53.982747]  ? has_ns_capability_noaudit+0x30/0x40
[   53.984481]  oom_kill_process+0x250/0x440
[   53.986133]  out_of_memory+0x10d/0x480
[   53.987667]  __alloc_pages_nodemask+0x1087/0x1140
[   53.989431]  alloc_pages_current+0x65/0xd0
[   53.991037]  __vmalloc_node_range+0x129/0x230
[   53.992775]  vmalloc+0x39/0x40
[   53.994421]  ? test_init+0x26/0x1000 [test]
[   53.996063]  test_init+0x26/0x1000 [test]
[   53.997825]  ? 0xffffffffa00fa000
[   53.999280]  do_one_initcall+0x4d/0x190
[   54.000786]  do_init_module+0x5a/0x1f7
[   54.002351]  load_module+0x2022/0x2960
[   54.003789]  ? vfs_read+0x116/0x130
[   54.005299]  SyS_finit_module+0xe1/0xf0
[   54.006872]  ? SyS_finit_module+0xe1/0xf0
[   54.008300]  do_syscall_64+0x5c/0x140
[   54.009912]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.011464] RIP: 0033:0x7f1b27c877f9
[   54.012816] RSP: 002b:00007ffff552e718 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   54.014958] RAX: ffffffffffffffda RBX: 0000000000a2d210 RCX: 00007f1b27c877f9
[   54.017062] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   54.019065] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007ffff552e8b8
[   54.020951] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   54.022738] R13: 0000000000a2d1e0 R14: 0000000000000000 R15: 0000000000000000
[   54.024673] Mem-Info:
[   54.025767] active_anon:14220 inactive_anon:2088 isolated_anon:0
[   54.025767]  active_file:3 inactive_file:0 isolated_file:0
[   54.025767]  unevictable:0 dirty:1 writeback:2 unstable:0
[   54.025767]  slab_reclaimable:2774 slab_unreclaimable:4420
[   54.025767]  mapped:453 shmem:2162 pagetables:1676 bounce:0
[   54.025767]  free:72 free_pcp:0 free_cma:0
[   54.034925] Node 0 active_anon:56880kB inactive_anon:8352kB active_file:12kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1812kB dirty:4kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[   54.041176] Node 0 DMA free:12kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.047349] lowmem_reserve[]: 0 2703 3662 3662
[   54.048922] Node 0 DMA32 free:104kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.055698] lowmem_reserve[]: 0 0 958 958
[   54.057182] Node 0 Normal free:188kB min:17608kB low:22008kB high:26408kB active_anon:56880kB inactive_anon:8352kB active_file:12kB inactive_file:0kB unevictable:0kB writepending:12kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3648kB pagetables:6704kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.065665] lowmem_reserve[]: 0 0 0 0
[   54.067279] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.069949] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.072630] Node 0 Normal: 31*4kB (UM) 5*8kB (UM) 1*16kB (E) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 180kB
[   54.075624] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   54.078142] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   54.080509] 2165 total pagecache pages
[   54.081931] 0 pages in swap cache
[   54.083381] Swap cache stats: add 0, delete 0, find 0/0
[   54.085051] Free swap  = 0kB
[   54.086305] Total swap = 0kB
[   54.087931] 1048445 pages RAM
[   54.089296] 0 pages HighMem/MovableOnly
[   54.090731] 101550 pages reserved
[   54.092161] 0 pages hwpoisoned
[   54.093738] Out of memory: Kill process 2458 (tuned) score 3 or sacrifice child
[   54.095910] Killed process 2458 (tuned) total-vm:562424kB, anon-rss:12764kB, file-rss:0kB, shmem-rss:0kB
[   54.098531] insmod: vmalloc: allocation failure, allocated 3725393920 of 3791654912 bytes, mode:0x14000c0(GFP_KERNEL), nodemask=(null)
[   54.101771] insmod cpuset=/ mems_allowed=0
[   54.103661] oom_reaper: reaped process 2458 (tuned), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   54.103807] tuned invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
[   54.103809] tuned cpuset=/ mems_allowed=0
[   54.103815] CPU: 2 PID: 2712 Comm: tuned Tainted: G           O    4.13.5+ #10
[   54.103815] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   54.103816] Call Trace:
[   54.103825]  dump_stack+0x4d/0x6f
[   54.103827]  dump_header+0x92/0x22a
[   54.103830]  ? has_ns_capability_noaudit+0x30/0x40
[   54.103834]  oom_kill_process+0x250/0x440
[   54.103835]  out_of_memory+0x10d/0x480
[   54.103836]  __alloc_pages_nodemask+0x1087/0x1140
[   54.103840]  alloc_pages_current+0x65/0xd0
[   54.103843]  pte_alloc_one+0x12/0x40
[   54.103845]  do_huge_pmd_anonymous_page+0xfd/0x620
[   54.103847]  __handle_mm_fault+0x9a7/0x1040
[   54.103848]  ? _lookup_address_cpa.isra.7+0x38/0x40
[   54.103849]  handle_mm_fault+0xd1/0x1c0
[   54.103852]  __do_page_fault+0x28b/0x4f0
[   54.103854]  do_page_fault+0x20/0x70
[   54.103857]  page_fault+0x22/0x30
[   54.103859] RIP: 0010:__get_user_8+0x1b/0x25
[   54.103860] RSP: 0000:ffffc90002703c38 EFLAGS: 00010287
[   54.103860] RAX: 00007fc407fff9e7 RBX: ffff880136cbc740 RCX: 00000000000002b0
[   54.103861] RDX: ffff880133c98e00 RSI: ffff880136cbc740 RDI: ffff880133c98e00
[   54.103861] RBP: ffffc90002703c80 R08: 0000000000000001 R09: 0000000000000000
[   54.103862] R10: ffffc90002703c48 R11: 00000000000003f6 R12: ffff880133c98e00
[   54.103862] R13: ffff880133c98e00 R14: 00007fc407fff9e0 R15: 0000000001399fc8
[   54.103866]  ? exit_robust_list+0x2e/0x110
[   54.103868]  mm_release+0x100/0x140
[   54.103869]  do_exit+0x14b/0xb50
[   54.103871]  ? pick_next_task_fair+0x17d/0x4d0
[   54.103874]  ? put_prev_entity+0x26/0x340
[   54.103875]  do_group_exit+0x36/0xb0
[   54.103878]  get_signal+0x263/0x5f0
[   54.103881]  do_signal+0x32/0x630
[   54.103884]  ? __audit_syscall_exit+0x21a/0x2b0
[   54.103886]  ? syscall_slow_exit_work+0x15c/0x1a0
[   54.103888]  ? getnstimeofday64+0x9/0x20
[   54.103890]  ? wake_up_q+0x80/0x80
[   54.103891]  exit_to_usermode_loop+0x76/0x90
[   54.103892]  do_syscall_64+0x12e/0x140
[   54.103893]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.103895] RIP: 0033:0x7fc42486e923
[   54.103895] RSP: 002b:00007fc407ffe360 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[   54.103896] RAX: fffffffffffffffc RBX: 00007fc4259b7828 RCX: 00007fc42486e923
[   54.103896] RDX: 00000000000003ff RSI: 00007fc400001980 RDI: 000000000000000a
[   54.103897] RBP: 00000000ffffffff R08: 00007fc41a1558e0 R09: 0000000000002ff4
[   54.103897] R10: 00000000ffffffff R11: 0000000000000293 R12: 00007fc40c010140
[   54.103898] R13: 00007fc400001980 R14: 00007fc400001790 R15: 0000000001399fc8
[   54.103899] Mem-Info:
[   54.103902] active_anon:11004 inactive_anon:2088 isolated_anon:0
[   54.103902]  active_file:6 inactive_file:0 isolated_file:0
[   54.103902]  unevictable:0 dirty:1 writeback:2 unstable:0
[   54.103902]  slab_reclaimable:2770 slab_unreclaimable:4420
[   54.103902]  mapped:453 shmem:2162 pagetables:1676 bounce:0
[   54.103902]  free:3117 free_pcp:158 free_cma:0
[   54.103904] Node 0 active_anon:44016kB inactive_anon:8352kB active_file:24kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1812kB dirty:4kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[   54.103905] Node 0 DMA free:12kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.103908] lowmem_reserve[]: 0 2703 3662 3662
[   54.103909] Node 0 DMA32 free:104kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.103911] lowmem_reserve[]: 0 0 958 958
[   54.103912] Node 0 Normal free:12352kB min:17608kB low:22008kB high:26408kB active_anon:44068kB inactive_anon:8352kB active_file:24kB inactive_file:0kB unevictable:0kB writepending:12kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3616kB pagetables:6704kB bounce:0kB free_pcp:632kB local_pcp:632kB free_cma:0kB
[   54.103914] lowmem_reserve[]: 0 0 0 0
[   54.103915] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.103918] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.103921] Node 0 Normal: 536*4kB (UM) 281*8kB (UM) 124*16kB (UME) 76*32kB (UM) 12*64kB (U) 3*128kB (U) 2*256kB (U) 0*512kB 0*1024kB 1*2048kB (M) 0*4096kB = 12520kB
[   54.103926] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   54.103926] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   54.103927] 2165 total pagecache pages
[   54.103928] 0 pages in swap cache
[   54.103929] Swap cache stats: add 0, delete 0, find 0/0
[   54.103929] Free swap  = 0kB
[   54.103929] Total swap = 0kB
[   54.103929] 1048445 pages RAM
[   54.103930] 0 pages HighMem/MovableOnly
[   54.103930] 101550 pages reserved
[   54.103930] 0 pages hwpoisoned
[   54.103931] Out of memory: Kill process 2353 (dhclient) score 3 or sacrifice child
[   54.103984] Killed process 2353 (dhclient) total-vm:113384kB, anon-rss:12488kB, file-rss:0kB, shmem-rss:0kB
[   54.262237] CPU: 1 PID: 2763 Comm: insmod Tainted: G           O    4.13.5+ #10
[   54.264476] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   54.267326] Call Trace:
[   54.268614]  dump_stack+0x4d/0x6f
[   54.270200]  warn_alloc+0x10f/0x1a0
[   54.271723]  __vmalloc_node_range+0x14e/0x230
[   54.273359]  vmalloc+0x39/0x40
[   54.274778]  ? test_init+0x26/0x1000 [test]
[   54.276480]  test_init+0x26/0x1000 [test]
[   54.278081]  ? 0xffffffffa00fa000
[   54.279576]  do_one_initcall+0x4d/0x190
[   54.281089]  do_init_module+0x5a/0x1f7
[   54.282637]  load_module+0x2022/0x2960
[   54.284221]  ? vfs_read+0x116/0x130
[   54.285674]  SyS_finit_module+0xe1/0xf0
[   54.287216]  ? SyS_finit_module+0xe1/0xf0
[   54.288737]  do_syscall_64+0x5c/0x140
[   54.290285]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.291930] RIP: 0033:0x7f1b27c877f9
[   54.293557] RSP: 002b:00007ffff552e718 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   54.295810] RAX: ffffffffffffffda RBX: 0000000000a2d210 RCX: 00007f1b27c877f9
[   54.297875] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   54.299904] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007ffff552e8b8
[   54.301935] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   54.303884] R13: 0000000000a2d1e0 R14: 0000000000000000 R15: 0000000000000000
[   54.305896] Mem-Info:
[   54.307238] active_anon:7863 inactive_anon:2088 isolated_anon:0
[   54.307238]  active_file:3 inactive_file:431 isolated_file:0
[   54.307238]  unevictable:0 dirty:1 writeback:2 unstable:0
[   54.307238]  slab_reclaimable:2767 slab_unreclaimable:4413
[   54.307238]  mapped:660 shmem:2162 pagetables:1529 bounce:0
[   54.307238]  free:5315 free_pcp:291 free_cma:0
[   54.317589] Node 0 active_anon:31452kB inactive_anon:8352kB active_file:12kB inactive_file:1836kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2700kB dirty:4kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4096kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[   54.324325] Node 0 DMA free:12kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.330628] lowmem_reserve[]: 0 2703 3662 3662
[   54.332163] Node 0 DMA32 free:104kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.338996] lowmem_reserve[]: 0 0 958 958
[   54.340615] Node 0 Normal free:20648kB min:17608kB low:22008kB high:26408kB active_anon:31452kB inactive_anon:8352kB active_file:12kB inactive_file:2360kB unevictable:0kB writepending:12kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3584kB pagetables:6116kB bounce:0kB free_pcp:1192kB local_pcp:8kB free_cma:0kB
[   54.348671] lowmem_reserve[]: 0 0 0 0
[   54.350205] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.353027] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.355895] Node 0 Normal: 580*4kB (UE) 329*8kB (U) 129*16kB (UE) 70*32kB (U) 16*64kB (U) 5*128kB (UM) 5*256kB (UM) 2*512kB (M) 1*1024kB (M) 3*2048kB (M) 0*4096kB = 20392kB
[   54.360581] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   54.363080] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   54.365507] 2864 total pagecache pages
[   54.366963] 0 pages in swap cache
[   54.368390] Swap cache stats: add 0, delete 0, find 0/0
[   54.370124] Free swap  = 0kB
[   54.371431] Total swap = 0kB
[   54.372770] 1048445 pages RAM
[   54.374085] 0 pages HighMem/MovableOnly
[   54.376827] 101550 pages reserved
[   54.378635] 0 pages hwpoisoned
----------

On the other hand, if you do "s/fatal_signal_pending/tsk_is_oom_victim/", there
is no "depletion of memory reserves" and no "extra OOM kills due to depletion of
memory reserves".

----------
CentOS Linux 7 (Core)
Kernel 4.13.5+ on an x86_64

ccsecurity login: [   54.746704] test: loading out-of-tree module taints kernel.
[   54.896608] insmod invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[   54.900107] insmod cpuset=/ mems_allowed=0
[   54.902235] CPU: 3 PID: 2749 Comm: insmod Tainted: G           O    4.13.5+ #11
[   54.906886] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   54.909943] Call Trace:
[   54.911433]  dump_stack+0x4d/0x6f
[   54.912957]  dump_header+0x92/0x22a
[   54.914426]  ? has_ns_capability_noaudit+0x30/0x40
[   54.916242]  oom_kill_process+0x250/0x440
[   54.917912]  out_of_memory+0x10d/0x480
[   54.919426]  __alloc_pages_nodemask+0x1087/0x1140
[   54.921365]  ? vmap_page_range_noflush+0x280/0x320
[   54.923232]  alloc_pages_current+0x65/0xd0
[   54.924784]  __vmalloc_node_range+0x16a/0x280
[   54.926386]  vmalloc+0x39/0x40
[   54.927686]  ? test_init+0x26/0x1000 [test]
[   54.929258]  test_init+0x26/0x1000 [test]
[   54.930793]  ? 0xffffffffa00a0000
[   54.932167]  do_one_initcall+0x4d/0x190
[   54.933586]  ? kfree+0x16f/0x180
[   54.934992]  ? kfree+0x16f/0x180
[   54.936393]  do_init_module+0x5a/0x1f7
[   54.937807]  load_module+0x2022/0x2960
[   54.939344]  ? vfs_read+0x116/0x130
[   54.940901]  SyS_finit_module+0xe1/0xf0
[   54.942386]  ? SyS_finit_module+0xe1/0xf0
[   54.943955]  do_syscall_64+0x5c/0x140
[   54.945991]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.947802] RIP: 0033:0x7fd1655057f9
[   54.949220] RSP: 002b:00007fff9d59fdf8 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[   54.951317] RAX: ffffffffffffffda RBX: 000000000085e210 RCX: 00007fd1655057f9
[   54.953379] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   54.955837] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007fff9d59ff98
[   54.959966] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
[   54.962171] R13: 000000000085e1e0 R14: 0000000000000000 R15: 0000000000000000
[   54.978917] Mem-Info:
[   54.980118] active_anon:13936 inactive_anon:2088 isolated_anon:0
[   54.980118]  active_file:32 inactive_file:6 isolated_file:0
[   54.980118]  unevictable:0 dirty:10 writeback:0 unstable:0
[   54.980118]  slab_reclaimable:2812 slab_unreclaimable:4414
[   54.980118]  mapped:456 shmem:2162 pagetables:1681 bounce:0
[   54.980118]  free:21335 free_pcp:0 free_cma:0
[   54.990120] Node 0 active_anon:55744kB inactive_anon:8352kB active_file:128kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1824kB dirty:40kB writeback:0kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   54.996847] Node 0 DMA free:14932kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   55.003426] lowmem_reserve[]: 0 2703 3662 3662
[   55.004962] Node 0 DMA32 free:53056kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   55.011598] lowmem_reserve[]: 0 0 958 958
[   55.013852] Node 0 Normal free:17352kB min:17608kB low:22008kB high:26408kB active_anon:55696kB inactive_anon:8352kB active_file:364kB inactive_file:180kB unevictable:0kB writepending:36kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3600kB pagetables:6724kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
[   55.021929] lowmem_reserve[]: 0 0 0 0
[   55.023636] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 1*64kB (U) 0*128kB 0*256kB 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14932kB
[   55.026942] Node 0 DMA32: 4*4kB (UM) 2*8kB (UM) 5*16kB (UM) 4*32kB (M) 3*64kB (M) 5*128kB (UM) 4*256kB (M) 4*512kB (M) 4*1024kB (UM) 2*2048kB (UM) 10*4096kB (M) = 53296kB
[   55.031534] Node 0 Normal: 974*4kB (UME) 560*8kB (UME) 288*16kB (ME) 96*32kB (ME) 24*64kB (UM) 0*128kB 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17848kB
[   55.036126] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   55.038841] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   55.041431] 2197 total pagecache pages
[   55.043071] 0 pages in swap cache
[   55.044597] Swap cache stats: add 0, delete 0, find 0/0
[   55.046509] Free swap  = 0kB
[   55.047977] Total swap = 0kB
[   55.049548] 1048445 pages RAM
[   55.051143] 0 pages HighMem/MovableOnly
[   55.052799] 101550 pages reserved
[   55.054319] 0 pages hwpoisoned
[   55.055906] Out of memory: Kill process 2749 (insmod) score 3621739297 or sacrifice child
[   55.058429] Killed process 2749 (insmod) total-vm:13084kB, anon-rss:132kB, file-rss:0kB, shmem-rss:0kB
[   55.061278] oom_reaper: reaped process 2749 (insmod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
----------

Therefore, I throw

Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-07  2:21             ` Tetsuo Handa
@ 2017-10-07  2:51               ` Johannes Weiner
  2017-10-07  4:05                 ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2017-10-07  2:51 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Michal Hocko, Alan Cox, Christoph Hellwig,
	linux-mm, linux-kernel, kernel-team

On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> On 2017/10/05 19:36, Tetsuo Handa wrote:
> > I don't want this patch backported. If you want to backport,
> > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> 
> If you backport this patch, you will see "complete depletion of memory reserves"
> and "extra OOM kills due to depletion of memory reserves" with the reproducer below.
> 
> ----------
> #include <linux/module.h>
> #include <linux/slab.h>
> #include <linux/oom.h>
> 
> static char *buffer;
> 
> static int __init test_init(void)
> {
> 	set_current_oom_origin();
> 	buffer = vmalloc((1UL << 32) - 480 * 1048576);

That's not a reproducer, that's a kernel module. It's not hard to
crash the kernel from within the kernel.


* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-07  2:51               ` Johannes Weiner
@ 2017-10-07  4:05                 ` Tetsuo Handa
  2017-10-07  7:59                   ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2017-10-07  4:05 UTC (permalink / raw)
  To: hannes; +Cc: akpm, mhocko, alan, hch, linux-mm, linux-kernel, kernel-team

Johannes Weiner wrote:
> On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> > On 2017/10/05 19:36, Tetsuo Handa wrote:
> > > I don't want this patch backported. If you want to backport,
> > > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> > 
> > If you backport this patch, you will see "complete depletion of memory reserves"
> > and "extra OOM kills due to depletion of memory reserves" using below reproducer.
> > 
> > ----------
> > #include <linux/module.h>
> > #include <linux/slab.h>
> > #include <linux/oom.h>
> > 
> > static char *buffer;
> > 
> > static int __init test_init(void)
> > {
> > 	set_current_oom_origin();
> > 	buffer = vmalloc((1UL << 32) - 480 * 1048576);
> 
> That's not a reproducer, that's a kernel module. It's not hard to
> crash the kernel from within the kernel.
> 

When did we agree that a "reproducer" must be a userspace program?
A "reproducer" is any program that triggers the intended behavior.

Year after year, people put effort into kernel hardening.
It is silly to say "It's not hard to crash the kernel from
within the kernel" when we can easily mitigate this.

Even with cd04ae1e2dc8, there is no point in triggering extra
OOM kills by needlessly consuming memory reserves.


* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-07  4:05                 ` Tetsuo Handa
@ 2017-10-07  7:59                   ` Michal Hocko
  2017-10-07  9:57                     ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2017-10-07  7:59 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

On Sat 07-10-17 13:05:24, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> > > On 2017/10/05 19:36, Tetsuo Handa wrote:
> > > > I don't want this patch backported. If you want to backport,
> > > > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> > > 
> > > If you backport this patch, you will see "complete depletion of memory reserves"
> > > and "extra OOM kills due to depletion of memory reserves" using below reproducer.
> > > 
> > > ----------
> > > #include <linux/module.h>
> > > #include <linux/slab.h>
> > > #include <linux/oom.h>
> > > 
> > > static char *buffer;
> > > 
> > > static int __init test_init(void)
> > > {
> > > 	set_current_oom_origin();
> > > 	buffer = vmalloc((1UL << 32) - 480 * 1048576);
> > 
> > That's not a reproducer, that's a kernel module. It's not hard to
> > crash the kernel from within the kernel.
> > 
> 
> When did we agree that "reproducer" is "userspace program" ?
> A "reproducer" is a program that triggers something intended.

This line of argument is just ridiculous. I can construct arbitrary
code to bring the kernel to its knees, and there is no way around that.

The patch in question was supposed to mitigate a theoretical problem
while it caused a real issue seen out there. That is a reason to
revert the patch. Especially when a better mitigation has been put
in place. You are right that replacing fatal_signal_pending by
tsk_is_oom_victim would keep the original mitigation in pre-cd04ae1e2dc8
kernels but I would only agree to do that if the mitigated problem was
real. And this doesn't seem to be the case. If any of the stable kernels
regresses due to the revert I am willing to put a mitigation in place.
 
> Year by year, people are spending efforts for kernel hardening.
> It is silly to say that "It's not hard to crash the kernel from
> within the kernel." when we can easily mitigate.

This is true, but we do not spread random hacks around for problems
that are not real when there are better ways to address them. In this
particular case, cd04ae1e2dc8 was a better way to address the problem
in general, without spreading tsk_is_oom_victim all over the place.
 
> Even with cd04ae1e2dc8, there is no point with triggering extra
> OOM kills by needlessly consuming memory reserves.

Yet again you are making unfounded claims, and I am really fed up
with discussing this any further.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] Revert "vmalloc: back off when the current task is killed"
  2017-10-07  7:59                   ` Michal Hocko
@ 2017-10-07  9:57                     ` Tetsuo Handa
  0 siblings, 0 replies; 22+ messages in thread
From: Tetsuo Handa @ 2017-10-07  9:57 UTC (permalink / raw)
  To: mhocko; +Cc: hannes, akpm, alan, hch, linux-mm, linux-kernel, kernel-team

Michal Hocko wrote:
> On Sat 07-10-17 13:05:24, Tetsuo Handa wrote:
> > Johannes Weiner wrote:
> > > On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> > > > On 2017/10/05 19:36, Tetsuo Handa wrote:
> > > > > I don't want this patch backported. If you want to backport,
> > > > > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> > > > 
> > > > If you backport this patch, you will see "complete depletion of memory reserves"
> > > > and "extra OOM kills due to depletion of memory reserves" using below reproducer.
> > > > 
> > > > ----------
> > > > #include <linux/module.h>
> > > > #include <linux/slab.h>
> > > > #include <linux/oom.h>
> > > > 
> > > > static char *buffer;
> > > > 
> > > > static int __init test_init(void)
> > > > {
> > > > 	set_current_oom_origin();
> > > > 	buffer = vmalloc((1UL << 32) - 480 * 1048576);
> > > 
> > > That's not a reproducer, that's a kernel module. It's not hard to
> > > crash the kernel from within the kernel.
> > > 
> > 
> > When did we agree that "reproducer" is "userspace program" ?
> > A "reproducer" is a program that triggers something intended.
> 
> This way of argumentation is just ridiculous. I can construct whatever
> code to put kernel on knees and there is no way around it.

But you are not distinguishing between a kernel module and a userspace
program; you are distinguishing between "real" and "theoretical". And
the more you dismiss things as "ridiculous" or "theoretical", the
harder I will push back.

> 
> The patch in question was supposed to mitigate a theoretical problem
> while it caused a real issue seen out there. That is a reason to
> revert the patch. Especially when a better mitigation has been put
> in place. You are right that replacing fatal_signal_pending by
> tsk_is_oom_victim would keep the original mitigation in pre-cd04ae1e2dc8
> kernels but I would only agree to do that if the mitigated problem was
> real. And this doesn't seem to be the case. If any of the stable kernels
> regresses due to the revert I am willing to put a mitigation in place.

The real issue here is that the caller of vmalloc() was not ready to
handle allocation failure. We addressed the kmem_zalloc_greedy() case
( https://marc.info/?l=linux-mm&m=148844910724880 ) with 08b005f1333154ae
rather than by reverting the fatal_signal_pending() check. Removing the
fatal_signal_pending() check in order to hide real issues is a random hack.

>  
> > Year by year, people are spending efforts for kernel hardening.
> > It is silly to say that "It's not hard to crash the kernel from
> > within the kernel." when we can easily mitigate.
> 
> This is true but we do not spread random hacks around for problems that
> are not real and there are better ways to address them. In this
> particular case cd04ae1e2dc8 was a better way to address the problem in
> general without spreading tsk_is_oom_victim all over the place.

Using tsk_is_oom_victim() is reasonable for vmalloc() because it is a
memory allocation function that belongs to the memory management subsystem.

>  
> > Even with cd04ae1e2dc8, there is no point with triggering extra
> > OOM kills by needlessly consuming memory reserves.
> 
> Yet again you are making unfounded claims and I am really fed up
> arguing discussing that any further.

Kernel hardening changes mostly address "theoretical" issues,
but we don't call them "ridiculous".



Thread overview: 22+ messages
-- links below jump to the message on this page --
2017-10-03 22:55 tty crash due to auto-failing vmalloc Johannes Weiner
2017-10-03 23:51 ` Alan Cox
2017-10-04  8:33 ` Michal Hocko
2017-10-04 18:58 ` Johannes Weiner
2017-10-04 18:59   ` [PATCH 1/2] Revert "vmalloc: back off when the current task is killed" Johannes Weiner
2017-10-04 20:49     ` Tetsuo Handa
2017-10-04 21:00       ` Johannes Weiner
2017-10-04 21:42         ` Tetsuo Handa
2017-10-04 23:21           ` Johannes Weiner
2017-10-04 22:32     ` Andrew Morton
2017-10-04 23:18       ` Johannes Weiner
2017-10-05  7:57         ` Michal Hocko
2017-10-05 10:36           ` Tetsuo Handa
2017-10-05 10:49             ` Michal Hocko
2017-10-07  2:21             ` Tetsuo Handa
2017-10-07  2:51               ` Johannes Weiner
2017-10-07  4:05                 ` Tetsuo Handa
2017-10-07  7:59                   ` Michal Hocko
2017-10-07  9:57                     ` Tetsuo Handa
2017-10-05  6:49     ` Vlastimil Babka
2017-10-05  7:54     ` Michal Hocko
2017-10-04 18:59   ` [PATCH 2/2] tty: fall back to N_NULL if switching to N_TTY fails during hangup Johannes Weiner
