From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Christoph Hellwig <hch@lst.de>
Cc: mgorman@suse.de, viro@ZenIV.linux.org.uk, linux-mm@kvack.org,
	hannes@cmpxchg.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone
Date: Wed, 25 Jan 2017 11:15:17 +0100
Message-ID: <20170125101517.GG32377@dhcp22.suse.cz>
In-Reply-To: <201701211642.JBC39590.SFtVJHMFOLFOQO@I-love.SAKURA.ne.jp>

[Let's add Christoph]

The insane^Wstress test below should exercise the OOM killer behavior.

On Sat 21-01-17 16:42:42, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > And I think that there is a different problem if I tune the reproducer
> > as below (i.e. increase the buffer size passed to write()/fsync() from
> > 4096 bytes).
> > 
> > ----------
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <unistd.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > 
> > int main(int argc, char *argv[])
> > {
> > 	static char buffer[10485760] = { }; /* or 1048576 */
> > 	char *buf = NULL;
> > 	unsigned long size;
> > 	unsigned long i;
> > 	/* Children: make themselves OOM-killable, then keep dirtying
> > 	 * page cache via write()/fsync() until write() fails. */
> > 	for (i = 0; i < 1024; i++) {
> > 		if (fork() == 0) {
> > 			int fd = open("/proc/self/oom_score_adj", O_WRONLY);
> > 			write(fd, "1000", 4);
> > 			close(fd);
> > 			sleep(1);
> > 			snprintf(buffer, sizeof(buffer), "/tmp/file.%u", getpid());
> > 			fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
> > 			while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer))
> > 				fsync(fd);
> > 			_exit(0);
> > 		}
> > 	}
> > 	/* Parent: grab as large an anonymous mapping as realloc() allows. */
> > 	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
> > 		char *cp = realloc(buf, size);
> > 		if (!cp) {
> > 			size >>= 1;
> > 			break;
> > 		}
> > 		buf = cp;
> > 	}
> > 	sleep(2);
> > 	/* Will cause OOM due to overcommit */
> > 	for (i = 0; i < size; i += 4096)
> > 		buf[i] = 0;
> > 	pause();
> > 	return 0;
> > }
> > ----------
> > 
> > The above reproducer sometimes kills all OOM-killable processes and the
> > system finally panics. I guess that somebody is abusing TIF_MEMDIE for
> > needless allocations, to the point where GFP_ATOMIC allocations start
> > failing.
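
For the record, TIF_MEMDIE is exactly what lets an OOM victim bypass the
watermarks. The allocator-side check looks roughly like the below; this is
my paraphrase of the gfp_pfmemalloc_allowed() logic in mm/page_alloc.c as
of the 4.10 timeframe, written from memory, so treat it as a sketch rather
than a verbatim copy:
----------
/*
 * Sketch of mm/page_alloc.c:gfp_pfmemalloc_allowed(): a task with
 * TIF_MEMDIE set may ignore the watermarks entirely, so every
 * allocation the OOM victim makes eats directly into the memory
 * reserves.
 */
bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
{
	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
		return false;
	if (gfp_mask & __GFP_MEMALLOC)
		return true;
	if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
		return true;
	if (!in_interrupt() &&
	    ((current->flags & PF_MEMALLOC) ||
	     unlikely(test_thread_flag(TIF_MEMDIE))))
		return true;
	return false;
}
----------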
[...] 
> And I got a flood of traces like the one shown below. The task seems to
> keep consuming memory reserves until the whole buffer passed to write()
> has been stored in the page cache, even after it has been OOM-killed.
> 
> Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170121.txt.xz .
> ----------------------------------------
> [  202.306077] a.out(9789): TIF_MEMDIE allocation: order=0 mode=0x1c2004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE)
> [  202.309832] CPU: 0 PID: 9789 Comm: a.out Not tainted 4.10.0-rc4-next-20170120+ #492
> [  202.312323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [  202.315429] Call Trace:
> [  202.316902]  dump_stack+0x85/0xc9
> [  202.318810]  __alloc_pages_slowpath+0xa99/0xd7c
> [  202.320697]  ? node_dirty_ok+0xef/0x130
> [  202.322454]  __alloc_pages_nodemask+0x436/0x4d0
> [  202.324506]  alloc_pages_current+0x97/0x1b0
> [  202.326397]  __page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
> [  202.328209]  pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
> [  202.329989]  grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
> [  202.331905]  iomap_write_begin+0x50/0xd0             fs/iomap.c:118
> [  202.333641]  iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
> [  202.335377]  ? iomap_write_end+0x80/0x80             fs/iomap.c:150
> [  202.337090]  iomap_apply+0xb3/0x130                  fs/iomap.c:79
> [  202.338721]  iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
> [  202.340613]  ? iomap_write_end+0x80/0x80
> [  202.342471]  xfs_file_buffered_aio_write+0x132/0x390 [xfs]
> [  202.344501]  ? remove_wait_queue+0x59/0x60
> [  202.346261]  xfs_file_write_iter+0x90/0x130 [xfs]
> [  202.348082]  __vfs_write+0xe5/0x140
> [  202.349743]  vfs_write+0xc7/0x1f0
> [  202.351214]  ? syscall_trace_enter+0x1d0/0x380
> [  202.353155]  SyS_write+0x58/0xc0
> [  202.354628]  do_syscall_64+0x6c/0x200
> [  202.356100]  entry_SYSCALL64_slow_path+0x25/0x25
> ----------------------------------------
> 
> Do we need to allow access to memory reserves for this allocation?
> Or should the caller check for SIGKILL rather than keep iterating the
> loop?

I think we are missing a check for fatal_signal_pending() in
iomap_file_buffered_write(). This means that an OOM victim can consume
the full memory reserves. What do you think about the following? I
haven't tested this, but it mimics generic_perform_write() so I guess
it should work.
---
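The diff is not reproduced here; as a sketch of the idea described above
(not the original hunk, and status/-EINTR are simply carried over from the
generic_perform_write() pattern), it amounts to adding the same bail-out
that generic_perform_write() has at the top of its copy loop to the
buffered-write loop in fs/iomap.c:
----------
	/*
	 * Sketch only, not the hunk from the mail: inside the
	 * iomap_write_actor() loop, stop copying once a fatal signal
	 * is pending, exactly as generic_perform_write() does, so
	 * that an OOM victim does not keep allocating page cache
	 * from the memory reserves.
	 */
	if (fatal_signal_pending(current)) {
		status = -EINTR;
		break;
	}
----------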


Thread overview: 55+ messages
2017-01-18 13:44 [RFC PATCH 0/2] fix unbounded too_many_isolated Michal Hocko
2017-01-18 13:44 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Michal Hocko
2017-01-18 14:46   ` Mel Gorman
2017-01-18 15:15     ` Michal Hocko
2017-01-18 15:54       ` Mel Gorman
2017-01-18 16:17         ` Michal Hocko
2017-01-18 17:00           ` Mel Gorman
2017-01-18 17:29             ` Michal Hocko
2017-01-19 10:07               ` Mel Gorman
2017-01-19 11:23                 ` Michal Hocko
2017-01-19 13:11                   ` Mel Gorman
2017-01-20 13:27                     ` Tetsuo Handa
2017-01-21  7:42                       ` Tetsuo Handa
2017-01-25 10:15                         ` Michal Hocko [this message]
2017-01-25 10:19                           ` Christoph Hellwig
2017-01-25 10:46                             ` Michal Hocko
2017-01-25 11:09                               ` Tetsuo Handa
2017-01-25 13:00                                 ` Michal Hocko
2017-01-27 14:49                                   ` Michal Hocko
2017-01-28 15:27                                     ` Tetsuo Handa
2017-01-30  8:55                                       ` Michal Hocko
2017-02-02 10:14                                         ` Michal Hocko
2017-02-03 10:57                                           ` Tetsuo Handa
2017-02-03 14:41                                             ` Michal Hocko
2017-02-03 14:50                                             ` Michal Hocko
2017-02-03 17:24                                               ` Brian Foster
2017-02-06  6:29                                                 ` Tetsuo Handa
2017-02-06 14:35                                                   ` Brian Foster
2017-02-06 14:42                                                     ` Michal Hocko
2017-02-06 15:47                                                       ` Brian Foster
2017-02-07 10:30                                                     ` Tetsuo Handa
2017-02-07 16:54                                                       ` Brian Foster
2017-02-03 14:55                                             ` Michal Hocko
2017-02-05 10:43                                               ` Tetsuo Handa
2017-02-06 10:34                                                 ` Michal Hocko
2017-02-06 10:39                                                 ` Michal Hocko
2017-02-07 21:12                                                   ` Michal Hocko
2017-02-08  9:24                                                     ` Peter Zijlstra
2017-02-21  9:40                                             ` Michal Hocko
2017-02-21 14:35                                               ` Tetsuo Handa
2017-02-21 15:53                                                 ` Michal Hocko
2017-02-22  2:02                                                   ` Tetsuo Handa
2017-02-22  7:54                                                     ` Michal Hocko
2017-02-26  6:30                                                       ` Tetsuo Handa
2017-01-31 11:58                                   ` Michal Hocko
2017-01-31 12:51                                     ` Christoph Hellwig
2017-01-31 13:21                                       ` Michal Hocko
2017-01-25 10:33                           ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Tetsuo Handa
2017-01-25 12:34                             ` Michal Hocko
2017-01-25 13:13                               ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Tetsuo Handa
2017-01-25  9:53                       ` Michal Hocko
2017-01-20  6:42                 ` Hillf Danton
2017-01-20  9:25                   ` Mel Gorman
2017-01-18 13:44 ` [RFC PATCH 2/2] mm, vmscan: do not loop on too_many_isolated for ever Michal Hocko
2017-01-18 14:50   ` Mel Gorman
