From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: David Rientjes <rientjes@google.com>, Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>, Roman Gushchin <guro@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
Date: Wed, 5 Sep 2018 22:20:58 +0900
Message-ID: <195a512f-aecc-f8cf-f409-6c42ee924a8c@i-love.sakura.ne.jp>
In-Reply-To: <201808240031.w7O0V5hT019529@www262.sakura.ne.jp>

On 2018/08/24 9:31, Tetsuo Handa wrote:
> For now, I don't think we need to add af5679fbc669f31f to the list for
> CVE-2016-10723, for af5679fbc669f31f might cause premature next OOM victim
> selection (especially with CONFIG_PREEMPT=y kernels) due to
>
>   __alloc_pages_may_oom():                  oom_reap_task():
>
>     mutex_trylock(&oom_lock) succeeds.
>     get_page_from_freelist() fails.
>     Preempted to other process.
>                                               oom_reap_task_mm() succeeds.
>                                               Sets MMF_OOM_SKIP.
>     Returned from preemption.
>     Finds that MMF_OOM_SKIP was already set.
>     Selects next OOM victim and kills it.
>     mutex_unlock(&oom_lock) is called.
>
> race window like described as
>
>   Tetsuo was arguing that at least MMF_OOM_SKIP should be set under the lock
>   to prevent from races when the page allocator didn't manage to get the
>   freed (reaped) memory in __alloc_pages_may_oom but it sees the flag later
>   on and move on to another victim. Although this is possible in principle
>   let's wait for it to actually happen in real life before we make the
>   locking more complex again.
>
> in that commit.
>
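Yes, that race window is real, and we can needlessly select the next OOM
victim. In the log below, firewalld is killed after a.out (pid 9943) has
already been reaped, while Mem-Info reports free:878043 pages.
I think that af5679fbc669f31f was too optimistic.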
[ 278.147280] Out of memory: Kill process 9943 (a.out) score 919 or sacrifice child
[ 278.148927] Killed process 9943 (a.out) total-vm:4267252kB, anon-rss:3430056kB, file-rss:0kB, shmem-rss:0kB
[ 278.151586] vmtoolsd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[ 278.156642] vmtoolsd cpuset=/ mems_allowed=0
[ 278.158884] CPU: 2 PID: 8916 Comm: vmtoolsd Kdump: loaded Not tainted 4.19.0-rc2+ #465
[ 278.162252] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 278.165499] Call Trace:
[ 278.166693] dump_stack+0x99/0xdc
[ 278.167922] dump_header+0x70/0x2fa
[ 278.169414] oom_reaper: reaped process 9943 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 278.169629] ? _raw_spin_unlock_irqrestore+0x6a/0x8c
[ 278.169633] oom_kill_process+0x2ee/0x380
[ 278.169635] out_of_memory+0x136/0x540
[ 278.169636] ? out_of_memory+0x1fc/0x540
[ 278.169640] __alloc_pages_slowpath+0x986/0xce4
[ 278.169641] ? get_page_from_freelist+0x16b/0x1600
[ 278.169646] __alloc_pages_nodemask+0x398/0x3d0
[ 278.180594] alloc_pages_current+0x65/0xb0
[ 278.182173] __page_cache_alloc+0x154/0x190
[ 278.184200] ? pagecache_get_page+0x27/0x250
[ 278.185410] filemap_fault+0x4df/0x880
[ 278.186282] ? filemap_fault+0x31b/0x880
[ 278.187395] ? xfs_ilock+0x1bd/0x220
[ 278.188264] ? __xfs_filemap_fault+0x76/0x270
[ 278.189268] ? down_read_nested+0x48/0x80
[ 278.190229] ? xfs_ilock+0x1bd/0x220
[ 278.191061] __xfs_filemap_fault+0x89/0x270
[ 278.192059] xfs_filemap_fault+0x27/0x30
[ 278.192967] __do_fault+0x1f/0x70
[ 278.193777] __handle_mm_fault+0xfbd/0x1470
[ 278.194743] handle_mm_fault+0x1f2/0x400
[ 278.195679] ? handle_mm_fault+0x47/0x400
[ 278.196618] __do_page_fault+0x217/0x4b0
[ 278.197504] do_page_fault+0x3c/0x21e
[ 278.198303] ? page_fault+0x8/0x30
[ 278.199092] page_fault+0x1e/0x30
[ 278.199821] RIP: 0033:0x7f2322ebbfb0
[ 278.200605] Code: Bad RIP value.
[ 278.201370] RSP: 002b:00007ffda96e7648 EFLAGS: 00010246
[ 278.202518] RAX: 0000000000000000 RBX: 00007f23220f26b0 RCX: 0000000000000010
[ 278.204280] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f2321ecfb5b
[ 278.205838] RBP: 0000000002504b70 R08: 00007f2321ecfb60 R09: 000000000250bd20
[ 278.207426] R10: 383836312d646c69 R11: 0000000000000000 R12: 00007ffda96e76b0
[ 278.208982] R13: 00007f2322ea8540 R14: 000000000250ba90 R15: 00007f2323173920
[ 278.210840] Mem-Info:
[ 278.211462] active_anon:18629 inactive_anon:2390 isolated_anon:0
[ 278.211462] active_file:19 inactive_file:1565 isolated_file:0
[ 278.211462] unevictable:0 dirty:0 writeback:0 unstable:0
[ 278.211462] slab_reclaimable:5820 slab_unreclaimable:8964
[ 278.211462] mapped:2128 shmem:2493 pagetables:1826 bounce:0
[ 278.211462] free:878043 free_pcp:909 free_cma:0
[ 278.218830] Node 0 active_anon:74516kB inactive_anon:9560kB active_file:76kB inactive_file:6260kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:8512kB dirty:0kB writeback:0kB shmem:9972kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 43008kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 278.224997] Node 0 DMA free:15888kB min:288kB low:360kB high:432kB active_anon:32kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 278.230602] lowmem_reserve[]: 0 2663 3610 3610
[ 278.231887] Node 0 DMA32 free:2746332kB min:49636kB low:62044kB high:74452kB active_anon:2536kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2749920kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:1500kB local_pcp:0kB free_cma:0kB
[ 278.238291] lowmem_reserve[]: 0 0 947 947
[ 278.239270] Node 0 Normal free:749952kB min:17652kB low:22064kB high:26476kB active_anon:72816kB inactive_anon:9560kB active_file:264kB inactive_file:5556kB unevictable:0kB writepending:4kB present:1048576kB managed:969932kB mlocked:0kB kernel_stack:5328kB pagetables:7092kB bounce:0kB free_pcp:2132kB local_pcp:64kB free_cma:0kB
[ 278.245895] lowmem_reserve[]: 0 0 0 0
[ 278.246820] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[ 278.249659] Node 0 DMA32: 7*4kB (U) 8*8kB (UM) 8*16kB (U) 8*32kB (U) 8*64kB (U) 6*128kB (U) 7*256kB (UM) 7*512kB (UM) 3*1024kB (UM) 2*2048kB (M) 667*4096kB (UM) = 2746332kB
[ 278.253054] Node 0 Normal: 4727*4kB (UME) 3423*8kB (UME) 1679*16kB (UME) 704*32kB (UME) 253*64kB (UME) 107*128kB (UME) 38*256kB (M) 16*512kB (M) 10*1024kB (M) 9*2048kB (M) 141*4096kB (M) = 749700kB
[ 278.257158] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 278.259018] 4125 total pagecache pages
[ 278.259896] 0 pages in swap cache
[ 278.260745] Swap cache stats: add 0, delete 0, find 0/0
[ 278.261934] Free swap = 0kB
[ 278.262750] Total swap = 0kB
[ 278.263483] 1048445 pages RAM
[ 278.264216] 0 pages HighMem/MovableOnly
[ 278.265077] 114506 pages reserved
[ 278.265971] Tasks state (memory values in pages):
[ 278.267118] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 278.269090] [ 2846] 0 2846 267 171 32768 0 0 none
[ 278.271221] [ 4891] 0 4891 9772 856 110592 0 0 systemd-journal
[ 278.273253] [ 6610] 0 6610 10811 261 114688 0 -1000 systemd-udevd
[ 278.275197] [ 6709] 0 6709 13880 112 131072 0 -1000 auditd
[ 278.277067] [ 7388] 0 7388 3307 52 69632 0 0 rngd
[ 278.278880] [ 7393] 0 7393 24917 403 184320 0 0 VGAuthService
[ 278.280864] [ 7510] 70 7510 15043 102 155648 0 0 avahi-daemon
[ 278.282898] [ 7555] 0 7555 5420 372 81920 0 0 irqbalance
[ 278.284836] [ 7563] 0 7563 6597 83 102400 0 0 systemd-logind
[ 278.286959] [ 7565] 81 7565 14553 157 167936 0 -900 dbus-daemon
[ 278.288985] [ 8286] 70 8286 15010 98 151552 0 0 avahi-daemon
[ 278.290958] [ 8731] 0 8731 74697 999 270336 0 0 vmtoolsd
[ 278.293008] [ 8732] 999 8732 134787 1730 274432 0 0 polkitd
[ 278.294906] [ 8733] 0 8733 55931 467 274432 0 0 abrtd
[ 278.296774] [ 8734] 0 8734 55311 354 266240 0 0 abrt-watch-log
[ 278.298839] [ 8774] 0 8774 31573 155 106496 0 0 crond
[ 278.300810] [ 8790] 0 8790 89503 5482 421888 0 0 firewalld
[ 278.302727] [ 8916] 0 8916 45262 211 204800 0 0 vmtoolsd
[ 278.304841] [ 9230] 0 9230 26877 507 229376 0 0 dhclient
[ 278.306733] [ 9333] 0 9333 87236 451 528384 0 0 nmbd
[ 278.308554] [ 9334] 0 9334 28206 257 253952 0 -1000 sshd
[ 278.310431] [ 9335] 0 9335 143457 3260 430080 0 0 tuned
[ 278.312278] [ 9337] 0 9337 55682 2442 200704 0 0 rsyslogd
[ 278.314188] [ 9497] 0 9497 24276 170 233472 0 0 login
[ 278.316038] [ 9498] 0 9498 27525 33 73728 0 0 agetty
[ 278.317918] [ 9539] 0 9539 104864 581 659456 0 0 smbd
[ 278.319738] [ 9590] 0 9590 103799 555 610304 0 0 smbd-notifyd
[ 278.321918] [ 9591] 0 9591 103797 555 602112 0 0 cleanupd
[ 278.323935] [ 9592] 0 9592 104864 580 610304 0 0 lpqd
[ 278.325835] [ 9596] 0 9596 28894 129 90112 0 0 bash
[ 278.327663] [ 9639] 0 9639 28833 474 249856 0 0 sendmail
[ 278.329550] [ 9773] 51 9773 26644 411 229376 0 0 sendmail
[ 278.331527] Out of memory: Kill process 8790 (firewalld) score 5 or sacrifice child
[ 278.333267] Killed process 8790 (firewalld) total-vm:358012kB, anon-rss:21928kB, file-rss:0kB, shmem-rss:0kB
[ 278.336430] oom_reaper: reaped process 8790 (firewalld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
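As a rough illustration, the interleaving quoted above can be replayed deterministically in a small C model. This is a hypothetical toy, not kernel code: toy_oom_state, toy_reap() and toy_alloc_may_oom() are invented names, and the reap_under_oom_lock variant is only an assumption about what "MMF_OOM_SKIP set under the lock" would mean (the reaper waits for oom_lock before reaping and setting the flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: field names echo the kernel, but this is not kernel code. */
struct toy_oom_state {
	bool oom_lock_held;	/* stands in for oom_lock */
	bool mmf_oom_skip;	/* stands in for MMF_OOM_SKIP on the first victim's mm */
	long free_pages;	/* pages available to get_page_from_freelist() */
};

/* oom_reap_task(): oom_reap_task_mm() succeeds, then MMF_OOM_SKIP is set. */
void toy_reap(struct toy_oom_state *s)
{
	s->free_pages += 1024;	/* arbitrary amount of reaped memory */
	s->mmf_oom_skip = true;
}

/*
 * Replays the quoted interleaving once and returns how many extra victims
 * the allocating task selects.
 * preempted_in_window: the allocator is preempted right after
 *   get_page_from_freelist() fails.
 * reap_under_oom_lock: hypothetical variant in which the reaper must take
 *   oom_lock before reaping and setting MMF_OOM_SKIP.
 */
int toy_alloc_may_oom(bool preempted_in_window, bool reap_under_oom_lock)
{
	struct toy_oom_state s = { false, false, 0 };
	bool reaped = false;
	int extra_kills = 0;

	s.oom_lock_held = true;			/* mutex_trylock(&oom_lock) succeeds */
	if (s.free_pages == 0) {		/* get_page_from_freelist() fails */
		/* Preempted: the reaper runs now unless it has to wait for oom_lock. */
		if (preempted_in_window &&
		    !(reap_under_oom_lock && s.oom_lock_held)) {
			toy_reap(&s);
			reaped = true;
		}
		/*
		 * Victim selection backs off while the existing victim lacks
		 * MMF_OOM_SKIP; once the flag is set, it moves on to a new one.
		 */
		if (s.mmf_oom_skip)
			extra_kills++;		/* selects next OOM victim and kills it */
	}
	s.oom_lock_held = false;		/* mutex_unlock(&oom_lock) */
	if (!reaped)
		toy_reap(&s);			/* reaper gets oom_lock, or simply runs later */
	assert(!s.oom_lock_held && s.mmf_oom_skip);
	return extra_kills;
}
```

Under these assumptions, toy_alloc_may_oom(true, false) returns 1 (a needless second victim), while toy_alloc_may_oom(true, true) returns 0 because the reaper cannot set MMF_OOM_SKIP until the allocating task has released oom_lock.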