linux-kernel.vger.kernel.org archive mirror
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: David Rientjes <rientjes@google.com>, Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>, Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
Date: Wed, 5 Sep 2018 22:20:58 +0900	[thread overview]
Message-ID: <195a512f-aecc-f8cf-f409-6c42ee924a8c@i-love.sakura.ne.jp> (raw)
In-Reply-To: <201808240031.w7O0V5hT019529@www262.sakura.ne.jp>

On 2018/08/24 9:31, Tetsuo Handa wrote:
> For now, I don't think we need to add af5679fbc669f31f to the list for
> CVE-2016-10723, because af5679fbc669f31f might cause premature selection of
> the next OOM victim (especially with CONFIG_PREEMPT=y kernels) due to
> 
>    __alloc_pages_may_oom():               oom_reap_task():
> 
>      mutex_trylock(&oom_lock) succeeds.
>      get_page_from_freelist() fails.
>      Preempted to other process.
>                                             oom_reap_task_mm() succeeds.
>                                             Sets MMF_OOM_SKIP.
>      Returned from preemption.
>      Finds that MMF_OOM_SKIP was already set.
>      Selects next OOM victim and kills it.
>      mutex_unlock(&oom_lock) is called.
> 
> a race window like the one described as
> 
>     Tetsuo was arguing that at least MMF_OOM_SKIP should be set under the lock
>     to prevent from races when the page allocator didn't manage to get the
>     freed (reaped) memory in __alloc_pages_may_oom but it sees the flag later
>     on and move on to another victim.  Although this is possible in principle
>     let's wait for it to actually happen in real life before we make the
>     locking more complex again.
> 
> in that commit.
> 

Yes, that race window is real. We can needlessly select the next OOM victim.
I think that commit af5679fbc669f31f was too optimistic.

[  278.147280] Out of memory: Kill process 9943 (a.out) score 919 or sacrifice child
[  278.148927] Killed process 9943 (a.out) total-vm:4267252kB, anon-rss:3430056kB, file-rss:0kB, shmem-rss:0kB
[  278.151586] vmtoolsd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[  278.156642] vmtoolsd cpuset=/ mems_allowed=0
[  278.158884] CPU: 2 PID: 8916 Comm: vmtoolsd Kdump: loaded Not tainted 4.19.0-rc2+ #465
[  278.162252] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[  278.165499] Call Trace:
[  278.166693]  dump_stack+0x99/0xdc
[  278.167922]  dump_header+0x70/0x2fa
[  278.169414] oom_reaper: reaped process 9943 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  278.169629]  ? _raw_spin_unlock_irqrestore+0x6a/0x8c
[  278.169633]  oom_kill_process+0x2ee/0x380
[  278.169635]  out_of_memory+0x136/0x540
[  278.169636]  ? out_of_memory+0x1fc/0x540
[  278.169640]  __alloc_pages_slowpath+0x986/0xce4
[  278.169641]  ? get_page_from_freelist+0x16b/0x1600
[  278.169646]  __alloc_pages_nodemask+0x398/0x3d0
[  278.180594]  alloc_pages_current+0x65/0xb0
[  278.182173]  __page_cache_alloc+0x154/0x190
[  278.184200]  ? pagecache_get_page+0x27/0x250
[  278.185410]  filemap_fault+0x4df/0x880
[  278.186282]  ? filemap_fault+0x31b/0x880
[  278.187395]  ? xfs_ilock+0x1bd/0x220
[  278.188264]  ? __xfs_filemap_fault+0x76/0x270
[  278.189268]  ? down_read_nested+0x48/0x80
[  278.190229]  ? xfs_ilock+0x1bd/0x220
[  278.191061]  __xfs_filemap_fault+0x89/0x270
[  278.192059]  xfs_filemap_fault+0x27/0x30
[  278.192967]  __do_fault+0x1f/0x70
[  278.193777]  __handle_mm_fault+0xfbd/0x1470
[  278.194743]  handle_mm_fault+0x1f2/0x400
[  278.195679]  ? handle_mm_fault+0x47/0x400
[  278.196618]  __do_page_fault+0x217/0x4b0
[  278.197504]  do_page_fault+0x3c/0x21e
[  278.198303]  ? page_fault+0x8/0x30
[  278.199092]  page_fault+0x1e/0x30
[  278.199821] RIP: 0033:0x7f2322ebbfb0
[  278.200605] Code: Bad RIP value.
[  278.201370] RSP: 002b:00007ffda96e7648 EFLAGS: 00010246
[  278.202518] RAX: 0000000000000000 RBX: 00007f23220f26b0 RCX: 0000000000000010
[  278.204280] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f2321ecfb5b
[  278.205838] RBP: 0000000002504b70 R08: 00007f2321ecfb60 R09: 000000000250bd20
[  278.207426] R10: 383836312d646c69 R11: 0000000000000000 R12: 00007ffda96e76b0
[  278.208982] R13: 00007f2322ea8540 R14: 000000000250ba90 R15: 00007f2323173920
[  278.210840] Mem-Info:
[  278.211462] active_anon:18629 inactive_anon:2390 isolated_anon:0
[  278.211462]  active_file:19 inactive_file:1565 isolated_file:0
[  278.211462]  unevictable:0 dirty:0 writeback:0 unstable:0
[  278.211462]  slab_reclaimable:5820 slab_unreclaimable:8964
[  278.211462]  mapped:2128 shmem:2493 pagetables:1826 bounce:0
[  278.211462]  free:878043 free_pcp:909 free_cma:0
[  278.218830] Node 0 active_anon:74516kB inactive_anon:9560kB active_file:76kB inactive_file:6260kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:8512kB dirty:0kB writeback:0kB shmem:9972kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 43008kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[  278.224997] Node 0 DMA free:15888kB min:288kB low:360kB high:432kB active_anon:32kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  278.230602] lowmem_reserve[]: 0 2663 3610 3610
[  278.231887] Node 0 DMA32 free:2746332kB min:49636kB low:62044kB high:74452kB active_anon:2536kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2749920kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:1500kB local_pcp:0kB free_cma:0kB
[  278.238291] lowmem_reserve[]: 0 0 947 947
[  278.239270] Node 0 Normal free:749952kB min:17652kB low:22064kB high:26476kB active_anon:72816kB inactive_anon:9560kB active_file:264kB inactive_file:5556kB unevictable:0kB writepending:4kB present:1048576kB managed:969932kB mlocked:0kB kernel_stack:5328kB pagetables:7092kB bounce:0kB free_pcp:2132kB local_pcp:64kB free_cma:0kB
[  278.245895] lowmem_reserve[]: 0 0 0 0
[  278.246820] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[  278.249659] Node 0 DMA32: 7*4kB (U) 8*8kB (UM) 8*16kB (U) 8*32kB (U) 8*64kB (U) 6*128kB (U) 7*256kB (UM) 7*512kB (UM) 3*1024kB (UM) 2*2048kB (M) 667*4096kB (UM) = 2746332kB
[  278.253054] Node 0 Normal: 4727*4kB (UME) 3423*8kB (UME) 1679*16kB (UME) 704*32kB (UME) 253*64kB (UME) 107*128kB (UME) 38*256kB (M) 16*512kB (M) 10*1024kB (M) 9*2048kB (M) 141*4096kB (M) = 749700kB
[  278.257158] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  278.259018] 4125 total pagecache pages
[  278.259896] 0 pages in swap cache
[  278.260745] Swap cache stats: add 0, delete 0, find 0/0
[  278.261934] Free swap  = 0kB
[  278.262750] Total swap = 0kB
[  278.263483] 1048445 pages RAM
[  278.264216] 0 pages HighMem/MovableOnly
[  278.265077] 114506 pages reserved
[  278.265971] Tasks state (memory values in pages):
[  278.267118] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  278.269090] [   2846]     0  2846      267      171    32768        0             0 none
[  278.271221] [   4891]     0  4891     9772      856   110592        0             0 systemd-journal
[  278.273253] [   6610]     0  6610    10811      261   114688        0         -1000 systemd-udevd
[  278.275197] [   6709]     0  6709    13880      112   131072        0         -1000 auditd
[  278.277067] [   7388]     0  7388     3307       52    69632        0             0 rngd
[  278.278880] [   7393]     0  7393    24917      403   184320        0             0 VGAuthService
[  278.280864] [   7510]    70  7510    15043      102   155648        0             0 avahi-daemon
[  278.282898] [   7555]     0  7555     5420      372    81920        0             0 irqbalance
[  278.284836] [   7563]     0  7563     6597       83   102400        0             0 systemd-logind
[  278.286959] [   7565]    81  7565    14553      157   167936        0          -900 dbus-daemon
[  278.288985] [   8286]    70  8286    15010       98   151552        0             0 avahi-daemon
[  278.290958] [   8731]     0  8731    74697      999   270336        0             0 vmtoolsd
[  278.293008] [   8732]   999  8732   134787     1730   274432        0             0 polkitd
[  278.294906] [   8733]     0  8733    55931      467   274432        0             0 abrtd
[  278.296774] [   8734]     0  8734    55311      354   266240        0             0 abrt-watch-log
[  278.298839] [   8774]     0  8774    31573      155   106496        0             0 crond
[  278.300810] [   8790]     0  8790    89503     5482   421888        0             0 firewalld
[  278.302727] [   8916]     0  8916    45262      211   204800        0             0 vmtoolsd
[  278.304841] [   9230]     0  9230    26877      507   229376        0             0 dhclient
[  278.306733] [   9333]     0  9333    87236      451   528384        0             0 nmbd
[  278.308554] [   9334]     0  9334    28206      257   253952        0         -1000 sshd
[  278.310431] [   9335]     0  9335   143457     3260   430080        0             0 tuned
[  278.312278] [   9337]     0  9337    55682     2442   200704        0             0 rsyslogd
[  278.314188] [   9497]     0  9497    24276      170   233472        0             0 login
[  278.316038] [   9498]     0  9498    27525       33    73728        0             0 agetty
[  278.317918] [   9539]     0  9539   104864      581   659456        0             0 smbd
[  278.319738] [   9590]     0  9590   103799      555   610304        0             0 smbd-notifyd
[  278.321918] [   9591]     0  9591   103797      555   602112        0             0 cleanupd
[  278.323935] [   9592]     0  9592   104864      580   610304        0             0 lpqd
[  278.325835] [   9596]     0  9596    28894      129    90112        0             0 bash
[  278.327663] [   9639]     0  9639    28833      474   249856        0             0 sendmail
[  278.329550] [   9773]    51  9773    26644      411   229376        0             0 sendmail
[  278.331527] Out of memory: Kill process 8790 (firewalld) score 5 or sacrifice child
[  278.333267] Killed process 8790 (firewalld) total-vm:358012kB, anon-rss:21928kB, file-rss:0kB, shmem-rss:0kB
[  278.336430] oom_reaper: reaped process 8790 (firewalld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB



Thread overview: 35+ messages
2018-07-26 11:06 [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry() Tetsuo Handa
2018-07-26 11:39 ` Michal Hocko
2018-07-27 15:47   ` Tetsuo Handa
2018-07-30  9:32     ` Michal Hocko
2018-07-30 14:34       ` Tetsuo Handa
2018-07-30 14:46         ` Michal Hocko
2018-07-30 14:54           ` Tejun Heo
2018-07-30 15:25             ` Tetsuo Handa
2018-07-30 15:44               ` Tejun Heo
2018-07-30 18:51                 ` Michal Hocko
2018-07-30 19:10                   ` Michal Hocko
2018-07-30 21:01                     ` Tetsuo Handa
2018-07-31  5:09                       ` Michal Hocko
2018-07-31 10:47                         ` Tetsuo Handa
2018-07-31 11:15                           ` Michal Hocko
2018-07-31 11:30                             ` Tetsuo Handa
2018-07-31 11:55                               ` Michal Hocko
2018-08-02 22:05                         ` Tetsuo Handa
2018-08-03  6:16                           ` Michal Hocko
2018-08-21 21:07                             ` Tetsuo Handa
2018-08-22  7:32                               ` Michal Hocko
2018-08-23 20:06                               ` David Rientjes
2018-08-23 21:00                                 ` Tetsuo Handa
2018-08-23 22:45                                   ` David Rientjes
2018-08-24  0:31                                     ` Tetsuo Handa
2018-09-05 13:20                                       ` Tetsuo Handa [this message]
2018-09-05 13:40                                         ` Michal Hocko
2018-09-05 13:53                                           ` Tetsuo Handa
2018-09-05 14:04                                             ` Michal Hocko
2018-09-06  1:00                                               ` Tetsuo Handa
2018-09-06  5:57                                                 ` Michal Hocko
2018-09-06  6:22                                                   ` Tetsuo Handa
2018-09-06  7:03                                                   ` Tetsuo Handa
2018-07-30 19:14                   ` Tejun Heo
2018-08-27 13:51 Michal Hocko
