* [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
@ 2019-09-01 20:43 Thomas Lindroth
2019-09-02 7:16 ` Michal Hocko
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Thomas Lindroth @ 2019-09-01 20:43 UTC
To: linux-mm; +Cc: stable
After upgrading to the 4.19 series I've started getting problems with
early OOM.
I run a Gentoo system and do large compiles like the chromium browser in a
v1 memory cgroup. When I build chromium in the memory cgroup the OOM killer
runs and kills programs outside of the cgroup. This happens even when there
is plenty of free memory both in and outside of the cgroup.
The memory cgroup is named "12G" and is set up like this:
/sys/fs/cgroup/memory/12G/memory.kmem.limit_in_bytes:1073741824
/sys/fs/cgroup/memory/12G/memory.kmem.tcp.limit_in_bytes:1073741824
/sys/fs/cgroup/memory/12G/memory.limit_in_bytes:12884901888
/sys/fs/cgroup/memory/12G/memory.memsw.limit_in_bytes:12884901888
/sys/fs/cgroup/memory/12G/memory.soft_limit_in_bytes:9223372036854771712
The system has MemTotal: 16244996 kB
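For readers checking the limits above, the byte values can be converted to binary units. This is a minimal sketch: the helper name to_gib is made up for illustration, the numbers are copied from the listing above, and the reading of the soft limit as "unlimited" assumes 4 KiB pages.

```python
# Convert the cgroup v1 limits quoted above into GiB.
GIB = 1 << 30

limits = {
    "memory.kmem.limit_in_bytes": 1073741824,
    "memory.kmem.tcp.limit_in_bytes": 1073741824,
    "memory.limit_in_bytes": 12884901888,
    "memory.memsw.limit_in_bytes": 12884901888,
    "memory.soft_limit_in_bytes": 9223372036854771712,
}


def to_gib(nbytes):
    """Return a byte count expressed in GiB (hypothetical helper)."""
    return nbytes / GIB


# Assumption: 4 KiB pages. This is the largest page-aligned value that fits
# in a signed 64-bit integer, which is read here as "no limit set".
UNLIMITED = (1 << 63) - 4096

for name, value in limits.items():
    if value == UNLIMITED:
        print(f"{name}: unlimited")
    else:
        print(f"{name}: {to_gib(value):.0f} GiB")

# MemTotal is reported in kB, so scale to bytes first.
mem_total_gib = 16244996 * 1024 / GIB
print(f"MemTotal: {mem_total_gib:.2f} GiB")
```

Read this way, the group has a 12 GiB memory and memory+swap limit, a 1 GiB kernel memory limit, and no soft limit, on a machine with roughly 15.5 GiB of RAM.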
I run the chromium compile job using Gentoo's package manager like this:
cgexec -g memory:12G emerge -1 www-client/chromium
That compile job usually fails and dmesg looks like this:
[ 1084.634827] SLUB: Unable to allocate memory on node -1, gfp=0x6000c0(GFP_KERNEL)
[ 1084.634836] cache: dentry(100:12G), object size: 192, buffer size: 192, default order: 0, min order: 0
[ 1084.634838] node 0: slabs: 26888, objs: 564648, free: 0
[ 1084.634857] SLUB: Unable to allocate memory on node -1, gfp=0x6000c0(GFP_KERNEL)
[ 1084.634860] cache: dentry(100:12G), object size: 192, buffer size: 192, default order: 0, min order: 0
[ 1084.634861] node 0: slabs: 26888, objs: 564648, free: 0
[ 1084.648583] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.648593] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.648595] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.648695] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.648700] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.648702] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.648794] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.648799] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.648801] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.648898] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.648900] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.648908] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.649000] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.649004] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.649006] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.649103] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.649107] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.649109] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.649198] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.649199] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.649199] node 0: slabs: 19132, objs: 592952, free: 0
[ 1084.649293] SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
[ 1084.649299] cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
[ 1084.649299] node 0: slabs: 19132, objs: 592952, free: 0
[ 1146.798499] Purging GPU memory, 41040 pages freed, 6933 pages still pinned.
[ 1146.798512] 4 and 0 pages still available in the bound and unbound GPU page lists.
[ 1146.798649] Purging GPU memory, 0 pages freed, 6933 pages still pinned.
[ 1146.798653] 4 and 0 pages still available in the bound and unbound GPU page lists.
[ 1146.798696] emerge invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0
[ 1146.798699] emerge cpuset=
[ 1146.798701] /
[ 1146.798703] mems_allowed=0
[ 1146.798705] CPU: 4 PID: 16719 Comm: emerge Not tainted 4.19.69 #43
[ 1146.798707] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
[ 1146.798708] Call Trace:
[ 1146.798713] dump_stack+0x46/0x60
[ 1146.798718] dump_header+0x67/0x28d
[ 1146.798721] oom_kill_process.cold.31+0xb/0x1f3
[ 1146.798723] out_of_memory+0x129/0x250
[ 1146.798728] pagefault_out_of_memory+0x64/0x77
[ 1146.798732] __do_page_fault+0x3c1/0x3d0
[ 1146.798735] do_page_fault+0x2c/0x123
[ 1146.798738] ? page_fault+0x8/0x30
[ 1146.798740] page_fault+0x1e/0x30
[ 1146.798743] RIP: 0033:0x7f3745ccf628
[ 1146.798744] Code: 43 68 48 8b 40 08 48 89 44 24 08 48 8b 83 00 03 00 00 48 85 c0 0f 84 ff 00 00 00 48 8b 74 24 20 8b 54 24 30 23 93 f8 02 00 00 <48> 8b 14 d0 8b 83 fc 02 00 00 89 f7 c4 e2 fb f7 c6 c4 e2 fb f7 c2
[ 1146.798746] RSP: 002b:00007ffd34c11460 EFLAGS: 00010202
[ 1146.798748] RAX: 000056439d3e5288 RBX: 00007f3745cf0130 RCX: 0000000000000013
[ 1146.798750] RDX: 0000000000000001 RSI: 00000000777f2f73 RDI: 00007f37458c7303
[ 1146.798752] RBP: 0000000000000000 R08: 00007ffd34c11590 R09: 00007f3745cf03f0
[ 1146.798754] R10: 00007f3745c91000 R11: 00007f374557b5c0 R12: 0000000000000000
[ 1146.798756] R13: 0000000000000001 R14: 0000000000000000 R15: 00007f3745c92f58
[ 1146.798758] Mem-Info:
[ 1146.798761] active_anon:438779 inactive_anon:41657 isolated_anon:0
active_file:591199 inactive_file:2307684 isolated_file:0
unevictable:1 dirty:499 writeback:0 unstable:0
slab_reclaimable:299846 slab_unreclaimable:22656
mapped:134859 shmem:51982 pagetables:5460 bounce:0
free:311407 free_pcp:4594 free_cma:0
[ 1146.798765] Node 0 active_anon:1755116kB inactive_anon:166628kB active_file:2364796kB inactive_file:9230736kB unevictable:4kB isolated(anon):0kB isolated(file):0kB mapped:539436kB dirty:1996kB writeback:0kB shmem:207928kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 598016kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1146.798770] DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1146.798772] lowmem_reserve[]:
[ 1146.798773] 0
[ 1146.798774] 3030
[ 1146.798775] 15777
[ 1146.798776] 15777
[ 1146.798780] DMA32 free:1111400kB min:12968kB low:16208kB high:19448kB active_anon:253352kB inactive_anon:1160kB active_file:48104kB inactive_file:1489456kB unevictable:0kB writepending:0kB present:3172224kB managed:3172224kB mlocked:0kB kernel_stack:16kB pagetables:248kB bounce:0kB free_pcp:10952kB local_pcp:1356kB free_cma:0kB
[ 1146.798782] lowmem_reserve[]:
[ 1146.798783] 0
[ 1146.798784] 0
[ 1146.798786] 12746
[ 1146.798787] 12746
[ 1146.798791] Normal free:118328kB min:54544kB low:68180kB high:81816kB active_anon:1501764kB inactive_anon:165596kB active_file:2316692kB inactive_file:7741280kB unevictable:4kB writepending:1996kB present:13367296kB managed:13056872kB mlocked:4kB kernel_stack:9216kB pagetables:21592kB bounce:0kB free_pcp:7412kB local_pcp:1320kB free_cma:0kB
[ 1146.798793] lowmem_reserve[]:
[ 1146.798794] 0
[ 1146.798795] 0
[ 1146.798796] 0
[ 1146.798797] 0
[ 1146.798799] DMA:
[ 1146.798800] 1*4kB
[ 1146.798802] (U)
[ 1146.798803] 1*8kB
[ 1146.798805] (U)
[ 1146.798806] 1*16kB
[ 1146.798807] (U)
[ 1146.798808] 0*32kB
[ 1146.798810] 2*64kB
[ 1146.798811] (U)
[ 1146.798812] 1*128kB
[ 1146.798813] (U)
[ 1146.798815] 1*256kB
[ 1146.798816] (U)
[ 1146.798817] 0*512kB
[ 1146.798818] 1*1024kB
[ 1146.798819] (U)
[ 1146.798821] 1*2048kB
[ 1146.798822] (M)
[ 1146.798823] 3*4096kB
[ 1146.798824] (M)
[ 1146.798826] = 15900kB
[ 1146.798828] DMA32:
[ 1146.798829] 6346*4kB
[ 1146.798831] (UME)
[ 1146.798977] 2650*8kB
[ 1146.798980] (UME)
[ 1146.798981] 1118*16kB
[ 1146.799118] (UME)
[ 1146.799119] 774*32kB
[ 1146.799121] (UME)
[ 1146.799122] 367*64kB
[ 1146.799123] (UME)
[ 1146.799125] 191*128kB
[ 1146.799126] (UME)
[ 1146.799128] 64*256kB
[ 1146.799290] (UME)
[ 1146.799291] 37*512kB
[ 1146.799291] (UM)
[ 1146.799291] 21*1024kB
[ 1146.799292] (UME)
[ 1146.799292] 194*2048kB
[ 1146.799292] (UM)
[ 1146.799292] 127*4096kB
[ 1146.799293] (M)
[ 1146.799293] = 1111512kB
[ 1146.799293] Normal:
[ 1146.799294] 2360*4kB
[ 1146.799294] (UME)
[ 1146.799294] 1483*8kB
[ 1146.799295] (UME)
[ 1146.799295] 506*16kB
[ 1146.799295] (UME)
[ 1146.799295] 377*32kB
[ 1146.799296] (UME)
[ 1146.799296] 171*64kB
[ 1146.799296] (UME)
[ 1146.799296] 59*128kB
[ 1146.799297] (UME)
[ 1146.799297] 46*256kB
[ 1146.799297] (UME)
[ 1146.799297] 7*512kB
[ 1146.799298] (UME)
[ 1146.799298] 4*1024kB
[ 1146.799298] (UE)
[ 1146.799299] 5*2048kB
[ 1146.799299] (ME)
[ 1146.799299] 7*4096kB
[ 1146.799299] (M)
[ 1146.799300] = 118328kB
[ 1146.799301] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1146.799301] 2950857 total pagecache pages
[ 1146.799303] 0 pages in swap cache
[ 1146.799576] Swap cache stats: add 0, delete 0, find 0/0
[ 1146.799578] Free swap = 4194300kB
[ 1146.799579] Total swap = 4194300kB
[ 1146.799581] 4138876 pages RAM
[ 1146.799582] 0 pages HighMem/MovableOnly
[ 1146.799584] 77627 pages reserved
[ 1146.799585] Tasks state (memory values in pages):
[ 1146.799586] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 1146.799601] [ 971] 0 971 3161 967 57344 0 0 systemd-udevd
[ 1146.799605] [ 2152] 103 2152 902 692 40960 0 0 dbus-daemon
[ 1146.799607] [ 2185] 0 2185 3040 597 57344 0 0 syslog-ng
[ 1146.799609] [ 2186] 0 2186 77434 1839 94208 0 0 syslog-ng
[ 1146.799611] [ 2187] 0 2187 3299 1286 69632 0 0 syslog_redirect
[ 1146.799798] [ 2222] 0 2222 96102 1318 110592 0 0 console-kit-dae
[ 1146.799800] [ 2239] 119 2239 486941 5736 237568 0 0 polkitd
[ 1146.799801] [ 2259] 0 2259 612 23 45056 0 0 gpm
[ 1146.799802] [ 2735] 0 2735 691 66 45056 0 0 dhcpcd
[ 1146.799803] [ 2806] 123 2806 983 412 45056 0 0 ntpd
[ 1146.799805] [ 2807] 123 2807 999 483 49152 0 0 ntpd
[ 1146.799814] [ 2817] 0 2817 945 54 45056 0 0 ntpd
[ 1146.799815] [ 2847] 0 2847 18576 902 118784 0 0 virtlogd
[ 1146.799817] [ 2878] 0 2878 18577 885 110592 0 0 virtlockd
[ 1146.799822] [ 2912] 0 2912 352359 5586 299008 0 0 libvirtd
[ 1146.799823] [ 3055] 0 3055 1917 374 53248 0 0 smartd
[ 1146.799824] [ 3084] 0 3084 2010 454 53248 0 0 cron
[ 1146.799977] [ 3159] 0 3159 78172 1652 106496 0 0 lightdm
[ 1146.799978] [ 3178] 0 3178 223859 45920 868352 0 0 X
[ 1146.799984] [ 3218] 0 3218 42557 1645 98304 0 0 lightdm
[ 1146.799985] [ 3230] 1000 3230 2302 795 61440 0 0 sh
[ 1146.799986] [ 3238] 1000 3238 2160 559 57344 0 0 dbus-launch
[ 1146.799992] [ 3239] 1000 3239 1023 705 49152 0 0 dbus-daemon
[ 1146.799994] [ 3245] 1000 3245 47306 4073 274432 0 0 xfce4-session
[ 1146.799998] [ 3247] 1000 3247 3564 1246 73728 0 0 xfconfd
[ 1146.799999] [ 3250] 1000 3250 1776 82 49152 0 0 ssh-agent
[ 1146.800000] [ 3252] 1000 3252 40853 214 73728 0 0 gpg-agent
[ 1146.800001] [ 3255] 1000 3255 37070 6711 294912 0 0 xfwm4
[ 1146.800002] [ 3257] 1000 3257 29454 4047 266240 0 0 Thunar
[ 1146.800004] [ 3259] 1000 3259 56521 6429 315392 0 0 xfce4-panel
[ 1146.800005] [ 3260] 1000 3260 76198 5296 307200 0 0 xfsettingsd
[ 1146.800006] [ 3264] 1000 3264 164200 15514 430080 0 0 xfdesktop
[ 1146.800007] [ 3269] 0 3269 61046 1602 102400 0 0 upowerd
[ 1146.800008] [ 3285] 1000 3285 66179 4569 290816 0 0 panel-2-actions
[ 1146.800009] [ 3287] 1000 3287 36585 5112 278528 0 0 panel-8-quickla
[ 1146.800010] [ 3288] 1000 3288 633361 62058 1359872 0 0 akregator
[ 1146.800011] [ 3290] 1000 3290 71292 7355 294912 0 0 gkrellm
[ 1146.800012] [ 3292] 1000 3292 60506 1728 102400 0 0 gvfsd
[ 1146.800013] [ 3294] 1000 3294 28673 3873 262144 0 0 panel-6-systray
[ 1146.800014] [ 3295] 1000 3295 155110 18850 507904 0 0 konversation
[ 1146.800015] [ 3296] 1000 3296 35017 5801 274432 0 0 panel-1-genmon
[ 1146.800016] [ 3297] 1000 3297 95199 7413 335872 0 0 panel-4-xfce4-t
[ 1146.800018] [ 3298] 1000 3298 118574 21860 516096 0 0 kate
[ 1146.800045] [ 3300] 1000 3300 35016 5745 278528 0 0 panel-7-datetim
[ 1146.800046] [ 3301] 1000 3301 116227 18944 458752 0 0 konsole
[ 1146.800047] [ 3303] 1000 3303 389058 54055 917504 0 0 clementine
[ 1146.800048] [ 3372] 1000 3372 77563 1551 102400 0 0 at-spi-bus-laun
[ 1146.800201] [ 3381] 1000 3381 865 687 49152 0 0 dbus-daemon
[ 1146.800203] [ 3383] 1000 3383 41371 1516 90112 0 0 at-spi2-registr
[ 1146.800205] [ 3399] 1000 3399 150830 9120 372736 0 0 kactivitymanage
[ 1146.800212] [ 3418] 1000 3418 34971 4175 126976 0 0 clementine-tagr
[ 1146.800214] [ 3419] 1000 3419 34971 4210 131072 0 0 clementine-tagr
[ 1146.800218] [ 3420] 1000 3420 34971 4198 126976 0 0 clementine-tagr
[ 1146.800219] [ 3421] 1000 3421 34971 4193 126976 0 0 clementine-tagr
[ 1146.800220] [ 3422] 1000 3422 34971 4166 135168 0 0 clementine-tagr
[ 1146.800222] [ 3427] 1000 3427 72028 6718 278528 0 0 kglobalaccel5
[ 1146.800228] [ 3428] 1000 3428 34971 4195 126976 0 0 clementine-tagr
[ 1146.800229] [ 3429] 1000 3429 34971 4215 131072 0 0 clementine-tagr
[ 1146.800231] [ 3431] 1000 3431 34971 4219 126976 0 0 clementine-tagr
[ 1146.800236] [ 3444] 1000 3444 6474 5049 98304 0 0 bash
[ 1146.800237] [ 3447] 1000 3447 90666 12645 548864 0 0 QtWebEngineProc
[ 1146.800242] [ 3452] 1000 3452 149454 569 122880 0 0 mergerfs
[ 1146.800243] [ 3487] 0 3487 100615 3753 151552 0 0 udisksd
[ 1146.800244] [ 3512] 1000 3512 8823 1162 102400 0 0 sd_cicero
[ 1146.800246] [ 3516] 1000 3516 8823 1123 98304 0 0 sd_dummy
[ 1146.800247] [ 3519] 1000 3519 8829 1121 94208 0 0 sd_generic
[ 1146.800251] [ 3521] 1000 3521 31475 2098 131072 0 0 sd_espeak
[ 1146.800252] [ 3523] 1000 3523 74986 7943 311296 0 0 kiod5
[ 1146.800253] [ 3546] 0 3546 1992 411 53248 0 0 agetty
[ 1146.800258] [ 3547] 0 3547 2025 601 53248 0 0 agetty
[ 1146.800259] [ 3548] 0 3548 2025 588 57344 0 0 agetty
[ 1146.800264] [ 3549] 0 3549 2025 649 49152 0 0 agetty
[ 1146.800265] [ 3550] 0 3550 2025 608 49152 0 0 agetty
[ 1146.800269] [ 3551] 0 3551 2025 604 49152 0 0 agetty
[ 1146.800270] [ 3559] 1000 3559 39834 554 81920 0 0 speech-dispatch
[ 1146.800274] [ 3569] 1000 3569 60094 2084 94208 0 0 gvfs-udisks2-vo
[ 1146.800276] [ 3574] 1000 3574 24587 2321 172032 0 0 kdeinit5
[ 1146.800279] [ 3575] 1000 3575 76563 9081 331776 0 0 klauncher
[ 1146.800280] [ 3596] 1000 3596 74713 9413 311296 0 0 kded5
[ 1146.800281] [ 3623] 1000 3623 46085 4836 212992 0 0 kio_http_cache_
[ 1146.800282] [ 3653] 1000 3653 138832 11871 372736 0 0 easystroke
[ 1146.800289] [ 3654] 1000 3654 83783 12054 348160 0 0 redshift-gtk
[ 1146.800290] [ 3655] 1000 3655 41365 7457 294912 0 0 orage
[ 1146.800291] [ 3661] 1000 3661 60416 5023 249856 0 0 polkit-gnome-au
[ 1146.800292] [ 3664] 1000 3664 60900 4510 245760 0 0 parcellite
[ 1146.800293] [ 3680] 1000 3680 21790 2021 122880 0 0 xbindkeys
[ 1146.800297] [ 3695] 1000 3695 3987 390 69632 0 0 redshift
[ 1146.800298] [ 3696] 1000 3696 2198 691 57344 0 0 su
[ 1146.800299] [ 3701] 1000 3701 57610 9559 380928 0 0 fcitx
[ 1146.800305] [ 3706] 1000 3706 934 551 53248 0 0 dbus-daemon
[ 1146.800306] [ 3711] 1000 3711 1224 28 49152 0 0 fcitx-dbus-watc
[ 1146.800307] [ 3713] 1000 3713 89866 7908 303104 0 0 xfce4-notifyd
[ 1146.800433] [ 3719] 1000 3719 27953 3652 196608 0 0 file.so
[ 1146.800435] [ 3722] 0 3722 6339 4921 90112 0 0 bash
[ 1146.800436] [ 3754] 0 3754 2899 865 65536 0 0 tmux: client
[ 1146.800437] [ 3756] 0 3756 3593 1512 65536 0 0 tmux: server
[ 1146.800438] [ 3760] 0 3760 1256 492 49152 0 0 tomoyo-queryd
[ 1146.800439] [ 3764] 0 3764 6369 4973 90112 0 0 bash
[ 1146.800440] [ 3771] 1000 3771 6473 5063 94208 0 0 bash
[ 1146.800441] [ 3831] 1000 3831 6474 5057 90112 0 0 bash
[ 1146.800442] [ 3847] 1000 3847 2084 226 57344 0 0 dmesg
[ 1146.800444] [ 4187] 1000 4187 726438 221563 3211264 0 0 firefox
[ 1146.800459] [ 23437] 1000 23437 416243 18201 1122304 0 300 QtWebEngineProc
[ 1146.800460] [ 16704] 0 16704 69439 48429 610304 0 0 emerge
[ 1146.800462] [ 16719] 0 16719 69439 46469 585728 0 0 emerge
[ 1146.800467] [ 16724] 0 16724 1764 316 53248 0 0 rssnotify.pl
[ 1146.800468] Out of memory: Kill process 23437 (QtWebEngineProc) score 303 or sacrifice child
[ 1146.800491] Killed process 23437 (QtWebEngineProc) total-vm:1664972kB, anon-rss:16924kB, file-rss:55892kB, shmem-rss:0kB
[ 1146.803320] oom_reaper: reaped process 23437 (QtWebEngineProc), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
[ 1146.837984] emerge: vmalloc: allocation failure, allocated 8192 of 20480 bytes, mode:0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null)
[ 1146.837993] emerge cpuset=
[ 1146.837995] /
[ 1146.837997] mems_allowed=0
[ 1146.837999] CPU: 4 PID: 16742 Comm: emerge Not tainted 4.19.69 #43
[ 1146.838001] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
[ 1146.838002] Call Trace:
[ 1146.838008] dump_stack+0x46/0x60
[ 1146.838013] warn_alloc.cold.133+0x68/0xe8
[ 1146.838016] ? __alloc_pages_nodemask+0x1f7/0x290
[ 1146.838019] __vmalloc_node_range+0x148/0x220
[ 1146.838023] copy_process+0xaab/0x27b0
[ 1146.838025] ? _do_fork+0xb2/0x390
[ 1146.838028] _do_fork+0xb2/0x390
[ 1146.838030] do_syscall_64+0x59/0x180
[ 1146.838033] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1146.838036] RIP: 0033:0x7f3745793aae
[ 1146.838038] Code: db 0f 85 25 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 41 89 c4 85 c0 0f 85 b3 00 00
[ 1146.838040] RSP: 002b:00007ffd34c12ed0 EFLAGS: 00000246
[ 1146.838041] ORIG_RAX: 0000000000000038
[ 1146.838045] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3745793aae
[ 1146.838046] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 1146.838048] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f374557b5c0
[ 1146.838050] R10: 00007f374557b890 R11: 0000000000000246 R12: 00007f3744cb17b0
[ 1146.838051] R13: 00007f3745545ad4 R14: 00007f373a32d168 R15: 00007f3745538050
[ 1146.838053] Mem-Info:
[ 1146.838056] active_anon:439660 inactive_anon:41529 isolated_anon:0
active_file:591639 inactive_file:2307377 isolated_file:0
unevictable:1 dirty:500 writeback:0 unstable:0
slab_reclaimable:299846 slab_unreclaimable:22643
mapped:133960 shmem:51818 pagetables:5302 bounce:0
free:310992 free_pcp:4130 free_cma:0
[ 1146.838060] Node 0 active_anon:1758640kB inactive_anon:166116kB active_file:2366556kB inactive_file:9229508kB unevictable:4kB isolated(anon):0kB isolated(file):0kB mapped:535840kB dirty:2000kB writeback:0kB shmem:207272kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 598016kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1146.838063] DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1146.838064] lowmem_reserve[]:
[ 1146.838065] 0
[ 1146.838067] 3030
[ 1146.838068] 15777
[ 1146.838069] 15777
[ 1146.838071] DMA32 free:1115072kB min:12968kB low:16208kB high:19448kB active_anon:250368kB inactive_anon:416kB active_file:48104kB inactive_file:1489456kB unevictable:0kB writepending:0kB present:3172224kB managed:3172224kB mlocked:0kB kernel_stack:16kB pagetables:228kB bounce:0kB free_pcp:11000kB local_pcp:1348kB free_cma:0kB
[ 1146.838073] lowmem_reserve[]:
[ 1146.838074] 0
[ 1146.838075] 0
[ 1146.838076] 12746
[ 1146.838077] 12746
[ 1146.838080] Normal free:112996kB min:54544kB low:68180kB high:81816kB active_anon:1508272kB inactive_anon:165700kB active_file:2318452kB inactive_file:7740052kB unevictable:4kB writepending:2000kB present:13367296kB managed:13056872kB mlocked:4kB kernel_stack:9220kB pagetables:20980kB bounce:0kB free_pcp:5472kB local_pcp:508kB free_cma:0kB
[ 1146.838081] lowmem_reserve[]:
[ 1146.838082] 0
[ 1146.838083] 0
[ 1146.838085] 0
[ 1146.838086] 0
[ 1146.838088] DMA:
[ 1146.838089] 1*4kB
[ 1146.838090] (U)
[ 1146.838091] 1*8kB
[ 1146.838092] (U)
[ 1146.838093] 1*16kB
[ 1146.838094] (U)
[ 1146.838095] 0*32kB
[ 1146.838097] 2*64kB
[ 1146.838098] (U)
[ 1146.838099] 1*128kB
[ 1146.838100] (U)
[ 1146.838101] 1*256kB
[ 1146.838102] (U)
[ 1146.838103] 0*512kB
[ 1146.838104] 1*1024kB
[ 1146.838106] (U)
[ 1146.838107] 1*2048kB
[ 1146.838279] (M)
[ 1146.838282] 3*4096kB
[ 1146.838284] (M)
[ 1146.838287] = 15900kB
[ 1146.838290] DMA32:
[ 1146.838292] 6524*4kB
[ 1146.838294] (UME)
[ 1146.838295] 2692*8kB
[ 1146.838297] (UME)
[ 1146.838298] 1125*16kB
[ 1146.838300] (UME)
[ 1146.838301] 775*32kB
[ 1146.838303] (UME)
[ 1146.838304] 368*64kB
[ 1146.838306] (UME)
[ 1146.838307] 193*128kB
[ 1146.838308] (UME)
[ 1146.838310] 64*256kB
[ 1146.838311] (UME)
[ 1146.838312] 37*512kB
[ 1146.838313] (UM)
[ 1146.838315] 21*1024kB
[ 1146.838316] (UME)
[ 1146.838318] 195*2048kB
[ 1146.838319] (UM)
[ 1146.838321] 127*4096kB
[ 1146.838322] (M)
[ 1146.838323] = 1115072kB
[ 1146.838324] Normal:
[ 1146.838325] 1083*4kB
[ 1146.838326] (UME)
[ 1146.838327] 1426*8kB
[ 1146.838328] (UM)
[ 1146.838329] 493*16kB
[ 1146.838330] (UE)
[ 1146.838331] 359*32kB
[ 1146.838333] (U)
[ 1146.838334] 161*64kB
[ 1146.838335] (UME)
[ 1146.838336] 54*128kB
[ 1146.838337] (UM)
[ 1146.838339] 43*256kB
[ 1146.838340] (UM)
[ 1146.838341] 9*512kB
[ 1146.838342] (UME)
[ 1146.838343] 5*1024kB
[ 1146.838345] (UME)
[ 1146.838347] 5*2048kB
[ 1146.838348] (ME)
[ 1146.838349] 7*4096kB
[ 1146.838350] (M)
[ 1146.838351] = 111980kB
[ 1146.838353] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1146.838355] 2950826 total pagecache pages
[ 1146.838357] 0 pages in swap cache
[ 1146.838358] Swap cache stats: add 0, delete 0, find 0/0
[ 1146.838360] Free swap = 4194300kB
[ 1146.838362] Total swap = 4194300kB
[ 1146.838363] 4138876 pages RAM
[ 1146.838364] 0 pages HighMem/MovableOnly
[ 1146.838365] 77627 pages reserved
From the output it looks like there was about 9G of inactive file memory and
no swap in use at all when the OOM killer triggered. The output is messy
because I use netconsole.
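The "about 9G" figure can be cross-checked against the page counts in the first Mem-Info block. This is a rough sketch that assumes 4 KiB pages; the helper name pages_to_kb is invented here, and the counts are copied from the log above.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, as on typical x86-64 builds

# Page counts copied from the first Mem-Info block above.
mem_info_pages = {
    "inactive_file": 2307684,
    "active_file": 591199,
    "free": 311407,
}


def pages_to_kb(pages):
    """Convert a page count to kB (hypothetical helper)."""
    return pages * PAGE_KB


for name, pages in mem_info_pages.items():
    kb = pages_to_kb(pages)
    print(f"{name}: {kb} kB ({kb / 2**20:.2f} GiB)")

# Swap figures copied from the same report.
swap_total_kb = 4194300
swap_free_kb = 4194300
print(f"swap used: {swap_total_kb - swap_free_kb} kB")
```

The converted values agree with the per-node line in the same report (inactive_file:9230736kB, active_file:2364796kB), and free swap equals total swap.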
Swap is on a zram device set up like this:
echo 8 > /sys/block/zram0/max_comp_streams
echo lz4 > /sys/block/zram0/comp_algorithm
echo 4G > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -v --discard /dev/zram0
Those kernel memory allocation failures can also cause a kernel NULL pointer
dereference. Here is a dmesg captured over netconsole when that happens:
4,1210,922134743,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1211,922134751,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1212,922134753,-; node 0: slabs: 18346, objs: 568698, free: 0
4,1213,922134762,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1214,922134764,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1215,922134765,-; node 0: slabs: 18346, objs: 568698, free: 0
4,1216,922134770,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1217,922134771,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1218,922134773,-; node 0: slabs: 18346, objs: 568698, free: 0
4,1219,922134776,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1220,922134778,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1221,922134779,-; node 0: slabs: 18346, objs: 568698, free: 0
4,1222,922134784,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1223,922134872,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1224,922134874,-; node 0: slabs: 18346, objs: 568698, free: 0
4,1225,922135143,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1226,922135147,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1227,922135148,-; node 0: slabs: 18351, objs: 568713, free: 0
4,1228,922135152,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1229,922135154,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1230,922135162,-; node 0: slabs: 18351, objs: 568713, free: 0
4,1231,922135165,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1232,922135166,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1233,922135166,-; node 0: slabs: 18351, objs: 568713, free: 0
4,1234,922135168,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1235,922135175,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1236,922135176,-; node 0: slabs: 18351, objs: 568713, free: 0
4,1237,922135181,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1238,922135183,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1239,922135183,-; node 0: slabs: 18351, objs: 568713, free: 0
1,1240,922137835,-;BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
6,1241,922137839,-;PGD 0
4,1242,922137840,c;P4D 0
4,1243,922137842,-;Oops: 0000 [#1] PREEMPT SMP PTI
4,1244,922137844,-;CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
4,1245,922137846,-;Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
4,1246,922137850,-;RIP: 0010:create_empty_buffers+0x24/0x100
4,1247,922137852,-;Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
4,1248,922137853,-;RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
4,1249,922137855,-;RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
4,1250,922137856,-;RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
4,1251,922137857,-;RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
4,1252,922137859,-;R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
4,1253,922137860,-;R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
4,1254,922137861,-;FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
4,1255,922137863,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1256,922137865,-;CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
4,1257,922137866,-;Call Trace:
4,1258,922137967,-; create_page_buffers+0x4d/0x60
4,1259,922137969,-; __block_write_begin_int+0x8e/0x5a0
4,1260,922137972,-; ? ext4_inode_attach_jinode.part.82+0xb0/0xb0
4,1261,922137975,-; ? jbd2__journal_start+0xd7/0x1f0
4,1262,922137977,-; ext4_da_write_begin+0x112/0x3d0
4,1263,922137980,-; generic_perform_write+0xf1/0x1b0
4,1264,922137983,-; ? file_update_time+0x70/0x140
4,1265,922137985,-; __generic_file_write_iter+0x141/0x1a0
4,1266,922137988,-; ext4_file_write_iter+0xef/0x3b0
4,1267,922137990,-; __vfs_write+0x17e/0x1e0
4,1268,922137992,-; vfs_write+0xa5/0x1a0
4,1269,922137994,-; ksys_write+0x57/0xd0
4,1270,922137997,-; do_syscall_64+0x55/0x160
4,1271,922138000,-; entry_SYSCALL_64_after_hwframe+0x44/0xa9
4,1272,922138003,-;RIP: 0033:0x7fb55ba9c0d8
4,1273,922138004,-;Code: 00 90 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 05 96 0d 00 8b 00 85 c0 75 27 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 48 8b 4c 24 28 64 48 33 0c 25 28 00 00 00
4,1274,922138006,-;RSP: 002b:00007fff718c1260 EFLAGS: 00000246
4,1275,922138007,c; ORIG_RAX: 0000000000000001
4,1276,922138124,-;RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fb55ba9c0d8
4,1277,922138125,-;RDX: 0000000000001000 RSI: 000056395ab78710 RDI: 0000000000000003
4,1278,922138126,-;RBP: 000056395ab78710 R08: 00007fb55b288bc0 R09: 0000000000000000
4,1279,922138128,-;R10: 0000000000000002 R11: 0000000000000246 R12: 000056395ab69900
4,1280,922138129,-;R13: 0000000000001000 R14: 00007fb55bb6c760 R15: 0000000000001000
4,1281,922138131,-;Modules linked in:
4,1282,922138133,c; 8021q
4,1283,922138134,c; iptable_mangle
4,1284,922138135,c; xt_limit
4,1285,922138136,c; xt_conntrack
4,1286,922138137,c; iptable_filter
4,1287,922138138,c; iptable_nat
4,1288,922138139,c; nf_nat_ipv4
4,1289,922138141,c; nf_nat
4,1290,922138142,c; ip_tables
4,1291,922138143,c; arc4
4,1292,922138144,c; ath9k_htc
4,1293,922138145,c; ath9k_common
4,1294,922138146,c; ath9k_hw
4,1295,922138147,c; ath
4,1296,922138148,c; mac80211
4,1297,922138150,c; kvm_intel
4,1298,922138151,c; cfg80211
4,1299,922138152,c; kvm
4,1300,922138153,c; uas
4,1301,922138154,c; crc32_pclmul
4,1302,922138156,c; usb_storage
4,1303,922138157,c; joydev
4,1304,922138201,c; cdc_acm
4,1305,922138203,-;CR2: 0000000000000008
4,1306,922138270,-;---[ end trace ee8624c121072f8e ]---
4,1307,922138273,-;RIP: 0010:create_empty_buffers+0x24/0x100
4,1308,922138275,-;Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
4,1309,922138278,-;RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
4,1310,922138280,-;RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
4,1311,922138281,-;RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
4,1312,922138282,-;RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
4,1313,922138284,-;R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
4,1314,922138285,-;R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
4,1315,922138286,-;FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
4,1316,922138288,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1317,922138289,-;CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
0,1318,922138290,-;Kernel panic - not syncing: Fatal exception
0,1319,922138296,-;Kernel Offset: 0x6000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
0,1320,922138299,-;---[ end Kernel panic - not syncing: Fatal exception ]---
Since I never had this problem with 4.14 I tested the old 4.14.115 that I used
before updating to 4.19 and I couldn't reproduce the problem with it. I can
easily reproduce the problem at least back to 4.19.42 in the 4.19 series.
I could bisect the problem but that's going to take forever so I'm hoping I
can avoid that.
The problem only occurs when that "12G" memory cgroup is used to build
chromium and nothing else. Chromium is the largest package I regularly build
and I selectively enable ccache for the chromium build. My gut feeling tells
me that the massive number of file operations needed for the build is what is
triggering the problem. Perhaps when the memory.kmem.limit_in_bytes limit is
reached?
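One way to sanity-check that guess is to estimate how much memory the two slab caches named in the SLUB warnings hold. The sketch below parses one report pair per cache, copied from the first dmesg excerpt, and multiplies the object count by the buffer size. The function name slab_usage and the regular expressions are illustrative assumptions, and the result is only a lower bound: it ignores per-slab overhead and every other allocation charged to the group.

```python
import re

# One report pair per cache, copied from the first dmesg excerpt above.
LOG = """\
cache: dentry(100:12G), object size: 192, buffer size: 192, default order: 0, min order: 0
node 0: slabs: 26888, objs: 564648, free: 0
cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
node 0: slabs: 19132, objs: 592952, free: 0
"""

CACHE_RE = re.compile(r"cache: (\w+)\(\d+:\w+\), object size: \d+, buffer size: (\d+)")
NODE_RE = re.compile(r"node \d+: slabs: \d+, objs: (\d+)")

KMEM_LIMIT = 1073741824  # memory.kmem.limit_in_bytes from the report


def slab_usage(log):
    """Return {cache name: objs * buffer size in bytes} (hypothetical helper)."""
    usage = {}
    cache = size = None
    for line in log.splitlines():
        m = CACHE_RE.search(line)
        if m:
            # Remember which cache the following "node" line belongs to.
            cache, size = m.group(1), int(m.group(2))
            continue
        m = NODE_RE.search(line)
        if m and cache is not None:
            usage[cache] = int(m.group(1)) * size
    return usage


usage = slab_usage(LOG)
total = sum(usage.values())
for name, nbytes in usage.items():
    print(f"{name}: {nbytes / 2**20:.1f} MiB")
print(f"total: {total / 2**20:.1f} MiB, {100 * total / KMEM_LIMIT:.0f}% of the kmem limit")
```

By this estimate the two caches alone account for about two thirds of the 1 GiB kernel memory limit, which is consistent with the limit being reached but does not prove it.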
Here is some more info that I don't think fits in a single mail.
full dmesg https://pastebin.com/raw/tKgTCTJ2
.config https://pastebin.com/raw/jKhSqqCX
Some relevant parts of /etc/sysctl.conf:
vm.dirty_writeback_centisecs = 3000
vm.dirty_background_bytes = 52428800
vm.dirty_bytes = 262144000
vm.swappiness = 99
vm.vfs_cache_pressure = 25
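For readers less used to these units, the writeback settings above can be converted as follows. This is a small sketch using only the values from the excerpt; the variable names are invented for illustration.

```python
# Values copied from the sysctl.conf excerpt above.
sysctl = {
    "vm.dirty_writeback_centisecs": 3000,
    "vm.dirty_background_bytes": 52428800,
    "vm.dirty_bytes": 262144000,
}

MIB = 1 << 20

# Centiseconds to seconds, bytes to MiB.
writeback_interval_s = sysctl["vm.dirty_writeback_centisecs"] / 100
background_mib = sysctl["vm.dirty_background_bytes"] / MIB
dirty_mib = sysctl["vm.dirty_bytes"] / MIB

print(f"periodic writeback wakeup interval: {writeback_interval_s:.0f} s")
print(f"background writeback threshold: {background_mib:.0f} MiB")
print(f"dirty threshold: {dirty_mib:.0f} MiB")
```

So periodic writeback wakes up every 30 seconds, background writeback starts at 50 MiB of dirty data, and the dirty threshold is 250 MiB.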
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-01 20:43 [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69 Thomas Lindroth
@ 2019-09-02 7:16 ` Michal Hocko
2019-09-02 7:27 ` Michal Hocko
2019-09-02 19:34 ` Thomas Lindroth
[not found] ` <666dbcde-1b8a-9e2d-7d1f-48a117c78ae1@I-love.SAKURA.ne.jp>
2019-09-06 12:56 ` [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges Michal Hocko
2 siblings, 2 replies; 29+ messages in thread
From: Michal Hocko @ 2019-09-02 7:16 UTC (permalink / raw)
To: Thomas Lindroth; +Cc: linux-mm, stable
On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
> After upgrading to the 4.19 series I've started getting problems with
> early OOM.
What is the kernel you have updated from? Would it be possible to try
the current Linus' tree?
> I run a Gentoo system and do large compiles like the chromium browser in a
> v1 memory cgroup. When I build chromium in the memory cgroup the OOM killer
> runs and kills programs outside of the cgroup. This happens even when there
> is plenty of free memory both in and outside of the cgroup.
[...]
> [ 1146.798696] emerge invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0
> [ 1146.798699] emerge cpuset=
> [ 1146.798701] /
> [ 1146.798703] mems_allowed=0
> [ 1146.798705] CPU: 4 PID: 16719 Comm: emerge Not tainted 4.19.69 #43
> [ 1146.798707] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
> [ 1146.798708] Call Trace:
> [ 1146.798713] dump_stack+0x46/0x60
> [ 1146.798718] dump_header+0x67/0x28d
> [ 1146.798721] oom_kill_process.cold.31+0xb/0x1f3
> [ 1146.798723] out_of_memory+0x129/0x250
> [ 1146.798728] pagefault_out_of_memory+0x64/0x77
> [ 1146.798732] __do_page_fault+0x3c1/0x3d0
> [ 1146.798735] do_page_fault+0x2c/0x123
> [ 1146.798738] ? page_fault+0x8/0x30
> [ 1146.798740] page_fault+0x1e/0x30
This is not a memcg oom killer and the oom killer itself is a reaction
to the allocation not making forward progress. It smells like
something in the page fault path has returned ENOMEM, leading to
VM_FAULT_OOM. Seeing unexpected SLUB allocation failures would suggest
that something is not really working properly there.
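The distinction between a memcg OOM and a global OOM can be checked roughly from the frames in the call trace. The sketch below is only a heuristic: `pagefault_out_of_memory` is taken from the trace quoted above, while `mem_cgroup_out_of_memory` is assumed to be the frame that shows up when the memcg OOM killer runs, and the helper name `classify_oom_trace` is invented for this example.

```python
from typing import Iterable


def classify_oom_trace(lines: Iterable[str]) -> str:
    """Guess which OOM path produced a call trace, based on frame names."""
    text = "\n".join(lines)
    # Check the memcg frame first: a memcg OOM handled on the way out of a
    # page fault could contain both frame names.
    if "mem_cgroup_out_of_memory" in text:
        return "memcg"
    if "pagefault_out_of_memory" in text:
        return "global (page fault path)"
    if "out_of_memory" in text:
        return "global"
    return "unknown"


trace = [
    "> [ 1146.798721] oom_kill_process.cold.31+0xb/0x1f3",
    "> [ 1146.798723] out_of_memory+0x129/0x250",
    "> [ 1146.798728] pagefault_out_of_memory+0x64/0x77",
    "> [ 1146.798732] __do_page_fault+0x3c1/0x3d0",
]
print(classify_oom_trace(trace))
```

Frame names can differ between kernel versions and configurations, so the result is a hint for reading the log, not a verdict.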
--
Michal Hocko
SUSE Labs
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-02 7:16 ` Michal Hocko
@ 2019-09-02 7:27 ` Michal Hocko
2019-09-02 19:34 ` Thomas Lindroth
1 sibling, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2019-09-02 7:27 UTC (permalink / raw)
To: Thomas Lindroth; +Cc: linux-mm, stable
On Mon 02-09-19 09:16:17, Michal Hocko wrote:
> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
> > After upgrading to the 4.19 series I've started getting problems with
> > early OOM.
>
> What is the kernel you have updated from? Would it be possible to try
> the current Linus' tree?
Btw. checking vanilla 4.19 without stable patches might be interesting
as well.
--
Michal Hocko
SUSE Labs
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-02 7:16 ` Michal Hocko
2019-09-02 7:27 ` Michal Hocko
@ 2019-09-02 19:34 ` Thomas Lindroth
2019-09-03 7:41 ` Michal Hocko
1 sibling, 1 reply; 29+ messages in thread
From: Thomas Lindroth @ 2019-09-02 19:34 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, stable
On 9/2/19 9:16 AM, Michal Hocko wrote:
> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
>> After upgrading to the 4.19 series I've started getting problems with
>> early OOM.
>
> What is the kernel you have updated from? Would it be possible to try
> the current Linus' tree?
I did some more testing and it turns out this is not a regression after all.
I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
running cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
xargs -0 -n 1 -P 8 stat > /dev/null'
Just as memory.kmem.max_usage_in_bytes = memory.kmem.limit_in_bytes the OOM
killer kicked in and killed my X server.
Using the find|stat approach it was easy to test the problem in a testing VM.
I was able to reproduce the problem in all these kernels:
4.9.0
4.14.0
4.14.115
4.19.0
5.2.11
5.3-rc6 didn't build in the VM. The build environment is too old probably.
I was curious why I initially couldn't reproduce the problem in 4.14 by
building chromium. I was again able to successfully build chromium using
4.14.115. Turns out memory.kmem.max_usage_in_bytes was 1015689216 after
building and my limit is set to 1073741824. I guess some unrelated change in
memory management raised that slightly for 4.19 triggering the problem.
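To put the two figures above side by side, here is a small arithmetic sketch. It only uses the peak and limit values quoted in this mail; the function name `kmem_headroom` is made up for the example.

```python
from typing import Tuple

MIB = 1024 * 1024


def kmem_headroom(limit_bytes: int, peak_bytes: int) -> Tuple[int, float]:
    """Return (bytes left below the limit, fraction of the limit used)."""
    return limit_bytes - peak_bytes, peak_bytes / limit_bytes


# Values reported for the 4.14.115 chromium build.
left, used = kmem_headroom(limit_bytes=1073741824, peak_bytes=1015689216)
print("headroom: %d bytes (%.1f MiB), %.1f%% of the limit used"
      % (left, left / MIB, used * 100))
```

The 4.14.115 build peaked roughly 55 MiB below the 1 GiB kmem limit, so a fairly small increase in kernel memory usage would be enough to reach it.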
If you want to reproduce for yourself here are the steps:
1. Build any kernel from 4.9 onwards using something like my .config
2. Set up a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
memory.limit_in_bytes. I used 100M in my testing VM.
3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
in the cgroup.
4. Assuming there are enough inodes on the rootfs the global OOM killer
should kick in when memory.kmem.max_usage_in_bytes =
memory.kmem.limit_in_bytes and kill something outside the cgroup.
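The condition in step 4 can be watched from a second terminal while the find|stat job runs. The following is a minimal sketch, assuming the v1 memory controller is mounted under /sys/fs/cgroup/memory and the cgroup is the "12G" one from the original report; the helper names are invented for illustration.

```python
from pathlib import Path


def read_counter(cgroup_dir: str, name: str) -> int:
    """Read one integer control file such as memory.kmem.limit_in_bytes."""
    return int((Path(cgroup_dir) / name).read_text().strip())


def kmem_limit_hit(max_usage: int, limit: int) -> bool:
    """True once the recorded peak has reached the configured kmem limit."""
    return max_usage >= limit


def check_cgroup(cgroup_dir: str) -> bool:
    """Compare the kmem peak against the kmem limit for one v1 cgroup."""
    peak = read_counter(cgroup_dir, "memory.kmem.max_usage_in_bytes")
    limit = read_counter(cgroup_dir, "memory.kmem.limit_in_bytes")
    return kmem_limit_hit(peak, limit)


if __name__ == "__main__":
    # Assumed path: adjust to wherever the v1 memory controller is mounted.
    cgroup = "/sys/fs/cgroup/memory/12G"
    if Path(cgroup).is_dir():
        print("kmem limit reached:", check_cgroup(cgroup))
```

Polling this in a loop should show the result flip to True at about the time the global OOM killer fires, if the behaviour described above holds.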
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-02 19:34 ` Thomas Lindroth
@ 2019-09-03 7:41 ` Michal Hocko
2019-09-03 12:01 ` Thomas Lindroth
2019-09-03 12:05 ` Andrey Ryabinin
0 siblings, 2 replies; 29+ messages in thread
From: Michal Hocko @ 2019-09-03 7:41 UTC (permalink / raw)
To: Thomas Lindroth; +Cc: linux-mm, stable
On Mon 02-09-19 21:34:29, Thomas Lindroth wrote:
> On 9/2/19 9:16 AM, Michal Hocko wrote:
> > On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
> > > After upgrading to the 4.19 series I've started getting problems with
> > > early OOM.
> >
> > What is the kernel you have updated from? Would it be possible to try
> > the current Linus' tree?
>
> I did some more testing and it turns out this is not a regression after all.
>
> I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
> running cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
> xargs -0 -n 1 -P 8 stat > /dev/null'
>
> Just as memory.kmem.max_usage_in_bytes = memory.kmem.limit_in_bytes the OOM
> killer kicked in and killed my X server.
>
> Using the find|stat approach it was easy to test the problem in a testing VM.
> I was able to reproduce the problem in all these kernels:
> 4.9.0
> 4.14.0
> 4.14.115
> 4.19.0
> 5.2.11
>
> 5.3-rc6 didn't build in the VM. The build environment is too old probably.
>
> I was curious why I initially couldn't reproduce the problem in 4.14 by
> building chromium. I was again able to successfully build chromium using
> 4.14.115. Turns out memory.kmem.max_usage_in_bytes was 1015689216 after
> building and my limit is set to 1073741824. I guess some unrelated change in
> memory management raised that slightly for 4.19 triggering the problem.
>
> If you want to reproduce for yourself here are the steps:
> 1. build any kernel above 4.9 using something like my .config
> 2. setup a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
> memory.limit_in_bytes. I used 100M in my testing VM.
> 3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
> in the cgroup.
> 4. Assuming there is enough inodes on the rootfs the global OOM killer
> should kick in when memory.kmem.max_usage_in_bytes =
> memory.kmem.limit_in_bytes and kill something outside the cgroup.
This is certainly a bug. Is this still an OOM triggered from
pagefault_out_of_memory? Since 4.19 (29ef680ae7c21) the memcg charge
path should invoke the memcg oom killer directly from the charge path.
If that doesn't happen then the failing charge is either GFP_NOFS or a
large allocation.
The former has been fixed just recently by
http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
and I suspect this is the fix you are looking for. Although it is curious
that you can see a global oom even before because the charge path would
mark an oom situation even for NOFS context and it should trigger the
memcg oom killer on the way out from the page fault path. So essentially
the same call trace except the oom killer should be constrained to the
memcg context.
Could you try the above patch please?
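Whether a failing allocation was GFP_NOFS can be read off the SLUB failure lines in the original report, which print both the raw mask and its name (0x6000c0 for GFP_KERNEL, 0x600040 for GFP_NOFS). A small sketch, assuming the 0x80 bit that separates those two masks is the "may enter the filesystem" flag on that kernel; bit values are not stable across kernel versions, and the helper names are made up for this example.

```python
import re
from typing import Optional

# Assumption: inferred from the two masks printed in the report's 4.19.69
# logs (0x6000c0 vs 0x600040); do not reuse blindly on other kernels.
GFP_FS_BIT = 0x80


def parse_gfp(line: str) -> Optional[int]:
    """Extract the numeric mask from a 'gfp=0x...' or 'gfp_mask=0x...' field."""
    match = re.search(r"gfp(?:_mask)?=(0x[0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


def may_enter_fs(gfp_mask: int) -> bool:
    """True when the mask has the assumed filesystem bit set."""
    return bool(gfp_mask & GFP_FS_BIT)


for line in (
    "SLUB: Unable to allocate memory on node -1, gfp=0x6000c0(GFP_KERNEL)",
    "SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)",
):
    mask = parse_gfp(line)
    print(hex(mask), "NOFS" if not may_enter_fs(mask) else "may enter fs")
```

The gfp_mask=0x0() printed by the page fault OOM report carries no allocation context, so this check is only meaningful for the SLUB failure lines.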
--
Michal Hocko
SUSE Labs
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-03 7:41 ` Michal Hocko
@ 2019-09-03 12:01 ` Thomas Lindroth
2019-09-03 12:05 ` Andrey Ryabinin
1 sibling, 0 replies; 29+ messages in thread
From: Thomas Lindroth @ 2019-09-03 12:01 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, stable
On 9/3/19 9:41 AM, Michal Hocko wrote:
> On Mon 02-09-19 21:34:29, Thomas Lindroth wrote:
>> On 9/2/19 9:16 AM, Michal Hocko wrote:
>>> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
>>>> After upgrading to the 4.19 series I've started getting problems with
>>>> early OOM.
>>>
>>> What is the kernel you have updated from? Would it be possible to try
>>> the current Linus' tree?
>>
>> I did some more testing and it turns out this is not a regression after all.
>>
>> I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
>> running cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
>> xargs -0 -n 1 -P 8 stat > /dev/null'
>>
>> Just as memory.kmem.max_usage_in_bytes = memory.kmem.limit_in_bytes the OOM
>> killer kicked in and killed my X server.
>>
>> Using the find|stat approach it was easy to test the problem in a testing VM.
>> I was able to reproduce the problem in all these kernels:
>> 4.9.0
>> 4.14.0
>> 4.14.115
>> 4.19.0
>> 5.2.11
>>
>> 5.3-rc6 didn't build in the VM. The build environment is too old probably.
>>
>> I was curious why I initially couldn't reproduce the problem in 4.14 by
>> building chromium. I was again able to successfully build chromium using
>> 4.14.115. Turns out memory.kmem.max_usage_in_bytes was 1015689216 after
>> building and my limit is set to 1073741824. I guess some unrelated change in
>> memory management raised that slightly for 4.19 triggering the problem.
>>
>> If you want to reproduce for yourself here are the steps:
>> 1. build any kernel above 4.9 using something like my .config
>> 2. setup a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
>> memory.limit_in_bytes. I used 100M in my testing VM.
>> 3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
>> in the cgroup.
>> 4. Assuming there is enough inodes on the rootfs the global OOM killer
>> should kick in when memory.kmem.max_usage_in_bytes =
>> memory.kmem.limit_in_bytes and kill something outside the cgroup.
>
> This is certainly a bug. Is this still an OOM triggered from
> pagefault_out_of_memory? Since 4.19 (29ef680ae7c21) the memcg charge
> path should invoke the memcg oom killer directly from the charge path.
> If that doesn't happen then the failing charge is either GFP_NOFS or a
> large allocation.
>
> The former has been fixed just recently by http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
> and I suspect this is a fix you are looking for. Although it is curious
> that you can see a global oom even before because the charge path would
> mark an oom situation even for NOFS context and it should trigger the
> memcg oom killer on the way out from the page fault path. So essentially
> the same call trace except the oom killer should be constrained to the
> memcg context.
>
> Could you try the above patch please?
I tried the patch in my testing VM on top of 5.2.11. The VM has 8G of RAM and
these cgroup settings:
memory.kmem.limit_in_bytes:107374182
memory.kmem.tcp.limit_in_bytes:1073741824
memory.limit_in_bytes:1073741824
memory.memsw.limit_in_bytes:12884901888
As soon as kmem.limit_in_bytes was hit, the OOM killer killed Xorg. Here is
the full dmesg:
[ 0.000000] Linux version 5.2.11+ (root@debian) (gcc version 4.7.2 (Debian 4.7.2-5)) #5 SMP Tue Sep 3 08:33:32 EDT 2019
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.2.11+ root=UUID=d51ad2bd-595d-4dad-abf3-21cddbb2aee5 ro quiet
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffddfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffde000-0x000000007fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000027fffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-1.fc28 04/01/2014
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 4000.128 MHz processor
[ 0.001244] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.001245] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.001247] last_pfn = 0x280000 max_arch_pfn = 0x400000000
[ 0.001266] MTRR default type: write-back
[ 0.001267] MTRR fixed ranges enabled:
[ 0.001268] 00000-9FFFF write-back
[ 0.001268] A0000-BFFFF uncachable
[ 0.001268] C0000-FFFFF write-protect
[ 0.001269] MTRR variable ranges enabled:
[ 0.001269] 0 base 00C0000000 mask FFC0000000 uncachable
[ 0.001270] 1 disabled
[ 0.001270] 2 disabled
[ 0.001270] 3 disabled
[ 0.001271] 4 disabled
[ 0.001271] 5 disabled
[ 0.001271] 6 disabled
[ 0.001271] 7 disabled
[ 0.001278] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.001282] last_pfn = 0x7ffde max_arch_pfn = 0x400000000
[ 0.006102] found SMP MP-table at [mem 0x000f5d10-0x000f5d1f]
[ 0.006309] Using GB pages for direct mapping
[ 0.006311] BRK [0x3b201000, 0x3b201fff] PGTABLE
[ 0.006312] BRK [0x3b202000, 0x3b202fff] PGTABLE
[ 0.006313] BRK [0x3b203000, 0x3b203fff] PGTABLE
[ 0.006326] BRK [0x3b204000, 0x3b204fff] PGTABLE
[ 0.006377] RAMDISK: [mem 0x246fa000-0x2e374fff]
[ 0.006385] ACPI: Early table checksum verification disabled
[ 0.006417] ACPI: RSDP 0x00000000000F5B40 000014 (v00 BOCHS )
[ 0.006419] ACPI: RSDT 0x000000007FFE21CF 000034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.006423] ACPI: FACP 0x000000007FFE1FD7 0000F4 (v03 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.006426] ACPI: DSDT 0x000000007FFE0040 001F97 (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[ 0.006429] ACPI: FACS 0x000000007FFE0000 000040
[ 0.006430] ACPI: APIC 0x000000007FFE20CB 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.006432] ACPI: HPET 0x000000007FFE215B 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.006435] ACPI: MCFG 0x000000007FFE2193 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001)
[ 0.006440] ACPI: Local APIC address 0xfee00000
[ 0.006610] No NUMA configuration found
[ 0.006611] Faking a node at [mem 0x0000000000000000-0x000000027fffffff]
[ 0.006613] NODE_DATA(0) allocated [mem 0x27fffa000-0x27fffdfff]
[ 0.006629] Zone ranges:
[ 0.006630] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.006631] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.006631] Normal [mem 0x0000000100000000-0x000000027fffffff]
[ 0.006632] Movable zone start for each node
[ 0.006632] Early memory node ranges
[ 0.006633] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.006633] node 0: [mem 0x0000000000100000-0x000000007ffddfff]
[ 0.006633] node 0: [mem 0x0000000100000000-0x000000027fffffff]
[ 0.006641] Zeroed struct page in unavailable ranges: 132 pages
[ 0.006642] Initmem setup node 0 [mem 0x0000000000001000-0x000000027fffffff]
[ 0.006643] On node 0 totalpages: 2097020
[ 0.006643] DMA zone: 64 pages used for memmap
[ 0.006643] DMA zone: 21 pages reserved
[ 0.006644] DMA zone: 3998 pages, LIFO batch:0
[ 0.006677] DMA32 zone: 8128 pages used for memmap
[ 0.006678] DMA32 zone: 520158 pages, LIFO batch:63
[ 0.012564] Normal zone: 24576 pages used for memmap
[ 0.012565] Normal zone: 1572864 pages, LIFO batch:63
[ 0.025222] ACPI: PM-Timer IO Port: 0x608
[ 0.025224] ACPI: Local APIC address 0xfee00000
[ 0.025231] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.025837] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[ 0.025840] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.025841] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.025841] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.025842] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.025843] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.025844] ACPI: IRQ0 used by override.
[ 0.025844] ACPI: IRQ5 used by override.
[ 0.025845] ACPI: IRQ9 used by override.
[ 0.025845] ACPI: IRQ10 used by override.
[ 0.025845] ACPI: IRQ11 used by override.
[ 0.025847] Using ACPI (MADT) for SMP configuration information
[ 0.025848] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.025855] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.025863] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.025864] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.025864] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 0.025864] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[ 0.025865] PM: Registered nosave memory: [mem 0x7ffde000-0x7fffffff]
[ 0.025865] PM: Registered nosave memory: [mem 0x80000000-0xafffffff]
[ 0.025866] PM: Registered nosave memory: [mem 0xb0000000-0xbfffffff]
[ 0.025866] PM: Registered nosave memory: [mem 0xc0000000-0xfed1bfff]
[ 0.025866] PM: Registered nosave memory: [mem 0xfed1c000-0xfed1ffff]
[ 0.025867] PM: Registered nosave memory: [mem 0xfed20000-0xfeffbfff]
[ 0.025867] PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
[ 0.025867] PM: Registered nosave memory: [mem 0xff000000-0xfffbffff]
[ 0.025868] PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
[ 0.025869] [mem 0xc0000000-0xfed1bfff] available for PCI devices
[ 0.025871] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.025875] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:4 nr_node_ids:1
[ 0.025993] percpu: Embedded 51 pages/cpu s168088 r8192 d32616 u524288
[ 0.025996] pcpu-alloc: s168088 r8192 d32616 u524288 alloc=1*2097152
[ 0.025997] pcpu-alloc: [0] 0 1 2 3
[ 0.026008] Built 1 zonelists, mobility grouping on. Total pages: 2064231
[ 0.026008] Policy zone: Normal
[ 0.026009] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.2.11+ root=UUID=d51ad2bd-595d-4dad-abf3-21cddbb2aee5 ro quiet
[ 0.028738] Calgary: detecting Calgary via BIOS EBDA area
[ 0.028739] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[ 0.047644] Memory: 8011344K/8388080K available (8194K kernel code, 823K rwdata, 2088K rodata, 1148K init, 2048K bss, 376736K reserved, 0K cma-reserved)
[ 0.047707] Kernel/User page tables isolation: enabled
[ 0.047759] rcu: Hierarchical RCU implementation.
[ 0.047760] rcu: RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=4.
[ 0.047761] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.047761] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.047849] NR_IRQS: 33024, nr_irqs: 456, preallocated irqs: 16
[ 0.048065] random: get_random_bytes called from start_kernel+0x2e9/0x4b3 with crng_init=0
[ 0.060768] Console: colour VGA+ 80x25
[ 0.060772] printk: console [tty0] enabled
[ 0.060783] ACPI: Core revision 20190509
[ 0.060925] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 0.060966] hpet clockevent registered
[ 0.060974] APIC: Switch to symmetric I/O mode setup
[ 0.062688] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.070234] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.089009] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x39a8d58b854, max_idle_ns: 440795351064 ns
[ 0.089011] Calibrating delay loop (skipped), value calculated using timer frequency.. 8000.25 BogoMIPS (lpj=16000512)
[ 0.089012] pid_max: default: 32768 minimum: 301
[ 0.089040] LSM: Security Framework initializing
[ 0.089883] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 0.090314] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.090336] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.090349] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.090531] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[ 0.090565] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.090565] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.090567] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.090568] Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation available!
[ 0.090568] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.090773] MDS: Mitigation: Clear CPU buffers
[ 0.090880] Freeing SMP alternatives memory: 16K
[ 0.090954] TSC deadline timer enabled
[ 0.090963] smpboot: CPU0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (family: 0x6, model: 0x3c, stepping: 0x3)
[ 0.091020] Performance Events: Haswell events, Intel PMU driver.
[ 0.091039] ... version: 2
[ 0.091039] ... bit width: 48
[ 0.091040] ... generic registers: 4
[ 0.091040] ... value mask: 0000ffffffffffff
[ 0.091041] ... max period: 000000007fffffff
[ 0.091041] ... fixed-purpose events: 3
[ 0.091041] ... event mask: 000000070000000f
[ 0.091065] rcu: Hierarchical SRCU implementation.
[ 0.091114] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 0.093010] smp: Bringing up secondary CPUs ...
[ 0.093010] x86: Booting SMP configuration:
[ 0.093010] .... node #0, CPUs: #1 #2 #3
[ 0.093329] smp: Brought up 1 node, 4 CPUs
[ 0.093329] smpboot: Max logical packages: 1
[ 0.093329] smpboot: Total of 4 processors activated (32001.02 BogoMIPS)
[ 0.093345] devtmpfs: initialized
[ 0.093345] x86/mm: Memory block size: 128MB
[ 0.093538] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.093538] futex hash table entries: 1024 (order: 4, 65536 bytes)
[ 0.093538] NET: Registered protocol family 16
[ 0.093538] audit: initializing netlink subsys (disabled)
[ 0.093538] audit: type=2000 audit(1567514293.032:1): state=initialized audit_enabled=0 res=1
[ 0.093538] cpuidle: using governor ladder
[ 0.093538] cpuidle: using governor menu
[ 0.093538] ACPI: bus type PCI registered
[ 0.093538] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.093538] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xb0000000-0xbfffffff] (base 0xb0000000)
[ 0.093538] PCI: MMCONFIG at [mem 0xb0000000-0xbfffffff] reserved in E820
[ 0.093538] PCI: Using configuration type 1 for base access
[ 0.093538] core: PMU erratum BJ122, BV98, HSD29 workaround disabled, HT off
[ 0.093918] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.093918] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.245109] ACPI: Added _OSI(Module Device)
[ 0.245110] ACPI: Added _OSI(Processor Device)
[ 0.245111] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.245111] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.245113] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.245113] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.245114] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 0.245894] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 0.246706] ACPI: Interpreter enabled
[ 0.246714] ACPI: (supports S0 S3 S4 S5)
[ 0.246715] ACPI: Using IOAPIC for interrupt routing
[ 0.246730] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.246782] ACPI: Enabled 1 GPEs in block 00 to 3F
[ 0.248024] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.248027] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 0.248074] acpi PNP0A08:00: _OSC: platform does not support [LTR]
[ 0.248113] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
[ 0.248168] PCI host bridge to bus 0000:00
[ 0.248169] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 0.248170] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 0.248171] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 0.248172] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
[ 0.248172] pci_bus 0000:00: root bus resource [mem 0x280000000-0xa7fffffff window]
[ 0.248173] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.248191] pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000
[ 0.248413] pci 0000:00:01.0: [1b36:0100] type 00 class 0x030000
[ 0.250024] pci 0000:00:01.0: reg 0x10: [mem 0xf4000000-0xf7ffffff]
[ 0.251636] pci 0000:00:01.0: reg 0x14: [mem 0xf8000000-0xfbffffff]
[ 0.253062] pci 0000:00:01.0: reg 0x18: [mem 0xfc094000-0xfc095fff]
[ 0.254713] pci 0000:00:01.0: reg 0x1c: [io 0xc040-0xc05f]
[ 0.260010] pci 0000:00:01.0: reg 0x30: [mem 0xfc080000-0xfc08ffff pref]
[ 0.260274] pci 0000:00:02.0: [8086:10d3] type 00 class 0x020000
[ 0.261010] pci 0000:00:02.0: reg 0x10: [mem 0xfc040000-0xfc05ffff]
[ 0.261353] pci 0000:00:02.0: reg 0x14: [mem 0xfc060000-0xfc07ffff]
[ 0.262094] pci 0000:00:02.0: reg 0x18: [io 0xc060-0xc07f]
[ 0.262757] pci 0000:00:02.0: reg 0x1c: [mem 0xfc090000-0xfc093fff]
[ 0.265016] pci 0000:00:02.0: reg 0x30: [mem 0xfc000000-0xfc03ffff pref]
[ 0.265498] pci 0000:00:1f.0: [8086:2918] type 00 class 0x060100
[ 0.265732] pci 0000:00:1f.0: quirk: [io 0x0600-0x067f] claimed by ICH6 ACPI/GPIO/TCO
[ 0.265841] pci 0000:00:1f.2: [8086:2922] type 00 class 0x010601
[ 0.269007] pci 0000:00:1f.2: reg 0x20: [io 0xc080-0xc09f]
[ 0.269470] pci 0000:00:1f.2: reg 0x24: [mem 0xfc096000-0xfc096fff]
[ 0.270038] pci 0000:00:1f.3: [8086:2930] type 00 class 0x0c0500
[ 0.271177] pci 0000:00:1f.3: reg 0x20: [io 0x0700-0x073f]
[ 0.271942] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.271989] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.272031] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.272072] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.272114] ACPI: PCI Interrupt Link [LNKE] (IRQs 5 *10 11)
[ 0.272155] ACPI: PCI Interrupt Link [LNKF] (IRQs 5 *10 11)
[ 0.272260] ACPI: PCI Interrupt Link [LNKG] (IRQs 5 10 *11)
[ 0.272304] ACPI: PCI Interrupt Link [LNKH] (IRQs 5 10 *11)
[ 0.272320] ACPI: PCI Interrupt Link [GSIA] (IRQs *16)
[ 0.272326] ACPI: PCI Interrupt Link [GSIB] (IRQs *17)
[ 0.272331] ACPI: PCI Interrupt Link [GSIC] (IRQs *18)
[ 0.272337] ACPI: PCI Interrupt Link [GSID] (IRQs *19)
[ 0.272341] ACPI: PCI Interrupt Link [GSIE] (IRQs *20)
[ 0.272346] ACPI: PCI Interrupt Link [GSIF] (IRQs *21)
[ 0.272355] ACPI: PCI Interrupt Link [GSIG] (IRQs *22)
[ 0.272360] ACPI: PCI Interrupt Link [GSIH] (IRQs *23)
[ 0.272646] pci 0000:00:01.0: vgaarb: setting as boot VGA device
[ 0.272646] pci 0000:00:01.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 0.272646] pci 0000:00:01.0: vgaarb: bridge control possible
[ 0.272646] vgaarb: loaded
[ 0.272646] pps_core: LinuxPPS API ver. 1 registered
[ 0.272646] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 0.272646] PTP clock support registered
[ 0.272646] EDAC MC: Ver: 3.0.0
[ 0.272646] PCI: Using ACPI for IRQ routing
[ 0.303863] PCI: pci_cache_line_size set to 64 bytes
[ 0.303921] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[ 0.303924] e820: reserve RAM buffer [mem 0x7ffde000-0x7fffffff]
[ 0.304074] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.304074] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.305026] clocksource: Switched to clocksource tsc-early
[ 0.309208] VFS: Disk quotas dquot_6.6.0
[ 0.309223] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.309262] pnp: PnP ACPI init
[ 0.309303] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[ 0.309324] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[ 0.309340] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[ 0.309373] pnp 00:03: Plug and Play ACPI device, IDs PNP0400 (active)
[ 0.309400] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active)
[ 0.309542] pnp: PnP ACPI: found 5 devices
[ 0.315938] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.315947] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 0.315948] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 0.315948] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 0.315949] pci_bus 0000:00: resource 7 [mem 0xc0000000-0xfebfffff window]
[ 0.315950] pci_bus 0000:00: resource 8 [mem 0x280000000-0xa7fffffff window]
[ 0.315988] NET: Registered protocol family 2
[ 0.316139] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes)
[ 0.316157] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.316223] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 0.316317] TCP: Hash tables configured (established 65536 bind 65536)
[ 0.316349] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[ 0.316364] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[ 0.316408] NET: Registered protocol family 1
[ 0.316445] pci 0000:00:01.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 0.316472] PCI: CLS 0 bytes, default 64
[ 0.316504] Trying to unpack rootfs image as initramfs...
[ 1.807053] Freeing initrd memory: 160236K
[ 1.807076] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 1.807077] software IO TLB: mapped [mem 0x7bfde000-0x7ffde000] (64MB)
[ 1.807295] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 10737418240 ms ovfl timer
[ 1.807296] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[ 1.807296] RAPL PMU: hw unit of domain package 2^-0 Joules
[ 1.807297] RAPL PMU: hw unit of domain dram 2^-0 Joules
[ 1.807297] RAPL PMU: hw unit of domain pp1-gpu 2^-0 Joules
[ 1.807819] workingset: timestamp_bits=40 max_order=21 bucket_order=0
[ 1.807988] Key type asymmetric registered
[ 1.807990] Asymmetric key parser 'x509' registered
[ 1.807995] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[ 1.807996] io scheduler mq-deadline registered
[ 1.807996] io scheduler kyber registered
[ 1.808138] intel_idle: Please enable MWAIT in BIOS SETUP
[ 1.808316] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 1.830738] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 1.831156] Linux agpgart interface v0.103
[ 1.831488] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 1.833069] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 1.833073] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 1.833231] mousedev: PS/2 mouse device common for all mice
[ 1.833306] rtc_cmos 00:00: RTC can wake from S4
[ 1.833979] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 1.834109] rtc_cmos 00:00: registered as rtc0
[ 1.834124] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[ 1.834130] intel_pstate: CPU model not supported
[ 1.834158] drop_monitor: Initializing network drop monitor service
[ 1.834231] NET: Registered protocol family 10
[ 1.834423] Segment Routing with IPv6
[ 1.834435] mip6: Mobile IPv6
[ 1.834436] NET: Registered protocol family 17
[ 1.834445] Key type dns_resolver registered
[ 1.834652] mce: Using 10 MCE banks
[ 1.834666] sched_clock: Marking stable (1821634086, 12978712)->(1842031235, -7418437)
[ 1.834795] registered taskstats version 1
[ 1.835220] rtc_cmos 00:00: setting system clock to 2019-09-03T12:38:15 UTC (1567514295)
[ 1.835531] Freeing unused kernel image memory: 1148K
[ 1.849072] Write protecting the kernel read-only data: 14336k
[ 1.849927] Freeing unused kernel image memory: 2028K
[ 1.850647] Freeing unused kernel image memory: 2008K
[ 1.850679] Run /init as init process
[ 1.858108] udevd[91]: starting version 175
[ 1.874277] SCSI subsystem initialized
[ 1.882016] libata version 3.00 loaded.
[ 1.882917] ahci 0000:00:1f.2: version 3.0
[ 1.883133] PCI Interrupt Link [GSIA] enabled at IRQ 16
[ 1.883483] ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
[ 1.883485] ahci 0000:00:1f.2: flags: 64bit ncq only
[ 1.890825] scsi host0: ahci
[ 1.891020] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 1.891020] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 1.891330] PCI Interrupt Link [GSIG] enabled at IRQ 22
[ 1.891870] e1000e 0000:00:02.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 1.895622] scsi host1: ahci
[ 1.896917] scsi host2: ahci
[ 1.897483] scsi host3: ahci
[ 1.899133] scsi host4: ahci
[ 1.901667] scsi host5: ahci
[ 1.901762] ata1: SATA max UDMA/133 abar m4096@0xfc096000 port 0xfc096100 irq 24
[ 1.901769] ata2: SATA max UDMA/133 abar m4096@0xfc096000 port 0xfc096180 irq 24
[ 1.901774] ata3: SATA max UDMA/133 abar m4096@0xfc096000 port 0xfc096200 irq 24
[ 1.901778] ata4: SATA max UDMA/133 abar m4096@0xfc096000 port 0xfc096280 irq 24
[ 1.901782] ata5: SATA max UDMA/133 abar m4096@0xfc096000 port 0xfc096300 irq 24
[ 1.901785] ata6: SATA max UDMA/133 abar m4096@0xfc096000 port 0xfc096380 irq 24
[ 1.957727] e1000e 0000:00:02.0 0000:00:02.0 (uninitialized): registered PHC clock
[ 2.020118] e1000e 0000:00:02.0 eth0: (PCI Express:2.5GT/s:Width x1) 52:54:00:12:34:56
[ 2.020119] e1000e 0000:00:02.0 eth0: Intel(R) PRO/1000 Network Connection
[ 2.020138] e1000e 0000:00:02.0 eth0: MAC: 3, PHY: 8, PBA No: 000000-000
[ 2.216698] ata6: SATA link down (SStatus 0 SControl 300)
[ 2.216796] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.216943] ata5: SATA link down (SStatus 0 SControl 300)
[ 2.217049] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.217184] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.217303] ata4: SATA link down (SStatus 0 SControl 300)
[ 2.217340] ata1.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
[ 2.217341] ata1.00: 31457280 sectors, multi 16: LBA48 NCQ (depth 32)
[ 2.217343] ata1.00: applying bridge limits
[ 2.217392] ata3.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.217394] ata3.00: applying bridge limits
[ 2.217434] ata2.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
[ 2.217434] ata2.00: 104857600 sectors, multi 16: LBA48 NCQ (depth 32)
[ 2.217436] ata2.00: applying bridge limits
[ 2.217576] ata2.00: configured for UDMA/100
[ 2.217625] ata1.00: configured for UDMA/100
[ 2.217638] ata3.00: configured for UDMA/100
[ 2.217817] scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 2.218168] scsi 1:0:0:0: Direct-Access ATA QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 2.218434] scsi 2:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
[ 2.221788] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
[ 2.221793] sd 0:0:0:0: [sda] Write Protect is off
[ 2.221793] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.221798] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.222009] sd 1:0:0:0: [sdb] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
[ 2.222013] sd 1:0:0:0: [sdb] Write Protect is off
[ 2.222014] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 2.222019] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.222384] sda: sda1 sda2 < sda5 >
[ 2.222422] sdb: sdb1
[ 2.222741] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 2.222838] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.223977] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 2.224010] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 2.224037] sr 2:0:0:0: Attached scsi generic sg2 type 5
[ 2.241692] random: fast init done
[ 2.257275] sr 2:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 2.257277] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 2.257450] sr 2:0:0:0: Attached scsi CD-ROM sr0
[ 2.484802] PM: Image not found (code -22)
[ 2.500786] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 2.809075] tsc: Refined TSC clocksource calibration: 4000.021 MHz
[ 2.809088] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x39a87016a1f, max_idle_ns: 440795206381 ns
[ 2.809128] clocksource: Switched to clocksource tsc
[ 3.598106] udevd[437]: starting version 175
[ 4.166877] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[ 4.167943] parport_pc 00:03: reported by Plug and Play ACPI
[ 4.168032] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[ 4.188464] input: PC Speaker as /devices/platform/pcspkr/input/input2
[ 4.189832] lpc_ich 0000:00:1f.0: I/O space for GPIO uninitialized
[ 4.207278] iTCO_vendor_support: vendor-support=0
[ 4.207682] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[ 4.207723] iTCO_wdt: Found a ICH9 TCO device (Version=2, TCOBASE=0x0660)
[ 4.207852] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[ 4.241841] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input4
[ 4.241919] ACPI: Power Button [PWRF]
[ 4.448579] random: crng init done
[ 5.045981] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[ 5.649077] Adding 688124k swap on /dev/sda5. Priority:-2 extents:1 across:688124k
[ 5.657555] EXT4-fs (sda1): re-mounted. Opts: (null)
[ 5.821773] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
[ 5.931729] loop: module loaded
[ 6.589006] RPC: Registered named UNIX socket transport module.
[ 6.589007] RPC: Registered udp transport module.
[ 6.589007] RPC: Registered tcp transport module.
[ 6.589008] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 6.592778] FS-Cache: Loaded
[ 6.604859] FS-Cache: Netfs 'nfs' registered for caching
[ 6.614700] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[ 9.281335] Bluetooth: Core ver 2.22
[ 9.281351] NET: Registered protocol family 31
[ 9.281352] Bluetooth: HCI device and connection manager initialized
[ 9.281354] Bluetooth: HCI socket layer initialized
[ 9.281355] Bluetooth: L2CAP socket layer initialized
[ 9.281357] Bluetooth: SCO socket layer initialized
[ 9.283691] Bluetooth: RFCOMM TTY layer initialized
[ 9.283695] Bluetooth: RFCOMM socket layer initialized
[ 9.283699] Bluetooth: RFCOMM ver 1.11
[ 9.284992] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 9.284993] Bluetooth: BNEP filters: protocol multicast
[ 9.284995] Bluetooth: BNEP socket layer initialized
[ 12.058154] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 12.058575] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 96.893773] perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 99.136902] stat invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
[ 99.136904] CPU: 2 PID: 20281 Comm: stat Not tainted 5.2.11+ #5
[ 99.136905] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-1.fc28 04/01/2014
[ 99.136905] Call Trace:
[ 99.136925] dump_stack+0x4d/0x64
[ 99.136934] dump_header+0x54/0x2f5
[ 99.136937] ? do_raw_spin_trylock+0x1f/0x28
[ 99.136938] ? ___ratelimit+0xc3/0xe4
[ 99.136939] ? task_will_free_mem+0x25/0xa0
[ 99.136940] oom_kill_process+0x7a/0xec
[ 99.136942] out_of_memory+0x3dd/0x3f8
[ 99.136943] ? __mutex_trylock_or_owner+0x4b/0x63
[ 99.136944] pagefault_out_of_memory+0x3c/0x4b
[ 99.136947] mm_fault_error+0x66/0x150
[ 99.136948] do_user_addr_fault+0x29f/0x3a4
[ 99.136954] ? fpregs_assert_state_consistent+0x16/0x43
[ 99.136955] __do_page_fault+0x44/0x46
[ 99.136955] do_page_fault+0x9c/0xdf
[ 99.136957] ? page_fault+0x8/0x30
[ 99.136958] page_fault+0x1e/0x30
[ 99.136959] RIP: 0033:0x7f6c27463f84
[ 99.136960] Code: 10 4d 89 4b 18 5b 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 87 98 02 00 00 48 85 c0 74 5f 48 8b 40 08 <8b> 08 89 8f ec 02 00 00 8b 50 08 44 8b 40 04 8d 72 ff 85 d6 75 72
[ 99.136961] RSP: 002b:00007ffc57285840 EFLAGS: 00010206
[ 99.136962] RAX: 00007f6c26473280 RBX: 00000000026161b0 RCX: 00007f6c274711d7
[ 99.136962] RDX: 00007f6c2667de48 RSI: 0000000000000030 RDI: 00000000026161b0
[ 99.136962] RBP: 00007ffc572859b0 R08: 0000000070000029 R09: 000000006ffffdff
[ 99.136964] R10: 000000006ffffeff R11: 0000000000000246 R12: 00007ffc57285a98
[ 99.136964] R13: 000000006fffff48 R14: 00007ffc57285730 R15: 00007ffc572856d0
[ 99.136965] Mem-Info:
[ 99.136968] active_anon:24008 inactive_anon:391 isolated_anon:0
[ 99.136968] active_file:22836 inactive_file:33264 isolated_file:0
[ 99.136968] unevictable:0 dirty:13 writeback:0 unstable:0
[ 99.136968] slab_reclaimable:28549 slab_unreclaimable:4068
[ 99.136968] mapped:16364 shmem:498 pagetables:2551 bounce:0
[ 99.136968] free:1918966 free_pcp:1781 free_cma:0
[ 99.136970] Node 0 active_anon:96032kB inactive_anon:1564kB active_file:91344kB inactive_file:133056kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:65456kB dirty:52kB writeback:0kB shmem:1992kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 99.136970] Node 0 DMA free:15908kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 99.136972] lowmem_reserve[]: 0 1793 7808 7808
[ 99.136973] Node 0 DMA32 free:1998916kB min:15488kB low:19360kB high:23232kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2080632kB managed:2001812kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:2896kB local_pcp:0kB free_cma:0kB
[ 99.136975] lowmem_reserve[]: 0 0 6014 6014
[ 99.136976] Node 0 Normal free:5661040kB min:51956kB low:64944kB high:77932kB active_anon:96032kB inactive_anon:1564kB active_file:91344kB inactive_file:133056kB unevictable:0kB writepending:52kB present:6291456kB managed:6159060kB mlocked:0kB kernel_stack:3920kB pagetables:10204kB bounce:0kB free_pcp:4220kB local_pcp:468kB free_cma:0kB
[ 99.136978] lowmem_reserve[]: 0 0 0 0
[ 99.136978] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[ 99.136982] Node 0 DMA32: 5*4kB (M) 6*8kB (M) 4*16kB (M) 4*32kB (M) 5*64kB (M) 6*128kB (M) 5*256kB (M) 5*512kB (M) 3*1024kB (M) 2*2048kB (M) 485*4096kB (M) = 1998916kB
[ 99.136985] Node 0 Normal: 19*4kB (ME) 11*8kB (ME) 35*16kB (UME) 11*32kB (UME) 4*64kB (UE) 2*128kB (UE) 1*256kB (E) 1*512kB (E) 8*1024kB (UME) 5*2048kB (UME) 1377*4096kB (UM) = 5660980kB
[ 99.136989] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 99.136989] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 99.136990] 56617 total pagecache pages
[ 99.136990] 0 pages in swap cache
[ 99.136991] Swap cache stats: add 0, delete 0, find 0/0
[ 99.136991] Free swap = 688124kB
[ 99.136992] Total swap = 688124kB
[ 99.136992] 2097020 pages RAM
[ 99.136992] 0 pages HighMem/MovableOnly
[ 99.136992] 52825 pages reserved
[ 99.136993] 0 pages hwpoisoned
[ 99.136993] Tasks state (memory values in pages):
[ 99.136993] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 99.136995] [ 437] 0 437 5428 690 90112 0 -1000 udevd
[ 99.136996] [ 565] 0 565 5427 614 86016 0 -1000 udevd
[ 99.136997] [ 567] 0 567 5427 615 86016 0 -1000 udevd
[ 99.136998] [ 1800] 0 1800 4747 513 86016 0 0 rpcbind
[ 99.136999] [ 1833] 105 1833 5840 617 90112 0 0 rpc.statd
[ 99.137000] [ 1849] 0 1849 6328 55 94208 0 0 rpc.idmapd
[ 99.137000] [ 2193] 0 2193 13198 690 110592 0 0 rsyslogd
[ 99.137001] [ 2260] 7 2260 3147 357 65536 0 0 lpd
[ 99.137002] [ 2320] 0 2320 4172 38 73728 0 0 atd
[ 99.137003] [ 2566] 0 2566 5106 538 86016 0 0 cron
[ 99.137005] [ 2584] 0 2584 1033 395 57344 0 0 acpid
[ 99.137005] [ 2611] 104 2611 12736 758 143360 0 0 exim4
[ 99.137006] [ 2632] 101 2632 7567 659 106496 0 0 dbus-daemon
[ 99.137007] [ 2682] 0 2682 35396 1326 167936 0 0 lightdm
[ 99.137008] [ 2700] 0 2700 5255 680 81920 0 0 bluetoothd
[ 99.137009] [ 2752] 0 2752 41044 1935 221184 0 0 NetworkManager
[ 99.137009] [ 2782] 0 2782 47228 9503 368640 0 0 Xorg
[ 99.137011] [ 2787] 0 2787 33036 1542 159744 0 0 polkitd
[ 99.137024] [ 2814] 0 2814 4068 464 77824 0 0 getty
[ 99.137025] [ 2815] 0 2815 4068 462 77824 0 0 getty
[ 99.137026] [ 2816] 0 2816 4068 488 73728 0 0 getty
[ 99.137027] [ 2817] 0 2817 4068 486 77824 0 0 getty
[ 99.137027] [ 2818] 0 2818 4068 485 77824 0 0 getty
[ 99.137028] [ 2819] 0 2819 4068 489 69632 0 0 getty
[ 99.137029] [ 2823] 0 2823 20217 1297 200704 0 0 modem-manager
[ 99.137030] [ 2842] 0 2842 31895 1439 151552 0 0 console-kit-dae
[ 99.137030] [ 2916] 0 2916 2494 1229 69632 0 0 dhclient
[ 99.137031] [ 2922] 0 2922 39491 1693 192512 0 0 upowerd
[ 99.137032] [ 3016] 0 3016 39659 1450 229376 0 0 lightdm
[ 99.137032] [ 3096] 0 3096 17323 732 159744 0 0 gnome-keyring-d
[ 99.137033] [ 3107] 0 3107 1049 358 53248 0 0 sh
[ 99.137034] [ 3123] 0 3123 3122 83 65536 0 0 ssh-agent
[ 99.137035] [ 3126] 0 3126 6051 504 90112 0 0 dbus-launch
[ 99.137035] [ 3127] 0 3127 7554 609 102400 0 0 dbus-daemon
[ 99.137036] [ 3135] 0 3135 12370 1172 143360 0 0 xfconfd
[ 99.137037] [ 3140] 0 3140 38776 2956 323584 0 0 xfce4-session
[ 99.137038] [ 3145] 0 3145 37828 3832 344064 0 0 xfwm4
[ 99.137038] [ 3147] 0 3147 31422 2231 286720 0 0 xfsettingsd
[ 99.137039] [ 3148] 0 3148 39021 3116 360448 0 0 Thunar
[ 99.137040] [ 3150] 0 3150 15479 1242 159744 0 0 gvfsd
[ 99.137040] [ 3153] 0 3153 72257 5559 466944 0 0 xfce4-panel
[ 99.137041] [ 3154] 0 3154 91757 6682 487424 0 0 xfdesktop
[ 99.137042] [ 3157] 0 3157 38106 1856 323584 0 0 xfce4-settings-
[ 99.137043] [ 3158] 0 3158 53548 2101 315392 0 0 xfce4-power-man
[ 99.137044] [ 3165] 0 3165 46647 2774 266240 0 0 polkit-gnome-au
[ 99.137044] [ 3167] 0 3167 17743 1675 180224 0 0 gvfs-gdu-volume
[ 99.137045] [ 3170] 0 3170 30373 1444 147456 0 0 udisks-daemon
[ 99.137046] [ 3172] 0 3172 120880 5112 434176 0 0 nm-applet
[ 99.137046] [ 3173] 0 3173 11857 705 131072 0 0 udisks-daemon
[ 99.137047] [ 3176] 0 3176 15095 1259 167936 0 0 gvfs-gphoto2-vo
[ 99.137048] [ 3179] 0 3179 35081 1389 303104 0 0 xfce4-power-man
[ 99.137050] [ 3181] 0 3181 58995 8092 507904 0 0 system-config-p
[ 99.137051] [ 3182] 0 3182 57804 3624 356352 0 0 xfce4-volumed
[ 99.137051] [ 3184] 0 3184 64742 5974 471040 0 0 xfce4-terminal
[ 99.137052] [ 3187] 0 3187 34538 2410 311296 0 0 xfce4-notifyd
[ 99.137053] [ 3189] 0 3189 13290 1180 143360 0 0 gconfd-2
[ 99.137054] [ 3192] 0 3192 19734 864 176128 0 0 gvfs-afc-volume
[ 99.137054] [ 3195] 0 3195 16606 1567 176128 0 0 gvfsd-trash
[ 99.137055] [ 3199] 0 3199 36763 3111 331776 0 0 panel-6-systray
[ 99.137056] [ 3201] 0 3201 3642 419 73728 0 0 gnome-pty-helpe
[ 99.137056] [ 3202] 0 3202 4869 880 81920 0 0 bash
[ 99.137057] [ 3206] 0 3206 4869 897 77824 0 0 bash
[ 99.137058] [ 3207] 0 3207 3655 668 73728 0 0 watch
[ 99.137058] [ 3223] 0 3223 3183 615 69632 0 0 find
[ 99.137059] [ 3224] 0 3224 1473 437 57344 0 0 xargs
[ 99.137060] [ 20281] 0 20281 4602 561 77824 0 0 stat
[ 99.137060] [ 20282] 0 20282 4078 583 69632 0 0 stat
[ 99.137061] [ 20283] 0 20283 3019 526 65536 0 0 stat
[ 99.137062] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=Xorg,pid=2782,uid=0
[ 99.137077] Out of memory: Killed process 2782 (Xorg) total-vm:188912kB, anon-rss:24884kB, file-rss:11664kB, shmem-rss:1464kB
[ 99.137873] oom_reaper: reaped process 2782 (Xorg), now anon-rss:0kB, file-rss:0kB, shmem-rss:1468kB
[ 192.530507] perf: interrupt took too long (3155 > 3141), lowering kernel.perf_event_max_sample_rate to 63250
All I'm doing is running
"find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
inside the memory cgroup. find, xargs and stat use only a tiny amount of RAM
themselves, so most of the RAM usage in the cgroup is the ext4 inode cache.
That should never trigger the OOM killer (outside or inside the cgroup).
Instead, old cache data should be evicted.
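As a rough cross-check of the dump above, here is a minimal Python sketch
that reads the oom-kill summary line and the Mem-Info free page count. Both
strings are copied from the dmesg output in this message; the helper names
are made up for illustration and the 4 kB page size is an assumption.

```python
import re

# Lines copied from the dmesg output above.
OOM_LINE = ("oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,"
            "mems_allowed=0,global_oom,task_memcg=/,task=Xorg,pid=2782,uid=0")
MEMINFO_LINE = "free:1918966 free_pcp:1781 free_cma:0"

PAGE_KB = 4  # assumption: 4 kB pages (the x86-64 default)


def parse_oom_kill(line):
    """Split the oom-kill summary into key=value fields plus bare flags."""
    body = line.split("oom-kill:", 1)[1]
    fields, flags = {}, []
    for item in body.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        else:
            flags.append(item)
    return fields, flags


def free_kb(meminfo_line):
    """Return the free page count of a Mem-Info line converted to kB."""
    pages = int(re.search(r"\bfree:(\d+)", meminfo_line).group(1))
    return pages * PAGE_KB


fields, flags = parse_oom_kill(OOM_LINE)
is_global = "global_oom" in flags and fields["constraint"] == "CONSTRAINT_NONE"
print(is_global, fields["task"], free_kb(MEMINFO_LINE))
```

The free figure works out to 7675864 kB (about 7.3 GiB), which equals the sum
of the per-zone free values in the dump (15908 + 1998916 + 5661040 kB), so
the kill was global and happened with most of the VM's memory free.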
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-03 7:41 ` Michal Hocko
2019-09-03 12:01 ` Thomas Lindroth
@ 2019-09-03 12:05 ` Andrey Ryabinin
2019-09-03 12:22 ` Michal Hocko
1 sibling, 1 reply; 29+ messages in thread
From: Andrey Ryabinin @ 2019-09-03 12:05 UTC (permalink / raw)
To: Michal Hocko, Thomas Lindroth; +Cc: linux-mm, stable
On 9/3/19 10:41 AM, Michal Hocko wrote:
> On Mon 02-09-19 21:34:29, Thomas Lindroth wrote:
>> On 9/2/19 9:16 AM, Michal Hocko wrote:
>>> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
>>>> After upgrading to the 4.19 series I've started getting problems with
>>>> early OOM.
>>>
>>> What is the kernel you have updated from? Would it be possible to try
>>> the current Linus' tree?
>>
>> I did some more testing and it turns out this is not a regression after all.
>>
>> I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
>> running cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
>> xargs -0 -n 1 -P 8 stat > /dev/null'
>>
>> Just as memory.kmem.max_usage_in_bytes = memory.kmem.limit_in_bytes the OOM
>> killer kicked in and killed my X server.
>>
>> Using the find|stat approach it was easy to test the problem in a testing VM.
>> I was able to reproduce the problem in all these kernels:
>> 4.9.0
>> 4.14.0
>> 4.14.115
>> 4.19.0
>> 5.2.11
>>
>> 5.3-rc6 didn't build in the VM. The build environment is too old probably.
>>
>> I was curious why I initially couldn't reproduce the problem in 4.14 by
>> building chromium. I was again able to successfully build chromium using
>> 4.14.115. Turns out memory.kmem.max_usage_in_bytes was 1015689216 after
>> building and my limit is set to 1073741824. I guess some unrelated change in
>> memory management raised that slightly for 4.19 triggering the problem.
>>
>> If you want to reproduce for yourself here are the steps:
>> 1. build any kernel above 4.9 using something like my .config
>> 2. setup a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
>> memory.limit_in_bytes. I used 100M in my testing VM.
>> 3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
>> in the cgroup.
>> 4. Assuming there are enough inodes on the rootfs the global OOM killer
>> should kick in when memory.kmem.max_usage_in_bytes =
>> memory.kmem.limit_in_bytes and kill something outside the cgroup.
>
> This is certainly a bug. Is this still an OOM triggered from
> pagefault_out_of_memory? Since 4.19 (29ef680ae7c21) the memcg charge
> path should invoke the memcg oom killer directly from the charge path.
> If that doesn't happen then the failing charge is either GFP_NOFS or a
> large allocation.
>
> The former has been fixed just recently by http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
> and I suspect this is a fix you are looking for. Although it is curious
> that you can see a global oom even before because the charge path would
> mark an oom situation even for NOFS context and it should trigger the
> memcg oom killer on the way out from the page fault path. So essentially
> the same call trace except the oom killer should be constrained to the
> memcg context.
>
> Could you try the above patch please?
>
It won't help. We are hitting the ->kmem limit here, not ->memory or ->memsw, so try_charge() succeeds and
only __memcg_kmem_charge_memcg() fails to charge ->kmem and returns -ENOMEM.
Limiting kmem never worked and it doesn't work now. AFAIK this feature was never finished because
no clear purpose/use case was found. I remember there was some discussion about this at LSF/MM (https://lwn.net/Articles/636331/)
but I don't remember the discussion itself.
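To illustrate the point, here is a toy Python model (not kernel code) of the
two separate v1 counters. The class and function names are hypothetical and
the limits are the 12G/1G values from the original report; only the order of
events, the ->memory charge succeeding and the ->kmem charge failing with
-ENOMEM, is taken from the explanation above.

```python
import errno


class PageCounter:
    """Toy stand-in for a page counter: usage may not exceed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def try_charge(self, nr_pages):
        if self.usage + nr_pages > self.limit:
            return False
        self.usage += nr_pages
        return True

    def uncharge(self, nr_pages):
        self.usage -= nr_pages


def kmem_charge(memory, kmem, nr_pages):
    """Charge ->memory first, then ->kmem; return 0 or -ENOMEM.

    Simplification: a real ->memory failure would involve reclaim and the
    memcg OOM killer, which this toy model leaves out entirely.
    """
    if not memory.try_charge(nr_pages):
        return -errno.ENOMEM
    if not kmem.try_charge(nr_pages):
        memory.uncharge(nr_pages)  # roll back the charge that succeeded
        return -errno.ENOMEM       # the caller only sees a failed allocation
    return 0


memory = PageCounter(limit=12884901888 // 4096)  # memory.limit_in_bytes
kmem = PageCounter(limit=1073741824 // 4096)     # memory.kmem.limit_in_bytes
kmem.usage = kmem.limit    # kmem is already at its limit
memory.usage = kmem.limit  # assumption: the same pages are charged here too
result = kmem_charge(memory, kmem, 1)
print(result == -errno.ENOMEM, memory.usage < memory.limit)
```

In this model the allocation fails even though ->memory is far below its
limit, and nothing on that failure path resembles a memcg OOM kill.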
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-03 12:05 ` Andrey Ryabinin
@ 2019-09-03 12:22 ` Michal Hocko
2019-09-03 18:20 ` Thomas Lindroth
0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-09-03 12:22 UTC (permalink / raw)
To: Andrey Ryabinin; +Cc: Thomas Lindroth, linux-mm, stable
On Tue 03-09-19 15:05:22, Andrey Ryabinin wrote:
>
>
> On 9/3/19 10:41 AM, Michal Hocko wrote:
> > On Mon 02-09-19 21:34:29, Thomas Lindroth wrote:
> >> On 9/2/19 9:16 AM, Michal Hocko wrote:
> >>> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
> >>>> After upgrading to the 4.19 series I've started getting problems with
> >>>> early OOM.
> >>>
> >>> What is the kernel you have updated from? Would it be possible to try
> >>> the current Linus' tree?
> >>
> >> I did some more testing and it turns out this is not a regression after all.
> >>
> >> I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
> >> running cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
> >> xargs -0 -n 1 -P 8 stat > /dev/null'
> >>
> >> Just as memory.kmem.max_usage_in_bytes = memory.kmem.limit_in_bytes the OOM
> >> killer kicked in and killed my X server.
> >>
> >> Using the find|stat approach it was easy to test the problem in a testing VM.
> >> I was able to reproduce the problem in all these kernels:
> >> 4.9.0
> >> 4.14.0
> >> 4.14.115
> >> 4.19.0
> >> 5.2.11
> >>
> >> 5.3-rc6 didn't build in the VM. The build environment is too old probably.
> >>
> >> I was curious why I initially couldn't reproduce the problem in 4.14 by
> >> building chromium. I was again able to successfully build chromium using
> >> 4.14.115. Turns out memory.kmem.max_usage_in_bytes was 1015689216 after
> >> building and my limit is set to 1073741824. I guess some unrelated change in
> >> memory management raised that slightly for 4.19 triggering the problem.
> >>
> >> If you want to reproduce for yourself here are the steps:
> >> 1. build any kernel above 4.9 using something like my .config
> >> 2. setup a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
> >> memory.limit_in_bytes. I used 100M in my testing VM.
> >> 3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
> >> in the cgroup.
> >> 4. Assuming there are enough inodes on the rootfs the global OOM killer
> >> should kick in when memory.kmem.max_usage_in_bytes =
> >> memory.kmem.limit_in_bytes and kill something outside the cgroup.
> >
> > This is certainly a bug. Is this still an OOM triggered from
> > pagefault_out_of_memory? Since 4.19 (29ef680ae7c21) the memcg charge
> > path should invoke the memcg oom killer directly from the charge path.
> > If that doesn't happen then the failing charge is either GFP_NOFS or a
> > large allocation.
> >
> > The former has been fixed just recently by http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
> > and I suspect this is a fix you are looking for. Although it is curious
> > that you can see a global oom even before because the charge path would
> > mark an oom situation even for NOFS context and it should trigger the
> > memcg oom killer on the way out from the page fault path. So essentially
> > the same call trace except the oom killer should be constrained to the
> > memcg context.
> >
> > Could you try the above patch please?
> >
>
> It won't help. We are hitting the ->kmem limit here, not ->memory or ->memsw, so try_charge() succeeds and
> only __memcg_kmem_charge_memcg() fails to charge ->kmem and returns -ENOMEM.
>
> Limiting kmem just never worked and it doesn't work now. AFAIK this feature hasn't been finished because
> there was no clear purpose/use case found. I remember that there was some discussion on lsfmm about this https://lwn.net/Articles/636331/
> but I don't remember the discussion itself.
Ohh, right you are. I completely forgot that __memcg_kmem_charge_memcg
doesn't really trigger the normal charge path but rather charges the
counter directly.
So you are right. The v1 kmem accounting is broken and probably
unfixable. Do not use it.
Thanks!
--
Michal Hocko
SUSE Labs
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-03 12:22 ` Michal Hocko
@ 2019-09-03 18:20 ` Thomas Lindroth
2019-09-03 19:36 ` Michal Hocko
0 siblings, 1 reply; 29+ messages in thread
From: Thomas Lindroth @ 2019-09-03 18:20 UTC (permalink / raw)
To: Michal Hocko, Andrey Ryabinin; +Cc: linux-mm, stable
On 9/3/19 2:22 PM, Michal Hocko wrote:
> On Tue 03-09-19 15:05:22, Andrey Ryabinin wrote:
>>
>>
>> On 9/3/19 10:41 AM, Michal Hocko wrote:
>>> On Mon 02-09-19 21:34:29, Thomas Lindroth wrote:
>>>> On 9/2/19 9:16 AM, Michal Hocko wrote:
>>>>> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
>>>>>> After upgrading to the 4.19 series I've started getting problems with
>>>>>> early OOM.
>>>>>
>>>>> What is the kernel you have updated from? Would it be possible to try
>>>>> the current Linus' tree?
>>>>
>>>> I did some more testing and it turns out this is not a regression after all.
>>>>
>>>> I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
>>>> running cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
>>>> xargs -0 -n 1 -P 8 stat > /dev/null'
>>>>
>>>> Just as memory.kmem.max_usage_in_bytes = memory.kmem.limit_in_bytes the OOM
>>>> killer kicked in and killed my X server.
>>>>
>>>> Using the find|stat approach it was easy to test the problem in a testing VM.
>>>> I was able to reproduce the problem in all these kernels:
>>>> 4.9.0
>>>> 4.14.0
>>>> 4.14.115
>>>> 4.19.0
>>>> 5.2.11
>>>>
>>>> 5.3-rc6 didn't build in the VM. The build environment is too old probably.
>>>>
>>>> I was curious why I initially couldn't reproduce the problem in 4.14 by
>>>> building chromium. I was again able to successfully build chromium using
>>>> 4.14.115. Turns out memory.kmem.max_usage_in_bytes was 1015689216 after
>>>> building and my limit is set to 1073741824. I guess some unrelated change in
>>>> memory management raised that slightly for 4.19 triggering the problem.
>>>>
>>>> If you want to reproduce for yourself here are the steps:
>>>> 1. build any kernel above 4.9 using something like my .config
>>>> 2. setup a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
>>>> memory.limit_in_bytes. I used 100M in my testing VM.
>>>> 3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
>>>> in the cgroup.
>>>> 4. Assuming there are enough inodes on the rootfs the global OOM killer
>>>> should kick in when memory.kmem.max_usage_in_bytes =
>>>> memory.kmem.limit_in_bytes and kill something outside the cgroup.
>>>
>>> This is certainly a bug. Is this still an OOM triggered from
>>> pagefault_out_of_memory? Since 4.19 (29ef680ae7c21) the memcg charge
>>> path should invoke the memcg oom killer directly from the charge path.
>>> If that doesn't happen then the failing charge is either GFP_NOFS or a
>>> large allocation.
>>>
>>> The former has been fixed just recently by http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
>>> and I suspect this is a fix you are looking for. Although it is curious
>>> that you can see a global oom even before because the charge path would
>>> mark an oom situation even for NOFS context and it should trigger the
>>> memcg oom killer on the way out from the page fault path. So essentially
>>> the same call trace except the oom killer should be constrained to the
>>> memcg context.
>>>
>>> Could you try the above patch please?
>>>
>>
>> It won't help. We are hitting the ->kmem limit here, not ->memory or ->memsw, so try_charge() succeeds and
>> only __memcg_kmem_charge_memcg() fails to charge ->kmem and returns -ENOMEM.
>>
>> Limiting kmem just never worked and it doesn't work now. AFAIK this feature hasn't been finished because
>> there was no clear purpose/use case found. I remember that there was some discussion on lsfmm about this https://lwn.net/Articles/636331/
>> but I don't remember the discussion itself.
>
> Ohh, right you are. I completely forgot that __memcg_kmem_charge_memcg
> doesn't really trigger the normal charge path but rather charges the
> counter directly.
>
> So you are right. The v1 kmem accounting is broken and probably
> unfixable. Do not use it.
I don't know why I set up a kmem limit. I think the documentation I followed
when setting up the cgroup said that kmem is counted separately from the
regular memory limit, so if you want to limit total memory you have to limit
both. That's what I did.
If kmem accounting is broken, unfixable, and causes kernel crashes when
used, why not remove it? Or perhaps disable it by default, as with
cgroup.memory=nokmem, or at least print a warning to dmesg if the user tries
to use it in a way that causes crashes?
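Until something like that exists, a check of this kind can be done from
userspace. The following is only a Python sketch: it reads the same v1 files
listed in the original report, the function names are hypothetical, and
treating 9223372036854771712 as "no limit" is an assumption based on the
unlimited soft limit shown there.

```python
import os

# Assumption: the value v1 limit files report when no limit is set
# (taken from memory.soft_limit_in_bytes in the original report).
UNLIMITED = 9223372036854771712


def read_limit(cgroup_dir, name):
    """Read one numeric limit file from a v1 memory cgroup directory."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return int(f.read().strip())


def kmem_limit_warning(cgroup_dir):
    """Return a warning string if kmem is limited below the memory limit."""
    kmem = read_limit(cgroup_dir, "memory.kmem.limit_in_bytes")
    mem = read_limit(cgroup_dir, "memory.limit_in_bytes")
    if kmem < UNLIMITED and kmem < mem:
        return ("memory.kmem.limit_in_bytes (%d) is below "
                "memory.limit_in_bytes (%d)" % (kmem, mem))
    return None
```

Calling kmem_limit_warning("/sys/fs/cgroup/memory/12G") on my setup would
flag the 1G kmem limit against the 12G memory limit.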
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
2019-09-03 18:20 ` Thomas Lindroth
@ 2019-09-03 19:36 ` Michal Hocko
0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2019-09-03 19:36 UTC (permalink / raw)
To: Thomas Lindroth; +Cc: Andrey Ryabinin, linux-mm, stable
On Tue 03-09-19 20:20:20, Thomas Lindroth wrote:
[...]
> If kmem accounting is broken, unfixable, and causes kernel crashes when
> used, why not remove it? Or perhaps disable it by default, as with
> cgroup.memory=nokmem, or at least print a warning to dmesg if the user tries
> to use it in a way that causes crashes?
Well, the cgroup v1 interfaces and implementation are mostly frozen, and users
are advised to use the v2 interface, which doesn't suffer from this problem
because there is no separate kmem limit and both user and kernel charges
are tied to the same counter.
We can be more explicit about the shortcomings in the documentation, but in
general v1 is deprecated.
--
Michal Hocko
SUSE Labs
[parent not found: <666dbcde-1b8a-9e2d-7d1f-48a117c78ae1@I-love.SAKURA.ne.jp>]
* Re: [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69
[not found] ` <666dbcde-1b8a-9e2d-7d1f-48a117c78ae1@I-love.SAKURA.ne.jp>
@ 2019-09-03 18:25 ` Thomas Lindroth
[not found] ` <4d0eda9a-319d-1a7d-1eed-71da90902367@i-love.sakura.ne.jp>
0 siblings, 1 reply; 29+ messages in thread
From: Thomas Lindroth @ 2019-09-03 18:25 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm
On 9/3/19 3:33 PM, Tetsuo Handa wrote:
> On 2019/09/02 5:43, Thomas Lindroth wrote:
>> Those kernel memory allocation failures can also cause kernel NULL pointer
>> dereference. Here is a dmesg captured over netconsole when that happens:
>
> Can you establish steps to reproduce this crash?
> Since it seems that __GFP_NOFAIL allocation is failing for some reason, we should fix it.
I have no reliable way to reproduce the crash. I just set up a v1 memory cgroup
with memory.kmem.limit_in_bytes < memory.limit_in_bytes, then run something that
allocates SLUB memory and depletes the kmem limit. Usually the OOM killer is
triggered when the kmem limit is hit, but sometimes I get warnings like
"SLUB: Unable to allocate memory on node -1" and a kernel NULL pointer
dereference.
Running "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
in the cgroup is an easy way to allocate ext4_inode_cache objects and deplete
the kmem limit, but I never got a NULL pointer dereference that way. Building
the chromium browser in the cgroup can also hit the kmem limit and will
sometimes cause a NULL pointer dereference.
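For anyone sifting through captures like this, a small Python sketch that
splits a netconsole record into header and message and pulls the numbers out
of one SLUB warning. The three sample records are copied from the capture in
this message; the header field names follow the /dev/kmsg record layout
(priority, sequence number, timestamp in microseconds, flags), and the helper
names are hypothetical.

```python
import re

RECORDS = [
    "4,1180,556857645,-;SLUB: Unable to allocate memory on node -1, "
    "gfp=0x600040(GFP_NOFS)",
    "4,1181,556857652,-; cache: ext4_inode_cache(100:12G), object size: 1024, "
    "buffer size: 1032, default order: 3, min order: 0",
    "4,1182,556857654,-; node 0: slabs: 17997, objs: 557851, free: 0",
]

CACHE_RE = re.compile(r"cache: (\w+)\((\d+):([^)]+)\), object size: (\d+)")
NODE_RE = re.compile(r"node (\d+): slabs: (\d+), objs: (\d+), free: (\d+)")


def split_record(record):
    """Split 'prio,seq,usec,flags;message' into a header dict and message."""
    header, message = record.split(";", 1)
    prio, seq, usec, flags = header.split(",")[:4]
    return {"prio": int(prio), "seq": int(seq), "usec": int(usec),
            "flags": flags}, message.strip()


def summarize(records):
    """Collect cache name, cgroup name and slab counts from one warning."""
    info = {}
    for record in records:
        _, msg = split_record(record)
        m = CACHE_RE.search(msg)
        if m:
            info["cache"], info["cgroup"] = m.group(1), m.group(3)
            info["object_size"] = int(m.group(4))
        m = NODE_RE.search(msg)
        if m:
            info["slabs"], info["objs"], info["free"] = map(int, m.group(2, 3, 4))
    return info


print(summarize(RECORDS))
```

Every warning in the capture reports free: 0 for the per-cgroup
ext4_inode_cache, which is what this summary makes easy to spot.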
Here is another null pointer deref I got while building chromium in the cgroup.
4,1180,556857645,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1181,556857652,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1182,556857654,-; node 0: slabs: 17997, objs: 557851, free: 0
4,1183,556857675,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1184,556857677,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1185,556857679,-; node 0: slabs: 17997, objs: 557851, free: 0
4,1186,556857955,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1187,556857957,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1188,556857959,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1189,556857974,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1190,556857976,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1191,556857979,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1192,556857989,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1193,556857992,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1194,556857994,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1195,556858518,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1196,556858522,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1197,556858523,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1198,556858535,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1199,556858537,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1200,556858538,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1201,556858545,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1202,556858547,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1203,556858548,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1204,556858554,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1205,556858556,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1206,556858558,-; node 0: slabs: 18003, objs: 557869, free: 0
4,1207,556858748,-;SLUB: Unable to allocate memory on node -1, gfp=0x600040(GFP_NOFS)
4,1208,556858751,-; cache: ext4_inode_cache(100:12G), object size: 1024, buffer size: 1032, default order: 3, min order: 0
4,1209,556858753,-; node 0: slabs: 18003, objs: 557869, free: 0
1,1210,556861832,-;BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
6,1211,556861836,-;PGD 0
4,1212,556861837,c;P4D 0
4,1213,556861839,-;Oops: 0000 [#1] PREEMPT SMP PTI
4,1214,556861841,-;CPU: 7 PID: 12228 Comm: find Not tainted 4.19.69 #43
4,1215,556861842,-;Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
4,1216,556861846,-;RIP: 0010:__getblk_gfp+0x181/0x240
4,1217,556861848,-;Code: e8 e4 ee ff ff 48 89 04 24 49 8b 46 30 48 8d b8 80 00 00 00 e8 20 5e 67 00 48 8b 04 24 44 8b 4c 24 1c 48 89 c1 eb 03 48 89 d1 <48> 8b 51 08 48 85 d2 75 f4 48 89 41 08 49 8b 4f 08 48 8d 51 ff 83
4,1218,556861850,-;RSP: 0018:ffffaba441853be8 EFLAGS: 00010246
4,1219,556861851,-;RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
4,1220,556861853,-;RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffff9824dd8943c8
4,1221,556861854,-;RBP: 0000000000000000 R08: ffffd552cd660e48 R09: 0000000000000000
4,1222,556861855,-;R10: 0000000000000000 R11: 0000000000000036 R12: ffff9824dd894100
4,1223,556861856,-;R13: 0000000001301775 R14: ffff9824dd8941d8 R15: ffffd552c84f1380
4,1224,556861858,-;FS: 00007fdd32a0cb80(0000) GS:ffff9824df9c0000(0000) knlGS:0000000000000000
4,1225,556861859,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1226,556861861,-;CR2: 0000000000000008 CR3: 00000003614b6002 CR4: 00000000001606e0
4,1227,556861862,-;Call Trace:
4,1228,556861866,-; ext4_getblk+0x91/0x1a0
4,1229,556861868,-; ext4_bread+0x1e/0xa0
4,1230,556861871,-; ? tomoyo_path_perm+0xa3/0x200
4,1231,556861873,-; __ext4_read_dirblock+0x2c/0x2e0
4,1232,556861875,-; htree_dirblock_to_tree+0x6a/0x1e0
4,1233,556861877,-; ext4_htree_fill_tree+0xcd/0x2f0
4,1234,556861880,-; ? kmem_cache_alloc_trace+0x163/0x1c0
4,1235,556861882,-; ext4_readdir+0x472/0x870
4,1236,556861886,-; iterate_dir+0x138/0x180
4,1237,556861967,-; ksys_getdents64+0x9c/0x130
4,1238,556861969,-; ? iterate_dir+0x180/0x180
4,1239,556861972,-; __x64_sys_getdents64+0x16/0x20
4,1240,556861974,-; do_syscall_64+0x59/0x180
4,1241,556861977,-; entry_SYSCALL_64_after_hwframe+0x44/0xa9
4,1242,556861979,-;RIP: 0033:0x7fdd32adef3b
4,1243,556861981,-;Code: 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 83 ec 18 64 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 1d 48 8b 4c 24 08 64 48 33 0c 25 28 00 00 00
4,1244,556861982,-;RSP: 002b:00007ffdf210cc10 EFLAGS: 00000246
4,1245,556861984,c; ORIG_RAX: 00000000000000d9
4,1246,556861985,-;RAX: ffffffffffffffda RBX: 0000563985f7f110 RCX: 00007fdd32adef3b
4,1247,556861986,-;RDX: 0000000000008000 RSI: 0000563985f7f140 RDI: 0000000000000006
4,1248,556861987,-;RBP: 0000563985f7f140 R08: 0000563985f740a8 R09: 0000563985f768f0
4,1249,556861988,-;R10: 0000000000000100 R11: 0000000000000246 R12: ffffffffffffff80
4,1250,556861990,-;R13: 0000000000000000 R14: 0000563985f73c00 R15: 0000563985f74040
4,1251,556861991,-;Modules linked in:
4,1252,556861993,c; 8021q
4,1253,556861994,c; iptable_mangle
4,1254,556861996,c; xt_limit
4,1255,556861997,c; xt_conntrack
4,1256,556861998,c; iptable_filter
4,1257,556862000,c; iptable_nat
4,1258,556862001,c; nf_nat_ipv4
4,1259,556862002,c; nf_nat
4,1260,556862101,c; ip_tables
4,1261,556862102,c; arc4
4,1262,556862103,c; ath9k_htc
4,1263,556862104,c; ath9k_common
4,1264,556862105,c; ath9k_hw
4,1265,556862107,c; ath
4,1266,556862108,c; mac80211
4,1267,556862109,c; kvm_intel
4,1268,556862110,c; cfg80211
4,1269,556862111,c; kvm
4,1270,556862112,c; crc32_pclmul
4,1271,556862113,c; uas
4,1272,556862115,c; usb_storage
4,1273,556862116,c; cdc_acm
4,1274,556862117,c; joydev
4,1275,556862118,-;CR2: 0000000000000008
4,1276,556862120,-;---[ end trace b7a234b0d1e0ec38 ]---
4,1277,556862122,-;RIP: 0010:__getblk_gfp+0x181/0x240
4,1278,556862123,-;Code: e8 e4 ee ff ff 48 89 04 24 49 8b 46 30 48 8d b8 80 00 00 00 e8 20 5e 67 00 48 8b 04 24 44 8b 4c 24 1c 48 89 c1 eb 03 48 89 d1 <48> 8b 51 08 48 85 d2 75 f4 48 89 41 08 49 8b 4f 08 48 8d 51 ff 83
4,1279,556862125,-;RSP: 0018:ffffaba441853be8 EFLAGS: 00010246
4,1280,556862126,-;RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
4,1281,556862127,-;RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffff9824dd8943c8
4,1282,556862129,-;RBP: 0000000000000000 R08: ffffd552cd660e48 R09: 0000000000000000
4,1283,556862130,-;R10: 0000000000000000 R11: 0000000000000036 R12: ffff9824dd894100
4,1284,556862131,-;R13: 0000000001301775 R14: ffff9824dd8941d8 R15: ffffd552c84f1380
4,1285,556862132,-;FS: 00007fdd32a0cb80(0000) GS:ffff9824df9c0000(0000) knlGS:0000000000000000
4,1286,556862134,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1287,556862176,-;CR2: 0000000000000008 CR3: 00000003614b6002 CR4: 00000000001606e0
0,1288,556862178,-;Kernel panic - not syncing: Fatal exception
0,1289,556862184,-;Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
0,1290,556862186,-;---[ end Kernel panic - not syncing: Fatal exception ]---
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-01 20:43 [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69 Thomas Lindroth
2019-09-02 7:16 ` Michal Hocko
[not found] ` <666dbcde-1b8a-9e2d-7d1f-48a117c78ae1@I-love.SAKURA.ne.jp>
@ 2019-09-06 12:56 ` Michal Hocko
2019-09-06 18:24 ` Shakeel Butt
2019-09-24 10:53 ` Michal Hocko
2 siblings, 2 replies; 29+ messages in thread
From: Michal Hocko @ 2019-09-06 12:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Vladimir Davydov, LKML, linux-mm,
Andrey Ryabinin, Michal Hocko, Thomas Lindroth, Tetsuo Handa
From: Michal Hocko <mhocko@suse.com>
Thomas has noticed the following NULL ptr dereference when using cgroup
v1 kmem limit:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0
P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
RIP: 0010:create_empty_buffers+0x24/0x100
Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
Call Trace:
create_page_buffers+0x4d/0x60
__block_write_begin_int+0x8e/0x5a0
? ext4_inode_attach_jinode.part.82+0xb0/0xb0
? jbd2__journal_start+0xd7/0x1f0
ext4_da_write_begin+0x112/0x3d0
generic_perform_write+0xf1/0x1b0
? file_update_time+0x70/0x140
__generic_file_write_iter+0x141/0x1a0
ext4_file_write_iter+0xef/0x3b0
__vfs_write+0x17e/0x1e0
vfs_write+0xa5/0x1a0
ksys_write+0x57/0xd0
do_syscall_64+0x55/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Tetsuo then noticed that this is because __memcg_kmem_charge_memcg
fails a __GFP_NOFAIL charge when the kmem limit is reached. This is
wrong behavior because nofail allocations are not allowed to fail.
The normal charge path simply forces the charge even if that means crossing
the limit. Kmem accounting should do the same.
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: stable
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
mm/memcontrol.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9ec5e12486a7..e18108b2b786 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2821,6 +2821,16 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order,
 
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
 	    !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
+
+		/*
+		 * Enforce __GFP_NOFAIL allocation because callers are not
+		 * prepared to see failures and likely do not have any failure
+		 * handling code.
+		 */
+		if (gfp & __GFP_NOFAIL) {
+			page_counter_charge(&memcg->kmem, nr_pages);
+			return 0;
+		}
 		cancel_charge(memcg, nr_pages);
 		return -ENOMEM;
 	}
--
2.20.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
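The behavior change in the patch above can be modelled in user space. This is a minimal sketch, not the kernel code: `struct counter`, `counter_try_charge()`, `counter_charge()`, `kmem_charge()` and `MODEL_GFP_NOFAIL` are hypothetical stand-ins for `struct page_counter`, `page_counter_try_charge()`, `page_counter_charge()`, `__memcg_kmem_charge_memcg()` and `__GFP_NOFAIL`. Hierarchy, locking and reclaim are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_NOFAIL; the value is arbitrary. */
#define MODEL_GFP_NOFAIL 0x1u

/* Simplified stand-in for struct page_counter: no hierarchy, no atomics. */
struct counter {
	unsigned long usage;	/* pages currently charged */
	unsigned long max;	/* limit in pages */
};

/* Charge only if the limit is respected, like page_counter_try_charge(). */
static bool counter_try_charge(struct counter *c, unsigned long nr_pages)
{
	if (c->usage + nr_pages > c->max)
		return false;
	c->usage += nr_pages;
	return true;
}

/* Charge unconditionally, like page_counter_charge(). */
static void counter_charge(struct counter *c, unsigned long nr_pages)
{
	c->usage += nr_pages;
}

/*
 * Model of the patched kmem charge path: charge the overall memory counter
 * first, then the kmem counter. Returns 0 on success or -ENOMEM.
 */
static int kmem_charge(struct counter *memory, struct counter *kmem,
		       unsigned long nr_pages, unsigned int gfp)
{
	/* The normal charge path forces nofail charges over the limit. */
	if (!counter_try_charge(memory, nr_pages)) {
		if (!(gfp & MODEL_GFP_NOFAIL))
			return -ENOMEM;
		counter_charge(memory, nr_pages);
	}

	if (!counter_try_charge(kmem, nr_pages)) {
		/* The fix: nofail requests cross the kmem limit too. */
		if (gfp & MODEL_GFP_NOFAIL) {
			counter_charge(kmem, nr_pages);
			return 0;
		}
		memory->usage -= nr_pages;	/* models cancel_charge() */
		return -ENOMEM;
	}
	return 0;
}
```

With a kmem limit of 4 pages and 3 pages already charged, a further 2-page request without the flag returns -ENOMEM and leaves both counters untouched, while the same request with `MODEL_GFP_NOFAIL` succeeds and pushes kmem usage to 5, above the limit.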
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-06 12:56 ` [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges Michal Hocko
@ 2019-09-06 18:24 ` Shakeel Butt
2019-09-24 10:53 ` Michal Hocko
1 sibling, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2019-09-06 18:24 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, LKML, Linux MM,
Andrey Ryabinin, Michal Hocko, Thomas Lindroth, Tetsuo Handa
On Fri, Sep 6, 2019 at 5:56 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> Thomas has noticed the following NULL ptr dereference when using cgroup
> v1 kmem limit:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> PGD 0
> P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
> Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
> RIP: 0010:create_empty_buffers+0x24/0x100
> Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
> RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
> RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
> RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
> R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
> FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
> Call Trace:
> create_page_buffers+0x4d/0x60
> __block_write_begin_int+0x8e/0x5a0
> ? ext4_inode_attach_jinode.part.82+0xb0/0xb0
> ? jbd2__journal_start+0xd7/0x1f0
> ext4_da_write_begin+0x112/0x3d0
> generic_perform_write+0xf1/0x1b0
> ? file_update_time+0x70/0x140
> __generic_file_write_iter+0x141/0x1a0
> ext4_file_write_iter+0xef/0x3b0
> __vfs_write+0x17e/0x1e0
> vfs_write+0xa5/0x1a0
> ksys_write+0x57/0xd0
> do_syscall_64+0x55/0x160
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Tetsuo then noticed that this is because the __memcg_kmem_charge_memcg
> fails __GFP_NOFAIL charge when the kmem limit is reached. This is a
> wrong behavior because nofail allocations are not allowed to fail.
> Normal charge path simply forces the charge even if that means to cross
> the limit. Kmem accounting should be doing the same.
>
> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
> Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
I wonder what has changed since
<http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@google.com/>.
> ---
> mm/memcontrol.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9ec5e12486a7..e18108b2b786 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2821,6 +2821,16 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order,
>
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
> !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
> +
> + /*
> + * Enforce __GFP_NOFAIL allocation because callers are not
> + * prepared to see failures and likely do not have any failure
> + * handling code.
> + */
> + if (gfp & __GFP_NOFAIL) {
> + page_counter_charge(&memcg->kmem, nr_pages);
> + return 0;
> + }
> cancel_charge(memcg, nr_pages);
> return -ENOMEM;
> }
> --
> 2.20.1
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-06 18:24 ` Shakeel Butt
(?)
@ 2019-09-09 11:22 ` Michal Hocko
2019-09-11 12:00 ` Michal Hocko
-1 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-09-09 11:22 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, LKML, Linux MM,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
> On Fri, Sep 6, 2019 at 5:56 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Thomas has noticed the following NULL ptr dereference when using cgroup
> > v1 kmem limit:
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > PGD 0
> > P4D 0
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
> > Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
> > RIP: 0010:create_empty_buffers+0x24/0x100
> > Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
> > RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
> > RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
> > RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
> > R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> > R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
> > FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
> > Call Trace:
> > create_page_buffers+0x4d/0x60
> > __block_write_begin_int+0x8e/0x5a0
> > ? ext4_inode_attach_jinode.part.82+0xb0/0xb0
> > ? jbd2__journal_start+0xd7/0x1f0
> > ext4_da_write_begin+0x112/0x3d0
> > generic_perform_write+0xf1/0x1b0
> > ? file_update_time+0x70/0x140
> > __generic_file_write_iter+0x141/0x1a0
> > ext4_file_write_iter+0xef/0x3b0
> > __vfs_write+0x17e/0x1e0
> > vfs_write+0xa5/0x1a0
> > ksys_write+0x57/0xd0
> > do_syscall_64+0x55/0x160
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Tetsuo then noticed that this is because the __memcg_kmem_charge_memcg
> > fails __GFP_NOFAIL charge when the kmem limit is reached. This is a
> > wrong behavior because nofail allocations are not allowed to fail.
> > Normal charge path simply forces the charge even if that means to cross
> > the limit. Kmem accounting should be doing the same.
> >
> > Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
> > Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> > Cc: stable
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>
> I wonder what has changed since
> <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@google.com/>.
I have completely forgotten about that one. It seems that we have just
repeated the same discussion again. This time we have a poor user who
actually enabled the kmem limit.
I guess there was no real objection to the change back then. The primary
discussion revolved around the fact that the accounting will stay broken
even when this particular part is fixed. Considering this leads to an
easy-to-trigger crash (with the limit enabled), I guess we should just
make it less broken, backport it to stable trees, and have a serious
discussion about discontinuing the limit. We could start by simply failing
to set any limit in the current upstream kernels.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-09 11:22 ` Michal Hocko
@ 2019-09-11 12:00 ` Michal Hocko
2019-09-11 14:37 ` Andrew Morton
0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-09-11 12:00 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, LKML, Linux MM,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
On Mon 09-09-19 13:22:45, Michal Hocko wrote:
> On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
[...]
> > I wonder what has changed since
> > <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@google.com/>.
>
> I have completely forgot about that one. It seems that we have just
> repeated the same discussion again. This time we have a poor user who
> actually enabled the kmem limit.
>
> I guess there was no real objection to the change back then. The primary
> discussion revolved around the fact that the accounting will stay broken
> even when this particular part was fixed. Considering this leads to easy
> to trigger crash (with the limit enabled) then I guess we should just
> make it less broken and backport to stable trees and have a serious
> discussion about discontinuing of the limit. Start by simply failing to
> set any limit in the current upstream kernels.
Any more concerns/objections to the patch? I can add a reference to your
earlier post, Shakeel, if you want, or credit you the way you prefer.
Also, are there any objections to starting the deprecation process for the
kmem limit? I would see it in three stages:
- 1st warn in the kernel log
	pr_warn("kmem.limit_in_bytes is deprecated and will be removed. "
		"Please report your usecase to linux-mm@kvack.org if you "
		"depend on this functionality.\n");
- 2nd fail any write to kmem.limit_in_bytes
- 3rd remove the control file completely
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-11 12:00 ` Michal Hocko
@ 2019-09-11 14:37 ` Andrew Morton
2019-09-11 15:16 ` Michal Hocko
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2019-09-11 14:37 UTC (permalink / raw)
To: Michal Hocko
Cc: Shakeel Butt, Johannes Weiner, Vladimir Davydov, LKML, Linux MM,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
On Wed, 11 Sep 2019 14:00:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 09-09-19 13:22:45, Michal Hocko wrote:
> > On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
> [...]
> > > I wonder what has changed since
> > > <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@google.com/>.
> >
> > I have completely forgot about that one. It seems that we have just
> > repeated the same discussion again. This time we have a poor user who
> > actually enabled the kmem limit.
> >
> > I guess there was no real objection to the change back then. The primary
> > discussion revolved around the fact that the accounting will stay broken
> > even when this particular part was fixed. Considering this leads to easy
> > to trigger crash (with the limit enabled) then I guess we should just
> > make it less broken and backport to stable trees and have a serious
> > discussion about discontinuing of the limit. Start by simply failing to
> > set any limit in the current upstream kernels.
>
> Any more concerns/objections to the patch? I can add a reference to your
> earlier post Shakeel if you want or to credit you the way you prefer.
>
> Also are there any objections to start deprecating process of kmem
> limit? I would see it in two stages
> - 1st warn in the kernel log
> pr_warn("kmem.limit_in_bytes is deprecated and will be removed.
> "Please report your usecase to linux-mm@kvack.org if you "
> "depend on this functionality."
pr_warn_once() :)
> - 2nd fail any write to kmem.limit_in_bytes
> - 3rd remove the control file completely
Sounds good to me.
^ permalink raw reply [flat|nested] 29+ messages in thread
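The point of the pr_warn_once() suggestion is that a control file can be written many times, and a plain pr_warn() would log on every write. A minimal user-space sketch of the warn-once idea follows; `warn_kmem_limit_deprecated_once()`, `kmem_limit_write()` and `kmem_warnings_printed` are hypothetical names, and the kernel macro is only approximated here by a static flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many times the message was actually printed. */
static int kmem_warnings_printed;

/* Print the deprecation notice on the first call only. */
static void warn_kmem_limit_deprecated_once(void)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	kmem_warnings_printed++;
	fprintf(stderr,
		"kmem.limit_in_bytes is deprecated and will be removed.\n");
}

/* Hypothetical write handler: every write reaches the warning helper. */
static int kmem_limit_write(unsigned long nr_pages)
{
	(void)nr_pages;	/* the limit update itself is not modelled */
	warn_kmem_limit_deprecated_once();
	return 0;
}
```

Calling `kmem_limit_write()` repeatedly prints the notice a single time, so a script that rewrites the limit in a loop cannot flood the log.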
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-11 14:37 ` Andrew Morton
@ 2019-09-11 15:16 ` Michal Hocko
2019-09-13 2:46 ` Shakeel Butt
0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-09-11 15:16 UTC (permalink / raw)
To: Andrew Morton
Cc: Shakeel Butt, Johannes Weiner, Vladimir Davydov, LKML, Linux MM,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
On Wed 11-09-19 07:37:40, Andrew Morton wrote:
> On Wed, 11 Sep 2019 14:00:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>
> > On Mon 09-09-19 13:22:45, Michal Hocko wrote:
> > > On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
> > [...]
> > > > I wonder what has changed since
> > > > <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@google.com/>.
> > >
> > > I have completely forgot about that one. It seems that we have just
> > > repeated the same discussion again. This time we have a poor user who
> > > actually enabled the kmem limit.
> > >
> > > I guess there was no real objection to the change back then. The primary
> > > discussion revolved around the fact that the accounting will stay broken
> > > even when this particular part was fixed. Considering this leads to easy
> > > to trigger crash (with the limit enabled) then I guess we should just
> > > make it less broken and backport to stable trees and have a serious
> > > discussion about discontinuing of the limit. Start by simply failing to
> > > set any limit in the current upstream kernels.
> >
> > Any more concerns/objections to the patch? I can add a reference to your
> > earlier post Shakeel if you want or to credit you the way you prefer.
> >
> > Also are there any objections to start deprecating process of kmem
> > limit? I would see it in two stages
> > - 1st warn in the kernel log
> > pr_warn("kmem.limit_in_bytes is deprecated and will be removed.
> > "Please report your usecase to linux-mm@kvack.org if you "
> > "depend on this functionality."
>
> pr_warn_once() :)
>
> > - 2nd fail any write to kmem.limit_in_bytes
> > - 3rd remove the control file completely
>
> Sounds good to me.
Here we go
From 512822e551fe2960040c23b12c7b27a5fdab9013 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 11 Sep 2019 17:02:33 +0200
Subject: [PATCH] memcg, kmem: deprecate kmem.limit_in_bytes
Cgroup v1 memcg controller has exposed a dedicated kmem limit to users,
which turned out to be a really bad idea because there are paths which
cannot shrink the kernel memory usage enough to get below the limit
(e.g. because the accounted memory is not reclaimable). There are cases
when the failure is not even allowed (e.g. __GFP_NOFAIL). This means
that the kmem limit is in excess to the hard limit without any way to
shrink and thus completely useless. The OOM killer cannot be invoked to
handle the situation because that would lead to premature OOM killing.
As a result, many places might see ENOMEM returned from kmalloc, which
results in unexpected errors, e.g. a global OOM kill when there is a
lot of free memory, because ENOMEM is translated into VM_FAULT_OOM in the #PF
path and pagefault_out_of_memory then invokes the OOM killer.
Please note that the kernel memory is still accounted to the overall
limit along with the user memory, so removing the kmem-specific limit
should still allow kernel memory consumption to be contained. Unlike the kmem
one, though, the overall limit invokes memory reclaim and targeted memcg OOM
killing if necessary.
Start the deprecation process by crying to the kernel log. Let's see
whether there are relevant usecases and simply return -EINVAL in the
second stage if nobody complains in a few releases.
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
mm/memcontrol.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 41bdc038dad9..e53fc2f31549 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -87,6 +87,9 @@ Brief summary of control files.
node
memory.kmem.limit_in_bytes set/show hard limit for kernel memory
+ This knob is deprecated and shouldn't be
+ used. It is planned to be removed in
+ the foreseeable future.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e18108b2b786..113969bc57e8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3518,6 +3518,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 			ret = mem_cgroup_resize_max(memcg, nr_pages, true);
 			break;
 		case _KMEM:
+			pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+				     "Please report your usecase to linux-mm@kvack.org if you "
+				     "depend on this functionality.\n");
 			ret = memcg_update_kmem_max(memcg, nr_pages);
 			break;
 		case _TCP:
--
2.20.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 29+ messages in thread
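The commit message above explains why a kmem limit failure can end in a global OOM kill even with plenty of free memory. The following toy model sketches that chain under stated assumptions: `fault_result_from_errno()`, `handle_fault()` and `struct system_model` are hypothetical, and the real page fault path is considerably more involved. What it illustrates is that the decision depends only on the error code, never on how much memory is free.

```c
#include <errno.h>

/* Toy fault outcomes, loosely modelled on the VM_FAULT_* codes. */
enum fault_result { FAULT_DONE, FAULT_OOM, FAULT_SIGBUS };

/* Hypothetical system state for the model. */
struct system_model {
	unsigned long free_pages;		/* never consulted below */
	unsigned int global_oom_invocations;	/* times the OOM path ran */
};

/* An allocation error of -ENOMEM is reported as an OOM fault. */
static enum fault_result fault_result_from_errno(int err)
{
	if (err == 0)
		return FAULT_DONE;
	if (err == -ENOMEM)
		return FAULT_OOM;
	return FAULT_SIGBUS;
}

/*
 * Model of the fault handler: an OOM fault triggers the global OOM path,
 * standing in for pagefault_out_of_memory(), regardless of free_pages.
 */
static enum fault_result handle_fault(struct system_model *sys, int alloc_err)
{
	enum fault_result res = fault_result_from_errno(alloc_err);

	if (res == FAULT_OOM)
		sys->global_oom_invocations++;
	return res;
}
```

In this model a single -ENOMEM caused by the kmem limit is enough to reach the OOM path on a system with a million free pages, which matches the early OOM symptom reported at the start of the thread.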
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-11 15:16 ` Michal Hocko
@ 2019-09-13 2:46 ` Shakeel Butt
0 siblings, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2019-09-13 2:46 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, LKML, Linux MM,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
On Wed, Sep 11, 2019 at 8:16 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 11-09-19 07:37:40, Andrew Morton wrote:
> > On Wed, 11 Sep 2019 14:00:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> >
> > > On Mon 09-09-19 13:22:45, Michal Hocko wrote:
> > > > On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
> > > [...]
> > > > > I wonder what has changed since
> > > > > <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@google.com/>.
> > > >
> > > > I have completely forgot about that one. It seems that we have just
> > > > repeated the same discussion again. This time we have a poor user who
> > > > actually enabled the kmem limit.
> > > >
> > > > I guess there was no real objection to the change back then. The primary
> > > > discussion revolved around the fact that the accounting will stay broken
> > > > > even when this particular part was fixed. Considering this leads to an
> > > > > easy-to-trigger crash (with the limit enabled), I guess we should just
> > > > > make it less broken, backport it to stable trees, and have a serious
> > > > > discussion about discontinuing the limit. Start by simply failing to
> > > > > set any limit in the current upstream kernels.
> > >
> > > Any more concerns/objections to the patch? I can add a reference to your
> > > earlier post Shakeel if you want or to credit you the way you prefer.
> > >
> > > Also, are there any objections to starting the deprecation process for
> > > the kmem limit? I would see it in three stages
> > > - 1st warn in the kernel log
> > > pr_warn("kmem.limit_in_bytes is deprecated and will be removed. "
> > > "Please report your usecase to linux-mm@kvack.org if you "
> > > "depend on this functionality.\n");
> >
> > pr_warn_once() :)
> >
> > > - 2nd fail any write to kmem.limit_in_bytes
> > > - 3rd remove the control file completely
> >
> > Sounds good to me.
>
> Here we go
>
> From 512822e551fe2960040c23b12c7b27a5fdab9013 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 11 Sep 2019 17:02:33 +0200
> Subject: [PATCH] memcg, kmem: deprecate kmem.limit_in_bytes
>
> Cgroup v1 memcg controller has exposed a dedicated kmem limit to users,
> which turned out to be a really bad idea because there are paths which
> cannot shrink the kernel memory usage enough to get below the limit
> (e.g. because the accounted memory is not reclaimable). There are cases
> when failure is not even allowed (e.g. __GFP_NOFAIL). This means
> that the kmem limit is in excess to the hard limit without any way to
> shrink and thus completely useless. OOM killer cannot be invoked to
> handle the situation because that would lead to premature OOM killing.
>
> As a result many places might see ENOMEM returned from kmalloc, which
> results in unexpected errors. E.g. a global OOM kill when there is a
> lot of free memory, because ENOMEM is translated into VM_FAULT_OOM in the
> #PF path and pagefault_out_of_memory then invokes the OOM killer.
>
> Please note that kernel memory is still accounted to the overall
> limit along with user memory, so removing the kmem-specific limit
> should still allow kernel memory consumption to be contained. Unlike the
> kmem one, though, the overall limit invokes memory reclaim and targeted
> memcg OOM killing if necessary.
>
> Start the deprecation process by crying to the kernel log. Let's see
> whether there are relevant usecases and simply return EINVAL in the
> second stage if nobody complains in a few releases.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> mm/memcontrol.c | 3 +++
> 2 files changed, 6 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 41bdc038dad9..e53fc2f31549 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -87,6 +87,9 @@ Brief summary of control files.
> node
>
> memory.kmem.limit_in_bytes set/show hard limit for kernel memory
> + This knob is deprecated it shouldn't be
> + used. It is planned to be removed in
> + a foreseeable future.
> memory.kmem.usage_in_bytes show current kernel memory allocation
> memory.kmem.failcnt show the number of kernel memory usage
> hits limits
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e18108b2b786..113969bc57e8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3518,6 +3518,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
> ret = mem_cgroup_resize_max(memcg, nr_pages, true);
> break;
> case _KMEM:
> + pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
> + "Please report your usecase to linux-mm@kvack.org if you "
> + "depend on this functionality.\n");
> ret = memcg_update_kmem_max(memcg, nr_pages);
> break;
> case _TCP:
> --
> 2.20.1
>
>
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 29+ messages in thread
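The staged deprecation discussed above (warn once, then fail writes, then remove the file) can be modelled in userspace as follows. This is a sketch under stated assumptions, not the kernel implementation: every toy_* name is hypothetical, the stage is passed in explicitly rather than tied to a kernel release, and -ENOENT for the third stage is an assumption about what a caller would see once the control file no longer exists. The -EINVAL for the second stage follows the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stages of the deprecation plan from the thread. */
enum toy_stage {
	TOY_STAGE_WARN = 1,       /* accept the write, warn once */
	TOY_STAGE_FAIL_WRITE = 2, /* reject every write */
	TOY_STAGE_REMOVED = 3,    /* control file no longer exists */
};

static bool toy_warned;             /* has the warning been printed? */
static unsigned int toy_warn_count; /* how many times it was printed */

/* Userspace stand-in for pr_warn_once(): print only on the first call. */
static void toy_warn_once(const char *msg)
{
	assert(msg != NULL);
	if (toy_warned)
		return;
	toy_warned = true;
	toy_warn_count++;
	fprintf(stderr, "%s\n", msg);
}

/* Model of a write to kmem.limit_in_bytes at a given deprecation stage. */
static int toy_kmem_limit_write(enum toy_stage stage, unsigned long nr_pages,
				unsigned long *limit)
{
	assert(limit != NULL);

	switch (stage) {
	case TOY_STAGE_WARN:
		toy_warn_once("kmem.limit_in_bytes is deprecated and will be removed.");
		*limit = nr_pages; /* the limit is still applied */
		return 0;
	case TOY_STAGE_FAIL_WRITE:
		return -EINVAL;    /* limit left untouched */
	case TOY_STAGE_REMOVED:
		return -ENOENT;    /* assumption: file is gone */
	}
	return -EINVAL;
}
```

The warn-once behaviour is what Andrew's pr_warn_once() remark asks for: repeated writes in the first stage keep working and log a single line, so a script that rewrites the limit in a loop does not flood the kernel log.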
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-06 12:56 ` [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges Michal Hocko
2019-09-06 18:24 ` Shakeel Butt
@ 2019-09-24 10:53 ` Michal Hocko
2019-09-24 23:06 ` Andrew Morton
1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-09-24 10:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Vladimir Davydov, LKML, linux-mm,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
Andrew, do you plan to send this patch to Linus as well?
On Fri 06-09-19 14:56:08, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Thomas has noticed the following NULL ptr dereference when using cgroup
> v1 kmem limit:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> PGD 0
> P4D 0
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
> Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
> RIP: 0010:create_empty_buffers+0x24/0x100
> Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
> RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
> RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
> RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
> R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
> FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
> Call Trace:
> create_page_buffers+0x4d/0x60
> __block_write_begin_int+0x8e/0x5a0
> ? ext4_inode_attach_jinode.part.82+0xb0/0xb0
> ? jbd2__journal_start+0xd7/0x1f0
> ext4_da_write_begin+0x112/0x3d0
> generic_perform_write+0xf1/0x1b0
> ? file_update_time+0x70/0x140
> __generic_file_write_iter+0x141/0x1a0
> ext4_file_write_iter+0xef/0x3b0
> __vfs_write+0x17e/0x1e0
> vfs_write+0xa5/0x1a0
> ksys_write+0x57/0xd0
> do_syscall_64+0x55/0x160
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Tetsuo then noticed that this is because __memcg_kmem_charge_memcg
> fails a __GFP_NOFAIL charge when the kmem limit is reached. This is
> wrong behavior because nofail allocations are not allowed to fail.
> The normal charge path simply forces the charge even if that means
> crossing the limit. Kmem accounting should do the same.
>
> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
> Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/memcontrol.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9ec5e12486a7..e18108b2b786 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2821,6 +2821,16 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order,
>
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
> !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
> +
> + /*
> + * Enforce __GFP_NOFAIL allocation because callers are not
> + * prepared to see failures and likely do not have any failure
> + * handling code.
> + */
> + if (gfp & __GFP_NOFAIL) {
> + page_counter_charge(&memcg->kmem, nr_pages);
> + return 0;
> + }
> cancel_charge(memcg, nr_pages);
> return -ENOMEM;
> }
> --
> 2.20.1
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 29+ messages in thread
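The charge logic that the patch above changes can be illustrated with a small userspace model. This is a sketch only: struct toy_counter and the toy_* functions are hypothetical stand-ins for the kernel's page counter and charge helpers, the nofail flag stands in for testing gfp & __GFP_NOFAIL, and only the kmem counter is modelled (the real function also has to undo the charge against the overall memory counter on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page_counter. */
struct toy_counter {
	unsigned long usage; /* pages currently charged */
	unsigned long max;   /* limit in pages */
};

/* Charge only if the result stays within the limit. */
static bool toy_try_charge(struct toy_counter *c, unsigned long nr_pages)
{
	assert(c != NULL);
	if (nr_pages > c->max || c->usage > c->max - nr_pages)
		return false;
	c->usage += nr_pages;
	return true;
}

/* Charge unconditionally, even if that crosses the limit. */
static void toy_charge(struct toy_counter *c, unsigned long nr_pages)
{
	assert(c != NULL);
	c->usage += nr_pages;
}

/*
 * Model of the patched kmem charge: a failed try-charge is forced
 * through when the caller cannot tolerate failure, otherwise it is
 * reported as -ENOMEM.
 */
static int toy_kmem_charge(struct toy_counter *kmem, unsigned long nr_pages,
			   bool nofail)
{
	if (toy_try_charge(kmem, nr_pages))
		return 0;
	if (nofail) {
		toy_charge(kmem, nr_pages);
		return 0;
	}
	return -ENOMEM;
}
```

In this model a nofail charge can push usage above max, and later ordinary charges keep failing until usage drops again. That matches the point made in the thread that the fix makes the limit less broken rather than making the accounting correct.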
* Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges
2019-09-24 10:53 ` Michal Hocko
@ 2019-09-24 23:06 ` Andrew Morton
0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2019-09-24 23:06 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, LKML, linux-mm,
Andrey Ryabinin, Thomas Lindroth, Tetsuo Handa
On Tue, 24 Sep 2019 12:53:55 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> Andrew, do you plan to send this patch to Linus as well?
I suppose so. We don't actually have any reviewed-bys or acked-bys but
I expect that's because Shakeel forgot to type them in.
The followup deprecation warning patch I parked for 5.4-rc1. Best to
give it a spin in -next and see if anyone complains before we go
bothering mainline users.
From: Michal Hocko <mhocko@suse.com>
Subject: memcg, kmem: do not fail __GFP_NOFAIL charges
Thomas has noticed the following NULL ptr dereference when using cgroup
v1 kmem limit:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0
P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
RIP: 0010:create_empty_buffers+0x24/0x100
Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
FS: 00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
Call Trace:
create_page_buffers+0x4d/0x60
__block_write_begin_int+0x8e/0x5a0
? ext4_inode_attach_jinode.part.82+0xb0/0xb0
? jbd2__journal_start+0xd7/0x1f0
ext4_da_write_begin+0x112/0x3d0
generic_perform_write+0xf1/0x1b0
? file_update_time+0x70/0x140
__generic_file_write_iter+0x141/0x1a0
ext4_file_write_iter+0xef/0x3b0
__vfs_write+0x17e/0x1e0
vfs_write+0xa5/0x1a0
ksys_write+0x57/0xd0
do_syscall_64+0x55/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Tetsuo then noticed that this is because __memcg_kmem_charge_memcg
fails a __GFP_NOFAIL charge when the kmem limit is reached. This is wrong
behavior because nofail allocations are not allowed to fail. The normal
charge path simply forces the charge even if that means crossing the
limit. Kmem accounting should do the same.
Link: http://lkml.kernel.org/r/20190906125608.32129-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/mm/memcontrol.c~memcg-kmem-do-not-fail-__gfp_nofail-charges
+++ a/mm/memcontrol.c
@@ -2943,6 +2943,16 @@ int __memcg_kmem_charge_memcg(struct pag
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
!page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
+
+ /*
+ * Enforce __GFP_NOFAIL allocation because callers are not
+ * prepared to see failures and likely do not have any failure
+ * handling code.
+ */
+ if (gfp & __GFP_NOFAIL) {
+ page_counter_charge(&memcg->kmem, nr_pages);
+ return 0;
+ }
cancel_charge(memcg, nr_pages);
return -ENOMEM;
}
_
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread
Thread overview: 29+ messages
2019-09-01 20:43 [BUG] Early OOM and kernel NULL pointer dereference in 4.19.69 Thomas Lindroth
2019-09-02 7:16 ` Michal Hocko
2019-09-02 7:27 ` Michal Hocko
2019-09-02 19:34 ` Thomas Lindroth
2019-09-03 7:41 ` Michal Hocko
2019-09-03 12:01 ` Thomas Lindroth
2019-09-03 12:05 ` Andrey Ryabinin
2019-09-03 12:22 ` Michal Hocko
2019-09-03 18:20 ` Thomas Lindroth
2019-09-03 19:36 ` Michal Hocko
[not found] ` <666dbcde-1b8a-9e2d-7d1f-48a117c78ae1@I-love.SAKURA.ne.jp>
2019-09-03 18:25 ` Thomas Lindroth
[not found] ` <4d0eda9a-319d-1a7d-1eed-71da90902367@i-love.sakura.ne.jp>
2019-09-04 11:25 ` [BUG] kmemcg limit defeats __GFP_NOFAIL allocation Michal Hocko
[not found] ` <4d87d770-c110-224f-6c0c-d6fada90417d@i-love.sakura.ne.jp>
2019-09-04 11:59 ` Michal Hocko
[not found] ` <0056063b-46ff-0ebd-ff0d-c96a1f9ae6b1@i-love.sakura.ne.jp>
2019-09-04 14:29 ` Michal Hocko
[not found] ` <405ce28b-c0b4-780c-c883-42d741ec60e0@i-love.sakura.ne.jp>
2019-09-05 23:11 ` Thomas Lindroth
2019-09-06 7:27 ` Michal Hocko
2019-09-06 10:54 ` Andrey Ryabinin
2019-09-06 11:29 ` Michal Hocko
2019-09-06 12:56 ` [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges Michal Hocko
2019-09-06 18:24 ` Shakeel Butt
2019-09-06 18:24 ` Shakeel Butt
2019-09-09 11:22 ` Michal Hocko
2019-09-11 12:00 ` Michal Hocko
2019-09-11 14:37 ` Andrew Morton
2019-09-11 15:16 ` Michal Hocko
2019-09-13 2:46 ` Shakeel Butt
2019-09-13 2:46 ` Shakeel Butt
2019-09-24 10:53 ` Michal Hocko
2019-09-24 23:06 ` Andrew Morton