linux-kernel.vger.kernel.org archive mirror
* OOM: Better, but still there on 4.9
@ 2016-12-15 22:57 Nils Holland
From: Nils Holland @ 2016-12-15 22:57 UTC
  To: linux-kernel

Hi folks,

I've been reading quite a bit about OOM related issues in recent
kernels, and as I've been experiencing some of these myself for quite
a while, I thought I'd send in my report in the hope that it
is useful. Of course, if there's ever anything to test, like some
patches or something, I'd be glad to help!

Now, my situation: I have two different x86 machines, both equipped
with 4 GB of RAM, running 32 bit kernels. I've never really observed
any OOM issues until kernel 4.8, but with that kernel, it was enough
to unpack a bigger source tarball (like the firefox sources) on a
freshly booted system and subsequently compile them, and with very
high certainty, the OOM killer would kick in during the compile,
killing a whole lot of processes, with the machine then becoming
unresponsive and finally a kernel panic taking place.

With kernel 4.9, these OOM events seem to be somewhat harder to
trigger, but in most cases, unpacking some larger tarballs and then
launching a build process on a freshly booted system without many
other processes (not even X) running seems to do the trick. However,
the consequences don't seem to be as severe as they were in 4.8: The
machines did become unresponsive, in the sense that logging in
locally (after being thrown out when my bash got killed by the
OOM killer) was no longer possible and sshing into the machine also
didn't work anymore (most likely because sshd had been killed as well),
but in all cases the machine was still pingable and the magic
SysRq key combo was still working. I've not yet seen a single
real kernel panic as I did with 4.8. Still, the only way to get the
machine back into action was a hard reboot via the magic SysRq
commands or a power cycle.

For reference, I'm attaching an OOM report I observed under 4.9 at the
end of this message. This one happened after I had been using the
machine in a normal fashion for a short time, and then had portage,
Gentoo's build system, unpack the firefox sources - compiling hadn't
even started yet at this point. Oh yes, I'm using btrfs, in case that
might make a difference - at least I've seen some references to it in
other similar reports I've found on the web.

Of course, none of these are workloads that are new / special in any
way - prior to 4.8, I never experienced any issues doing the exact
same things.
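
For anyone who wants to skim a report like the one below without
reading every field, here is a minimal Python sketch that pulls the
per-zone "free" and "min" values out of the zone lines and lists the
zones whose free memory is below the min watermark. The function names
(zone_watermarks, below_min) are made up for illustration, and the
sample input is a shortened copy of three zone lines from the first
report below. The sketch only surfaces the numbers; it does not try to
diagnose anything.

```python
import re

# Three zone lines copied from the first report below, shortened to the
# fields this sketch reads (free and min, both in kB).
SAMPLE = """\
Dec 15 19:02:18 teela kernel: DMA free:3952kB min:788kB low:984kB high:1180kB
Dec 15 19:02:18 teela kernel: Normal free:41332kB min:41368kB low:51708kB high:62048kB
Dec 15 19:02:18 teela kernel: HighMem free:781660kB min:512kB low:34356kB high:68200kB
"""

# Matches only the per-zone summary lines, i.e. "<zone> free:NkB min:NkB".
# The "Node 0 ..." line and the buddy-list lines ("DMA: 0*4kB ...") do not
# have this shape and are skipped.
ZONE_RE = re.compile(
    r"kernel: (?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB"
)


def zone_watermarks(log_text):
    """Return a list of (zone, free_kB, min_kB) tuples found in log_text."""
    zones = []
    for line in log_text.splitlines():
        m = ZONE_RE.search(line)
        if m:
            zones.append((m["zone"], int(m["free"]), int(m["min"])))
    return zones


def below_min(zones):
    """Return the names of zones whose free memory is below the min watermark."""
    return [zone for zone, free_kb, min_kb in zones if free_kb < min_kb]


if __name__ == "__main__":
    zones = zone_watermarks(SAMPLE)
    for zone, free_kb, min_kb in zones:
        print(f"{zone}: free={free_kb}kB min={min_kb}kB")
    print("below min:", below_min(zones))
```

For the three sample lines, only the Normal zone is listed as below its
min watermark (41332kB free against 41368kB min), while HighMem still
shows 781660kB free.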

Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
Dec 15 19:02:18 teela kernel: Call Trace:
Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
Dec 15 19:02:18 teela kernel:  [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
Dec 15 19:02:18 teela kernel:  [<c110414d>] pagecache_get_page+0xad/0x270
Dec 15 19:02:18 teela kernel:  [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
Dec 15 19:02:18 teela kernel:  [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
Dec 15 19:02:18 teela kernel:  [<c132a57f>] btrfs_alloc_tree_block+0x1ef/0x5f0
Dec 15 19:02:18 teela kernel:  [<c130f7c3>] __btrfs_cow_block+0x143/0x5f0
Dec 15 19:02:18 teela kernel:  [<c130fe1a>] btrfs_cow_block+0x13a/0x220
Dec 15 19:02:18 teela kernel:  [<c13132f1>] btrfs_search_slot+0x1d1/0x870
Dec 15 19:02:18 teela kernel:  [<c132fcdd>] btrfs_lookup_file_extent+0x4d/0x60
Dec 15 19:02:18 teela kernel:  [<c1354fe6>] __btrfs_drop_extents+0x176/0x1070
Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
Dec 15 19:02:18 teela kernel:  [<c133dbb5>] ? start_transaction+0x65/0x4b0
Dec 15 19:02:18 teela kernel:  [<c1150597>] ? __kmalloc+0x147/0x1e0
Dec 15 19:02:18 teela kernel:  [<c1345005>] cow_file_range_inline+0x215/0x6b0
Dec 15 19:02:18 teela kernel:  [<c13459fc>] cow_file_range.isra.49+0x55c/0x6d0
Dec 15 19:02:18 teela kernel:  [<c1361795>] ? lock_extent_bits+0x75/0x1e0
Dec 15 19:02:18 teela kernel:  [<c1346d51>] run_delalloc_range+0x441/0x470
Dec 15 19:02:18 teela kernel:  [<c13626e4>] writepage_delalloc.isra.47+0x144/0x1e0
Dec 15 19:02:18 teela kernel:  [<c1364548>] __extent_writepage+0xd8/0x2b0
Dec 15 19:02:18 teela kernel:  [<c1365c4c>] extent_writepages+0x25c/0x380
Dec 15 19:02:18 teela kernel:  [<c1342cd0>] ? btrfs_real_readdir+0x610/0x610
Dec 15 19:02:18 teela kernel:  [<c133ff0f>] btrfs_writepages+0x1f/0x30
Dec 15 19:02:18 teela kernel:  [<c110ff85>] do_writepages+0x15/0x40
Dec 15 19:02:18 teela kernel:  [<c1190a95>] __writeback_single_inode+0x35/0x2f0
Dec 15 19:02:18 teela kernel:  [<c119112e>] writeback_sb_inodes+0x16e/0x340
Dec 15 19:02:18 teela kernel:  [<c119145a>] wb_writeback+0xaa/0x280
Dec 15 19:02:18 teela kernel:  [<c1191de8>] wb_workfn+0xd8/0x3e0
Dec 15 19:02:18 teela kernel:  [<c104fd34>] process_one_work+0x114/0x3e0
Dec 15 19:02:18 teela kernel:  [<c1050b4f>] worker_thread+0x2f/0x4b0
Dec 15 19:02:18 teela kernel:  [<c1050b20>] ? create_worker+0x180/0x180
Dec 15 19:02:18 teela kernel:  [<c10552e7>] kthread+0x97/0xb0
Dec 15 19:02:18 teela kernel:  [<c1055250>] ? __kthread_parkme+0x60/0x60
Dec 15 19:02:18 teela kernel:  [<c19b5cb7>] ret_from_fork+0x1b/0x28
Dec 15 19:02:18 teela kernel: Mem-Info:
Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 isolated_anon:0
                               active_file:274324 inactive_file:281962 isolated_file:0
                               unevictable:0 dirty:649 writeback:0 unstable:0
                               slab_reclaimable:40662 slab_unreclaimable:17754
                               mapped:7382 shmem:202 pagetables:351 bounce:0
                               free:206736 free_pcp:332 free_cma:0
Dec 15 19:02:18 teela kernel: Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 15 19:02:18 teela kernel: DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
Dec 15 19:02:18 teela kernel: Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
Dec 15 19:02:18 teela kernel: HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 0 0
Dec 15 19:02:18 teela kernel: DMA: 0*4kB 2*8kB (ME) 5*16kB (UME) 13*32kB (UM) 11*64kB (UME) 3*128kB (UM) 1*256kB (M) 2*512kB (E) 1*1024kB (M) 0*2048kB 0*4096kB = 3904kB
Dec 15 19:02:18 teela kernel: Normal: 27*4kB (ME) 25*8kB (UME) 442*16kB (UME) 189*32kB (UME) 411*64kB (UME) 13*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41396kB
Dec 15 19:02:18 teela kernel: HighMem: 1*4kB (M) 11*8kB (U) 2*16kB (U) 3*32kB (UM) 16*64kB (U) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 190*4096kB (UM) = 781660kB
Dec 15 19:02:18 teela kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 15 19:02:18 teela kernel: 556515 total pagecache pages
Dec 15 19:02:18 teela kernel: 0 pages in swap cache
Dec 15 19:02:18 teela kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 15 19:02:18 teela kernel: Free swap  = 8191996kB
Dec 15 19:02:18 teela kernel: Total swap = 8191996kB
Dec 15 19:02:18 teela kernel: 909598 pages RAM
Dec 15 19:02:18 teela kernel: 681346 pages HighMem/MovableOnly
Dec 15 19:02:18 teela kernel: 15211 pages reserved
Dec 15 19:02:18 teela kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 15 19:02:18 teela kernel: [ 1888]     0  1888     6161     1112      10       3        0             0 systemd-journal
Dec 15 19:02:18 teela kernel: [ 2508]     0  2508     2959      945       6       3        0         -1000 systemd-udevd
Dec 15 19:02:18 teela kernel: [ 2610]   105  2610     3870      899       8       3        0             0 systemd-timesyn
Dec 15 19:02:18 teela kernel: [ 2613]     0  2613     6300      948      10       3        0             0 rsyslogd
Dec 15 19:02:18 teela kernel: [ 2615]    88  2615     1158      568       6       3        0             0 nullmailer-send
Dec 15 19:02:18 teela kernel: [ 2618]     0  2618     1514     1027       7       3        0             0 systemd-logind
Dec 15 19:02:18 teela kernel: [ 2619]   101  2619     1266      847       6       3        0          -900 dbus-daemon
Dec 15 19:02:18 teela kernel: [ 2620]     0  2620      622      300       5       3        0             0 atd
Dec 15 19:02:18 teela kernel: [ 2628]     0  2628    26097     3193      27       3        0             0 NetworkManager
Dec 15 19:02:18 teela kernel: [ 2647]     0  2647     1511      458       5       3        0             0 fcron
Dec 15 19:02:18 teela kernel: [ 2673]     0  2673      750      543       6       3        0             0 dhcpcd
Dec 15 19:02:18 teela kernel: [ 2676]     0  2676      638      447       5       3        0             0 vnstatd
Dec 15 19:02:18 teela kernel: [ 2690]     0  2690     1457     1061       6       3        0         -1000 sshd
Dec 15 19:02:18 teela kernel: [ 2716]   106  2716    16384     4239      20       3        0             0 polkitd
Dec 15 19:02:18 teela kernel: [ 2717]     0  2717     2145     1360       7       3        0             0 wpa_supplicant
Dec 15 19:02:18 teela kernel: [ 2947]     0  2947     1794      775       7       3        0             0 screen
Dec 15 19:02:18 teela kernel: [ 2950]     0  2950     1831      913       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: [ 2954]     0  2954    37411    36100      79       3        0             0 emerge
Dec 15 19:02:18 teela kernel: [ 2970]     0  2970     1152      507       6       3        0             0 agetty
Dec 15 19:02:18 teela kernel: [ 3897]   250  3897      548      358       5       3        0             0 sandbox
Dec 15 19:02:18 teela kernel: [ 3906]   250  3906     2625     1584       8       3        0             0 ebuild.sh
Dec 15 19:02:18 teela kernel: [ 3926]   250  3926     2657     1370       7       3        0             0 ebuild.sh
Dec 15 19:02:18 teela kernel: [ 3935]   250  3935    17160    16891      37       3        0             0 xz
Dec 15 19:02:18 teela kernel: [ 3936]   250  3936      799      510       5       3        0             0 tar
Dec 15 19:02:18 teela kernel: [ 4117]     0  4117     2598     1389       9       3        0             0 sshd
Dec 15 19:02:18 teela kernel: [ 4119]     0  4119     1964     1243       7       3        0             0 systemd
Dec 15 19:02:18 teela kernel: [ 4144]     0  4144     6645      632      10       3        0             0 (sd-pam)
Dec 15 19:02:18 teela kernel: [ 4163]     0  4163     1830      909       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: [ 4182]     0  4182     1695      684       7       3        0             0 screen
Dec 15 19:02:18 teela kernel: [ 4221]     0  4221     1831      893       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: Out of memory: Kill process 2954 (emerge) score 11 or sacrifice child
Dec 15 19:02:18 teela kernel: Killed process 3897 (sandbox) total-vm:2192kB, anon-rss:128kB, file-rss:1304kB, shmem-rss:0kB
Dec 15 19:02:18 teela kernel: bash invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
Dec 15 19:02:18 teela kernel: bash cpuset=/ mems_allowed=0
Dec 15 19:02:18 teela kernel: CPU: 0 PID: 4221 Comm: bash Not tainted 4.9.0-gentoo #2
Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
Dec 15 19:02:18 teela kernel:  c5187d68 c142bcce c5187e98 00000000 c5187d98 c1163332 00000000 00200282
Dec 15 19:02:18 teela kernel:  c5187d98 c1431876 c5187d9c e7fb0b00 e7fa2900 e7fa2900 c1b58785 c5187e98
Dec 15 19:02:18 teela kernel:  c5187ddc c110795f c1043895 c5187dc8 c11075c7 00000007 00000000 00000000
Dec 15 19:02:18 teela kernel: Call Trace:
Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
Dec 15 19:02:18 teela kernel:  [<c110b989>] __alloc_pages_nodemask+0xb79/0xc80
Dec 15 19:02:18 teela kernel:  [<c1038a05>] copy_process.part.51+0xe5/0x1420
Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
Dec 15 19:02:18 teela kernel:  [<c1039ee7>] _do_fork+0xc7/0x360
Dec 15 19:02:18 teela kernel:  [<c1182a4b>] ? fd_install+0x1b/0x20
Dec 15 19:02:18 teela kernel:  [<c103a247>] SyS_clone+0x27/0x30
Dec 15 19:02:18 teela kernel:  [<c10018bc>] do_fast_syscall_32+0x7c/0x130
Dec 15 19:02:18 teela kernel:  [<c19b5d2b>] sysenter_past_esp+0x40/0x6a
Dec 15 19:02:18 teela kernel: Mem-Info:
Dec 15 19:02:18 teela kernel: active_anon:57050 inactive_anon:90 isolated_anon:0
                               active_file:274371 inactive_file:281954 isolated_file:0
                               unevictable:0 dirty:616 writeback:0 unstable:0
                               slab_reclaimable:40669 slab_unreclaimable:17758
                               mapped:7370 shmem:202 pagetables:346 bounce:0
                               free:208199 free_pcp:501 free_cma:0
Dec 15 19:02:18 teela kernel: Node 0 active_anon:228200kB inactive_anon:360kB active_file:1097484kB inactive_file:1127816kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29480kB dirty:2464kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 15 19:02:18 teela kernel: DMA free:3904kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7356kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3208kB slab_unreclaimable:1448kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
Dec 15 19:02:18 teela kernel: Normal free:41280kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532796kB inactive_file:44kB unevictable:0kB writepending:144kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159468kB slab_unreclaimable:69584kB kernel_stack:1104kB pagetables:1384kB bounce:0kB free_pcp:628kB local_pcp:188kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
Dec 15 19:02:18 teela kernel: HighMem free:787612kB min:512kB low:34356kB high:68200kB active_anon:228200kB inactive_anon:360kB active_file:557332kB inactive_file:1127772kB unevictable:0kB writepending:2320kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1376kB local_pcp:648kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 0 0
Dec 15 19:02:18 teela kernel: DMA: 0*4kB 2*8kB (ME) 5*16kB (UME) 13*32kB (UM) 11*64kB (UME) 3*128kB (UM) 1*256kB (M) 2*512kB (E) 1*1024kB (M) 0*2048kB 0*4096kB = 3904kB
Dec 15 19:02:18 teela kernel: Normal: 26*4kB (M) 25*8kB (UM) 441*16kB (UM) 188*32kB (UM) 410*64kB (UME) 13*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41280kB
Dec 15 19:02:18 teela kernel: HighMem: 37*4kB (UM) 21*8kB (UM) 6*16kB (UM) 8*32kB (UM) 22*64kB (UM) 5*128kB (M) 4*256kB (M) 3*512kB (M) 2*1024kB (M) 1*2048kB (M) 190*4096kB (UM) = 787612kB
Dec 15 19:02:18 teela kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 15 19:02:18 teela kernel: 556527 total pagecache pages
Dec 15 19:02:18 teela kernel: 0 pages in swap cache
Dec 15 19:02:18 teela kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 15 19:02:18 teela kernel: Free swap  = 8191996kB
Dec 15 19:02:18 teela kernel: Total swap = 8191996kB
Dec 15 19:02:18 teela kernel: 909598 pages RAM
Dec 15 19:02:18 teela kernel: 681346 pages HighMem/MovableOnly
Dec 15 19:02:18 teela kernel: 15211 pages reserved
Dec 15 19:02:18 teela kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 15 19:02:18 teela kernel: [ 1888]     0  1888     6161     1112      10       3        0             0 systemd-journal
Dec 15 19:02:18 teela kernel: [ 2508]     0  2508     2959      945       6       3        0         -1000 systemd-udevd
Dec 15 19:02:18 teela kernel: [ 2610]   105  2610     3870      899       8       3        0             0 systemd-timesyn
Dec 15 19:02:18 teela kernel: [ 2613]     0  2613     6300      951      10       3        0             0 rsyslogd
Dec 15 19:02:18 teela kernel: [ 2615]    88  2615     1158      568       6       3        0             0 nullmailer-send
Dec 15 19:02:18 teela kernel: [ 2618]     0  2618     1514     1027       7       3        0             0 systemd-logind
Dec 15 19:02:18 teela kernel: [ 2619]   101  2619     1266      847       6       3        0          -900 dbus-daemon
Dec 15 19:02:18 teela kernel: [ 2620]     0  2620      622      300       5       3        0             0 atd
Dec 15 19:02:18 teela kernel: [ 2628]     0  2628    26097     3193      27       3        0             0 NetworkManager
Dec 15 19:02:18 teela kernel: [ 2647]     0  2647     1511      458       5       3        0             0 fcron
Dec 15 19:02:18 teela kernel: [ 2673]     0  2673      750      543       6       3        0             0 dhcpcd
Dec 15 19:02:18 teela kernel: [ 2676]     0  2676      638      447       5       3        0             0 vnstatd
Dec 15 19:02:18 teela kernel: [ 2690]     0  2690     1457     1061       6       3        0         -1000 sshd
Dec 15 19:02:18 teela kernel: [ 2716]   106  2716    16384     4239      20       3        0             0 polkitd
Dec 15 19:02:18 teela kernel: [ 2717]     0  2717     2145     1360       7       3        0             0 wpa_supplicant
Dec 15 19:02:18 teela kernel: [ 2947]     0  2947     1794      775       7       3        0             0 screen
Dec 15 19:02:18 teela kernel: [ 2950]     0  2950     1831      913       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: [ 2954]     0  2954    37411    36100      79       3        0             0 emerge
Dec 15 19:02:18 teela kernel: [ 2970]     0  2970     1152      507       6       3        0             0 agetty
Dec 15 19:02:18 teela kernel: [ 3906]   250  3906     2625     1584       8       3        0             0 ebuild.sh
Dec 15 19:02:18 teela kernel: [ 3926]   250  3926     2657     1370       7       3        0             0 ebuild.sh
Dec 15 19:02:18 teela kernel: [ 3935]   250  3935    17160    16891      37       3        0             0 xz
Dec 15 19:02:18 teela kernel: [ 3936]   250  3936      799      510       5       3        0             0 tar
Dec 15 19:02:18 teela kernel: [ 4117]     0  4117     2598     1389       9       3        0             0 sshd
Dec 15 19:02:18 teela kernel: [ 4119]     0  4119     1964     1243       7       3        0             0 systemd
Dec 15 19:02:18 teela kernel: [ 4144]     0  4144     6645      632      10       3        0             0 (sd-pam)
Dec 15 19:02:18 teela kernel: [ 4163]     0  4163     1830      909       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: [ 4182]     0  4182     1695      684       7       3        0             0 screen
Dec 15 19:02:18 teela kernel: [ 4221]     0  4221     1831      893       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: Out of memory: Kill process 2954 (emerge) score 11 or sacrifice child
Dec 15 19:02:18 teela kernel: Killed process 2954 (emerge) total-vm:149644kB, anon-rss:137136kB, file-rss:7264kB, shmem-rss:0kB
Dec 15 19:02:18 teela kernel: bash invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
Dec 15 19:02:18 teela kernel: bash cpuset=/ mems_allowed=0
Dec 15 19:02:18 teela kernel: CPU: 0 PID: 4221 Comm: bash Not tainted 4.9.0-gentoo #2
Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
Dec 15 19:02:18 teela kernel:  c5187d68 c142bcce c5187e98 00000000 c5187d98 c1163332 00000000 00000282
Dec 15 19:02:18 teela kernel:  c5187d98 c1431876 c5187d9c e7f9e100 e7f947c0 e7f947c0 c1b58785 c5187e98
Dec 15 19:02:18 teela kernel:  c5187ddc c110795f c1043895 c5187dc8 c11075c7 00000007 00000000 00000000
Dec 15 19:02:18 teela kernel: Call Trace:
Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
Dec 15 19:02:18 teela kernel:  [<c110b989>] __alloc_pages_nodemask+0xb79/0xc80
Dec 15 19:02:18 teela kernel:  [<c1038a05>] copy_process.part.51+0xe5/0x1420
Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
Dec 15 19:02:18 teela kernel:  [<c1039ee7>] _do_fork+0xc7/0x360
Dec 15 19:02:18 teela kernel:  [<c1182a4b>] ? fd_install+0x1b/0x20
Dec 15 19:02:18 teela kernel:  [<c103a247>] SyS_clone+0x27/0x30
Dec 15 19:02:18 teela kernel:  [<c10018bc>] do_fast_syscall_32+0x7c/0x130
Dec 15 19:02:18 teela kernel:  [<c19b5d2b>] sysenter_past_esp+0x40/0x6a
Dec 15 19:02:18 teela kernel: Mem-Info:
Dec 15 19:02:18 teela kernel: active_anon:22769 inactive_anon:90 isolated_anon:0
                               active_file:274396 inactive_file:281929 isolated_file:0
                               unevictable:0 dirty:616 writeback:0 unstable:0
                               slab_reclaimable:40669 slab_unreclaimable:17741
                               mapped:6595 shmem:202 pagetables:271 bounce:0
                               free:242474 free_pcp:608 free_cma:0
Dec 15 19:02:18 teela kernel: Node 0 active_anon:91076kB inactive_anon:360kB active_file:1097584kB inactive_file:1127716kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:26380kB dirty:2464kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 108544kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 15 19:02:18 teela kernel: DMA free:3904kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7356kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3208kB slab_unreclaimable:1448kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
Dec 15 19:02:18 teela kernel: Normal free:41280kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532796kB inactive_file:44kB unevictable:0kB writepending:144kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159468kB slab_unreclaimable:69516kB kernel_stack:1104kB pagetables:1084kB bounce:0kB free_pcp:1048kB local_pcp:608kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
Dec 15 19:02:18 teela kernel: HighMem free:924712kB min:512kB low:34356kB high:68200kB active_anon:91076kB inactive_anon:360kB active_file:557432kB inactive_file:1127672kB unevictable:0kB writepending:2320kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1384kB local_pcp:656kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 0 0
Dec 15 19:02:18 teela kernel: DMA: 0*4kB 2*8kB (ME) 5*16kB (UME) 13*32kB (UM) 11*64kB (UME) 3*128kB (UM) 1*256kB (M) 2*512kB (E) 1*1024kB (M) 0*2048kB 0*4096kB = 3904kB
Dec 15 19:02:18 teela kernel: Normal: 26*4kB (M) 26*8kB (UM) 441*16kB (UM) 188*32kB (UM) 410*64kB (UME) 13*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41288kB
Dec 15 19:02:18 teela kernel: HighMem: 1518*4kB (UM) 608*8kB (UM) 155*16kB (UM) 67*32kB (UM) 34*64kB (UM) 6*128kB (M) 2*256kB (M) 3*512kB (M) 1*1024kB (M) 43*2048kB (M) 199*4096kB (UM) = 924744kB
Dec 15 19:02:18 teela kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 15 19:02:18 teela kernel: 556527 total pagecache pages
Dec 15 19:02:18 teela kernel: 0 pages in swap cache
Dec 15 19:02:18 teela kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 15 19:02:18 teela kernel: Free swap  = 8191996kB
Dec 15 19:02:18 teela kernel: Total swap = 8191996kB
Dec 15 19:02:18 teela kernel: 909598 pages RAM
Dec 15 19:02:18 teela kernel: 681346 pages HighMem/MovableOnly
Dec 15 19:02:18 teela kernel: 15211 pages reserved
Dec 15 19:02:18 teela kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 15 19:02:18 teela kernel: [ 1888]     0  1888     6161     1112      10       3        0             0 systemd-journal
Dec 15 19:02:18 teela kernel: [ 2508]     0  2508     2959      945       6       3        0         -1000 systemd-udevd
Dec 15 19:02:18 teela kernel: [ 2610]   105  2610     3870      899       8       3        0             0 systemd-timesyn
Dec 15 19:02:18 teela kernel: [ 2613]     0  2613     6300      951      10       3        0             0 rsyslogd
Dec 15 19:02:18 teela kernel: [ 2615]    88  2615     1158      568       6       3        0             0 nullmailer-send
Dec 15 19:02:18 teela kernel: [ 2618]     0  2618     1514     1027       7       3        0             0 systemd-logind
Dec 15 19:02:18 teela kernel: [ 2619]   101  2619     1266      847       6       3        0          -900 dbus-daemon
Dec 15 19:02:18 teela kernel: [ 2620]     0  2620      622      300       5       3        0             0 atd
Dec 15 19:02:18 teela kernel: [ 2628]     0  2628    26097     3193      27       3        0             0 NetworkManager
Dec 15 19:02:18 teela kernel: [ 2647]     0  2647     1511      458       5       3        0             0 fcron
Dec 15 19:02:18 teela kernel: [ 2673]     0  2673      750      543       6       3        0             0 dhcpcd
Dec 15 19:02:18 teela kernel: [ 2676]     0  2676      638      447       5       3        0             0 vnstatd
Dec 15 19:02:18 teela kernel: [ 2690]     0  2690     1457     1061       6       3        0         -1000 sshd
Dec 15 19:02:18 teela kernel: [ 2716]   106  2716    16384     4239      20       3        0             0 polkitd
Dec 15 19:02:18 teela kernel: [ 2717]     0  2717     2145     1360       7       3        0             0 wpa_supplicant
Dec 15 19:02:18 teela kernel: [ 2947]     0  2947     1794      775       7       3        0             0 screen
Dec 15 19:02:18 teela kernel: [ 2950]     0  2950     1831      915       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: [ 2970]     0  2970     1152      507       6       3        0             0 agetty
Dec 15 19:02:18 teela kernel: [ 3906]   250  3906     2625     1584       8       3        0             0 ebuild.sh
Dec 15 19:02:18 teela kernel: [ 3926]   250  3926     2657     1370       7       3        0             0 ebuild.sh
Dec 15 19:02:18 teela kernel: [ 3935]   250  3935    17160    16891      37       3        0             0 xz
Dec 15 19:02:18 teela kernel: [ 3936]   250  3936      799      510       5       3        0             0 tar
Dec 15 19:02:18 teela kernel: [ 4117]     0  4117     2598     1389       9       3        0             0 sshd
Dec 15 19:02:18 teela kernel: [ 4119]     0  4119     1964     1243       7       3        0             0 systemd
Dec 15 19:02:18 teela kernel: [ 4144]     0  4144     6645      632      10       3        0             0 (sd-pam)
Dec 15 19:02:18 teela kernel: [ 4163]     0  4163     1830      909       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: [ 4182]     0  4182     1695      684       7       3        0             0 screen
Dec 15 19:02:18 teela kernel: [ 4221]     0  4221     1831      893       7       3        0             0 bash
Dec 15 19:02:18 teela kernel: Out of memory: Kill process 3935 (xz) score 5 or sacrifice child
Dec 15 19:02:18 teela kernel: Killed process 3935 (xz) total-vm:68640kB, anon-rss:65928kB, file-rss:1636kB, shmem-rss:0kB
Dec 15 19:02:18 teela kernel: ebuild.sh invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
Dec 15 19:02:18 teela kernel: ebuild.sh cpuset=/ mems_allowed=0
Dec 15 19:02:18 teela kernel: CPU: 0 PID: 3926 Comm: ebuild.sh Not tainted 4.9.0-gentoo #2
Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
Dec 15 19:02:18 teela kernel:  d3473d68 c142bcce d3473e98 00000000 d3473d98 c1163332 00000000 00000282
Dec 15 19:02:18 teela kernel:  d3473d98 c1431876 d3473d9c f12463c0 f25ea900 f25ea900 c1b58785 d3473e98
Dec 15 19:02:18 teela kernel:  d3473ddc c110795f c1043895 d3473dc8 c11075c7 00000006 00000000 00000000
Dec 15 19:02:18 teela kernel: Call Trace:
Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
Dec 15 19:02:18 teela kernel:  [<c110b989>] __alloc_pages_nodemask+0xb79/0xc80
Dec 15 19:02:18 teela kernel:  [<c1151a00>] ? __kmem_cache_shutdown+0x220/0x290
Dec 15 19:02:18 teela kernel:  [<c1038a05>] copy_process.part.51+0xe5/0x1420
Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
Dec 15 19:02:18 teela kernel:  [<c1039ee7>] _do_fork+0xc7/0x360
Dec 15 19:02:18 teela kernel:  [<c1438c64>] ? _copy_to_user+0x44/0x60
Dec 15 19:02:18 teela kernel:  [<c103a247>] SyS_clone+0x27/0x30
Dec 15 19:02:18 teela kernel:  [<c10018bc>] do_fast_syscall_32+0x7c/0x130
Dec 15 19:02:18 teela kernel:  [<c19b5d2b>] sysenter_past_esp+0x40/0x6a
Dec 15 19:02:18 teela kernel: Mem-Info:
Dec 15 19:02:18 teela kernel: active_anon:6238 inactive_anon:90 isolated_anon:0
                               active_file:274469 inactive_file:281903 isolated_file:0
                               unevictable:0 dirty:557 writeback:255 unstable:0
                               slab_reclaimable:40673 slab_unreclaimable:17738
                               mapped:6479 shmem:202 pagetables:238 bounce:0
                               free:258997 free_pcp:617 free_cma:0
Dec 15 19:02:18 teela kernel: Node 0 active_anon:24952kB inactive_anon:360kB active_file:1097876kB inactive_file:1127612kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:25916kB dirty:2228kB writeback:1020kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 6144kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 15 19:02:18 teela kernel: DMA free:3904kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7356kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3208kB slab_unreclaimable:1448kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
Dec 15 19:02:18 teela kernel: Normal free:41272kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532844kB inactive_file:48kB unevictable:0kB writepending:400kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159484kB slab_unreclaimable:69504kB kernel_stack:1096kB pagetables:952kB bounce:0kB free_pcp:1128kB local_pcp:588kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
Dec 15 19:02:18 teela kernel: HighMem free:990812kB min:512kB low:34356kB high:68200kB active_anon:24952kB inactive_anon:360kB active_file:557676kB inactive_file:1127564kB unevictable:0kB writepending:2848kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1340kB local_pcp:688kB free_cma:0kB
Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 0 0
Dec 15 19:02:18 teela kernel: DMA: 0*4kB 2*8kB (ME) 5*16kB (UME) 13*32kB (UM) 11*64kB (UME) 3*128kB (UM) 1*256kB (M) 2*512kB (E) 1*1024kB (M) 0*2048kB 0*4096kB = 3904kB
Dec 15 19:02:18 teela kernel: Normal: 30*4kB (UME) 31*8kB (UM) 437*16kB (UME) 188*32kB (UM) 410*64kB (UME) 13*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41280kB
Dec 15 19:02:18 teela kernel: HighMem: 1621*4kB (UM) 660*8kB (UM) 184*16kB (UM) 90*32kB (UM) 41*64kB (UM) 7*128kB (M) 2*256kB (M) 3*512kB (M) 1*1024kB (M) 50*2048kB (M) 211*4096kB (UM) = 990836kB
Dec 15 19:02:18 teela kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 15 19:02:18 teela kernel: 556574 total pagecache pages
Dec 15 19:02:18 teela kernel: 0 pages in swap cache
Dec 15 19:02:18 teela kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 15 19:02:18 teela kernel: Free swap  = 8191996kB
Dec 15 19:02:18 teela kernel: Total swap = 8191996kB
Dec 15 19:02:18 teela kernel: 909598 pages RAM
Dec 15 19:02:18 teela kernel: 681346 pages HighMem/MovableOnly
Dec 15 19:02:18 teela kernel: 15211 pages reserved
Dec 15 19:02:18 teela kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 15 19:02:18 teela kernel: [ 1888]     0  1888     6161     1143      10       3        0             0 systemd-journal
Dec 15 19:02:18 teela kernel: [ 2508]     0  2508     2959      945       6       3        0         -1000 systemd-udevd
Dec 15 19:02:18 teela kernel: [ 2610]   105  2610     3870      899       8       3        0             0 systemd-timesyn
Dec 15 19:02:18 teela kernel: [ 2613]     0  2613     6300      956      10       3        0             0 rsyslogd
Dec 15 19:02:18 teela kernel: [ 2615]    88  2615     1158      568       6       3        0             0 nullmailer-send
Dec 15 19:02:18 teela kernel: [ 2618]     0  2618     1514     1027       7       3        0             0 systemd-logind
Dec 15 19:02:18 teela kernel: [ 2619]   101  2619     1266      847       6       3        0          -900 dbus-daemon
Dec 15 19:02:18 teela kernel: [ 2620]     0  2620      622      300       5       3        0             0 atd
Dec 15 19:02:26 teela kernel: [ 2628]     0  2628    26097     3193      27       3        0             0 NetworkManager
Dec 15 19:02:26 teela kernel: [ 2647]     0  2647     1511      458       5       3        0             0 fcron
Dec 15 19:02:26 teela kernel: [ 2673]     0  2673      750      543       6       3        0             0 dhcpcd
Dec 15 19:02:26 teela kernel: [ 2676]     0  2676      638      447       5       3        0             0 vnstatd
Dec 15 19:02:26 teela kernel: [ 2690]     0  2690     1457     1061       6       3        0         -1000 sshd
Dec 15 19:02:26 teela kernel: [ 2716]   106  2716    16384     4239      20       3        0             0 polkitd
Dec 15 19:02:26 teela kernel: [ 2717]     0  2717     2145     1360       7       3        0             0 wpa_supplicant
Dec 15 19:02:26 teela kernel: [ 2947]     0  2947     1794      775       7       3        0             0 screen
Dec 15 19:02:26 teela kernel: [ 2950]     0  2950     1831      915       7       3        0             0 bash
Dec 15 19:02:26 teela kernel: [ 2970]     0  2970     1152      507       6       3        0             0 agetty
Dec 15 19:02:26 teela kernel: [ 3906]   250  3906     2625     1584       8       3        0             0 ebuild.sh
Dec 15 19:02:26 teela kernel: [ 3926]   250  3926     2657     1377       7       3        0             0 ebuild.sh
Dec 15 19:02:26 teela kernel: [ 4117]     0  4117     2598     1389       9       3        0             0 sshd
Dec 15 19:02:26 teela kernel: [ 4119]     0  4119     1964     1243       7       3        0             0 systemd
Dec 15 19:02:26 teela kernel: [ 4144]     0  4144     6645      632      10       3        0             0 (sd-pam)
Dec 15 19:02:26 teela kernel: [ 4163]     0  4163     1830      909       7       3        0             0 bash
Dec 15 19:02:26 teela kernel: [ 4182]     0  4182     1695      684       7       3        0             0 screen
Dec 15 19:02:26 teela kernel: [ 4221]     0  4221     1831      893       7       3        0             0 bash
Dec 15 19:02:26 teela kernel: [ 4225]     0  4225     1831      400       6       3        0             0 bash

Greetings
Nils

* Re: OOM: Better, but still there on 4.9
  2016-12-15 22:57 OOM: Better, but still there on 4.9 Nils Holland
@ 2016-12-16  7:39 ` Michal Hocko
  2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
                     ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-16  7:39 UTC (permalink / raw)
  To: Nils Holland
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

[CC linux-mm and btrfs guys]

On Thu 15-12-16 23:57:04, Nils Holland wrote:
[...]
> Of course, none of this are workloads that are new / special in any
> way - prior to 4.8, I never experienced any issues doing the exact
> same things.
> 
> Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
> Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
> Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
> Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
> Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
> Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
> Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
> Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
> Dec 15 19:02:18 teela kernel: Call Trace:
> Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
> Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
> Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
> Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
> Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
> Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
> Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
> Dec 15 19:02:18 teela kernel:  [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
> Dec 15 19:02:18 teela kernel:  [<c110414d>] pagecache_get_page+0xad/0x270
> Dec 15 19:02:18 teela kernel:  [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
> Dec 15 19:02:18 teela kernel:  [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
> Dec 15 19:02:18 teela kernel:  [<c132a57f>] btrfs_alloc_tree_block+0x1ef/0x5f0
> Dec 15 19:02:18 teela kernel:  [<c130f7c3>] __btrfs_cow_block+0x143/0x5f0
> Dec 15 19:02:18 teela kernel:  [<c130fe1a>] btrfs_cow_block+0x13a/0x220
> Dec 15 19:02:18 teela kernel:  [<c13132f1>] btrfs_search_slot+0x1d1/0x870
> Dec 15 19:02:18 teela kernel:  [<c132fcdd>] btrfs_lookup_file_extent+0x4d/0x60
> Dec 15 19:02:18 teela kernel:  [<c1354fe6>] __btrfs_drop_extents+0x176/0x1070
> Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
> Dec 15 19:02:18 teela kernel:  [<c133dbb5>] ? start_transaction+0x65/0x4b0
> Dec 15 19:02:18 teela kernel:  [<c1150597>] ? __kmalloc+0x147/0x1e0
> Dec 15 19:02:18 teela kernel:  [<c1345005>] cow_file_range_inline+0x215/0x6b0
> Dec 15 19:02:18 teela kernel:  [<c13459fc>] cow_file_range.isra.49+0x55c/0x6d0
> Dec 15 19:02:18 teela kernel:  [<c1361795>] ? lock_extent_bits+0x75/0x1e0
> Dec 15 19:02:18 teela kernel:  [<c1346d51>] run_delalloc_range+0x441/0x470
> Dec 15 19:02:18 teela kernel:  [<c13626e4>] writepage_delalloc.isra.47+0x144/0x1e0
> Dec 15 19:02:18 teela kernel:  [<c1364548>] __extent_writepage+0xd8/0x2b0
> Dec 15 19:02:18 teela kernel:  [<c1365c4c>] extent_writepages+0x25c/0x380
> Dec 15 19:02:18 teela kernel:  [<c1342cd0>] ? btrfs_real_readdir+0x610/0x610
> Dec 15 19:02:18 teela kernel:  [<c133ff0f>] btrfs_writepages+0x1f/0x30
> Dec 15 19:02:18 teela kernel:  [<c110ff85>] do_writepages+0x15/0x40
> Dec 15 19:02:18 teela kernel:  [<c1190a95>] __writeback_single_inode+0x35/0x2f0
> Dec 15 19:02:18 teela kernel:  [<c119112e>] writeback_sb_inodes+0x16e/0x340
> Dec 15 19:02:18 teela kernel:  [<c119145a>] wb_writeback+0xaa/0x280
> Dec 15 19:02:18 teela kernel:  [<c1191de8>] wb_workfn+0xd8/0x3e0
> Dec 15 19:02:18 teela kernel:  [<c104fd34>] process_one_work+0x114/0x3e0
> Dec 15 19:02:18 teela kernel:  [<c1050b4f>] worker_thread+0x2f/0x4b0
> Dec 15 19:02:18 teela kernel:  [<c1050b20>] ? create_worker+0x180/0x180
> Dec 15 19:02:18 teela kernel:  [<c10552e7>] kthread+0x97/0xb0
> Dec 15 19:02:18 teela kernel:  [<c1055250>] ? __kthread_parkme+0x60/0x60
> Dec 15 19:02:18 teela kernel:  [<c19b5cb7>] ret_from_fork+0x1b/0x28
> Dec 15 19:02:18 teela kernel: Mem-Info:
> Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 isolated_anon:0
>                                active_file:274324 inactive_file:281962 isolated_file:0

OK, so there is still some anonymous memory that could be swapped out
and quite a lot of page cache. This might be harder to reclaim because
the allocation is a GFP_NOFS request which is limited in its reclaim
capabilities. It might be possible that those pagecache pages are pinned
in some way by the filesystem.

>                                unevictable:0 dirty:649 writeback:0 unstable:0
>                                slab_reclaimable:40662 slab_unreclaimable:17754
>                                mapped:7382 shmem:202 pagetables:351 bounce:0
>                                free:206736 free_pcp:332 free_cma:0
> Dec 15 19:02:18 teela kernel: Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
> Dec 15 19:02:18 teela kernel: DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
> Dec 15 19:02:18 teela kernel: Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB

And this shows that there is no anonymous memory in the lowmem zone.
Note that this request cannot use the highmem zone so no swap out would
help. So if we are not able to reclaim those pages on the file LRU then
we are out of luck.
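
To make the reasoning above concrete, here is a minimal user-space sketch
that plugs the numbers from the quoted Mem-Info dump into two simple
checks: is the zone below its min watermark, and does it hold any
anonymous memory that swap could free? The struct and function names are
hypothetical and exist only for this illustration. The real watermark
check in the page allocator also accounts for lowmem_reserve and the
allocation order, so treat this as a rough approximation, not as the
kernel's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Per-zone numbers as printed in the Mem-Info dump, all in kB.
 * This struct is a hypothetical helper, not a kernel type. */
struct zone_report {
	const char *name;
	unsigned long free_kb;
	unsigned long min_kb;
	unsigned long anon_kb;	/* active_anon + inactive_anon */
	unsigned long file_kb;	/* active_file + inactive_file */
};

/* True when free memory has dropped below the zone's min watermark.
 * Simplified: ignores lowmem_reserve and the allocation order. */
bool zone_below_min(const struct zone_report *z)
{
	assert(z != NULL);
	return z->free_kb < z->min_kb;
}

/* Swapping out can only help a zone that holds anonymous pages. */
bool swap_could_help(const struct zone_report *z)
{
	assert(z != NULL);
	return z->anon_kb > 0;
}

/* Values copied from the report for the GFP_NOFS|__GFP_NOFAIL event. */
const struct zone_report normal_zone = {
	.name = "Normal", .free_kb = 41332, .min_kb = 41368,
	.anon_kb = 0 + 0, .file_kb = 532748 + 44,
};

const struct zone_report highmem_zone = {
	.name = "HighMem", .free_kb = 781660, .min_kb = 512,
	.anon_kb = 234740 + 360, .file_kb = 557232 + 1127804,
};

/* Print a one-line summary of the two checks for a zone. */
void print_zone(const struct zone_report *z)
{
	printf("%s: below min=%s, swap could help=%s, file LRU=%lukB\n",
	       z->name,
	       zone_below_min(z) ? "yes" : "no",
	       swap_could_help(z) ? "yes" : "no",
	       z->file_kb);
}
```

Calling print_zone() on both zones from a main() shows the asymmetry:
Normal sits just under its min watermark with no anonymous memory at all,
while HighMem has plenty free but cannot be used by this request.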

> Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
> Dec 15 19:02:18 teela kernel: HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

That being said, the OOM killer invocation is clearly pointless and
premature. We normally do not invoke it for GFP_NOFS requests
exactly for these reasons. But this is GFP_NOFS|__GFP_NOFAIL, which
behaves differently. I am about to change that, but my last attempt [1]
has to be rethought.
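
As a rough illustration of that difference, the gfp check in
out_of_memory() can be modelled in user space. The sketch below is an
assumption-laden model, not kernel code: the SK_* constants are
hypothetical names whose values were chosen so that
GFP_NOFS|__GFP_NOFAIL reproduces the 0x2400840 mask printed in the log,
and both functions cover only this one early-exit check, nothing else the
OOM killer does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, picked to match the mask in the 4.9 log. */
#define SK_GFP_IO             0x40u
#define SK_GFP_FS             0x80u
#define SK_GFP_NOFAIL         0x800u
#define SK_GFP_DIRECT_RECLAIM 0x400000u
#define SK_GFP_KSWAPD_RECLAIM 0x2000000u

/* GFP_NOFS: may reclaim and do I/O, but must not call into the fs. */
#define SK_GFP_NOFS \
	(SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM | SK_GFP_IO)

/* Model of the check as it stands in 4.9: a request without __GFP_FS
 * skips victim selection, unless __GFP_NOFAIL overrides that rule.
 * Returns true when the OOM killer would go on to pick a victim. */
bool oom_proceeds_with_nofail_override(unsigned int gfp_mask)
{
	if (gfp_mask && !(gfp_mask & (SK_GFP_FS | SK_GFP_NOFAIL)))
		return false;
	return true;
}

/* Model of the same check with the __GFP_NOFAIL override dropped. */
bool oom_proceeds_without_override(unsigned int gfp_mask)
{
	if (gfp_mask && !(gfp_mask & SK_GFP_FS))
		return false;
	return true;
}

/* Sanity check: the modelled flags reproduce the mask from the report. */
void check_log_mask(void)
{
	assert((SK_GFP_NOFS | SK_GFP_NOFAIL) == 0x2400840u);
}
```

Under this model, the mask from the report makes the first function
return true (a victim is selected) and the second return false, while a
plain GFP_NOFS request is skipped by both.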

Now another thing is that the __GFP_NOFAIL which has this nasty side
effect was introduced by me in d1b5c5671d01 ("btrfs: Prevent from
early transaction abort") in 4.3, so I am quite surprised that this has
shown up only in 4.8. Anyway, there might be some other changes in
btrfs which could make it more subtle.

I believe the right way to go around this is to pursue what I've started
in [1]. I will try to prepare something for testing today for you. Stay
tuned. But I would be really happy if somebody from the btrfs camp could
check the NOFS aspect of this allocation. We have already seen
allocation stalls from this path quite recently [2].

[1] http://lkml.kernel.org/r/20161201152517.27698-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20161214101743.GA25578@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-16  7:39 ` Michal Hocko
@ 2016-12-16 15:58   ` Michal Hocko
  2016-12-16 15:58     ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
                       ` (2 more replies)
  2016-12-16 18:15   ` OOM: Better, but still there on 4.9 Chris Mason
  2016-12-16 19:50   ` Chris Mason
  2 siblings, 3 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-16 15:58 UTC (permalink / raw)
  To: Nils Holland
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri 16-12-16 08:39:41, Michal Hocko wrote:
[...]
> That being said, the OOM killer invocation is clearly pointless and
> pre-mature. We normally do not invoke it normally for GFP_NOFS requests
> exactly for these reasons. But this is GFP_NOFS|__GFP_NOFAIL which
> behaves differently. I am about to change that but my last attempt [1]
> has to be rethought.
> 
> Now another thing is that the __GFP_NOFAIL which has this nasty side
> effect has been introduced by me d1b5c5671d01 ("btrfs: Prevent from
> early transaction abort") in 4.3 so I am quite surprised that this has
> shown up only in 4.8. Anyway there might be some other changes in the
> btrfs which could make it more subtle.
> 
> I believe the right way to go around this is to pursue what I've started
> in [1]. I will try to prepare something for testing today for you. Stay
> tuned. But I would be really happy if somebody from the btrfs camp could
> check the NOFS aspect of this allocation. We have already seen
> allocation stalls from this path quite recently

Could you try to run with the two following patches?

* [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath
  2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
@ 2016-12-16 15:58     ` Michal Hocko
  2016-12-16 15:58     ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
  2016-12-16 18:47     ` OOM: Better, but still there on Nils Holland
  2 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-16 15:58 UTC (permalink / raw)
  To: Nils Holland
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
detection") has subtly changed the semantic for costly high order requests
with __GFP_NOFAIL and without __GFP_REPEAT, and those can fail right now.
My code inspection didn't reveal any such users in the tree, but it is
true that this might lead to unexpected allocation failures and
subsequent OOPs.

__alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
There are a few special cases but we are lacking a catch-all place to be
sure we will not miss any case where the non-failing allocation might
fail. This patch reorganizes the code a bit and puts all those special
cases under the nopage label, which is the generic go-to-fail path.
Non-failing allocations are retried, or those that cannot retry, like
non-sleeping allocations, go to the failure point directly. This should
make the code flow much easier to follow and make it less error prone
for future changes.

While we are there we have to move the stall check up to catch
potentially looping non-failing allocations.

Changes since v1
- do not skip direct reclaim for TIF_MEMDIE && GFP_NOFAIL as per Hillf
- do not skip __alloc_pages_may_oom for TIF_MEMDIE && GFP_NOFAIL as
  per Tetsuo

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 mm/page_alloc.c | 75 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f2c9e535f7f..095e2fa286de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3640,35 +3640,21 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto got_pg;
 
 	/* Caller is not willing to reclaim, we can't balance anything */
-	if (!can_direct_reclaim) {
-		/*
-		 * All existing users of the __GFP_NOFAIL are blockable, so warn
-		 * of any new users that actually allow this type of allocation
-		 * to fail.
-		 */
-		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+	if (!can_direct_reclaim)
 		goto nopage;
-	}
 
-	/* Avoid recursion of direct reclaim */
-	if (current->flags & PF_MEMALLOC) {
-		/*
-		 * __GFP_NOFAIL request from this context is rather bizarre
-		 * because we cannot reclaim anything and only can loop waiting
-		 * for somebody to do a work for us.
-		 */
-		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
-			cond_resched();
-			goto retry;
-		}
-		goto nopage;
+	/* Make sure we know about allocations which stall for too long */
+	if (time_after(jiffies, alloc_start + stall_timeout)) {
+		warn_alloc(gfp_mask,
+			"page allocation stalls for %ums, order:%u",
+			jiffies_to_msecs(jiffies-alloc_start), order);
+		stall_timeout += 10 * HZ;
 	}
 
-	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+	/* Avoid recursion of direct reclaim */
+	if (current->flags & PF_MEMALLOC)
 		goto nopage;
 
-
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
 							&did_some_progress);
@@ -3692,14 +3678,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
-	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
-		warn_alloc(gfp_mask,
-			"page allocation stalls for %ums, order:%u",
-			jiffies_to_msecs(jiffies-alloc_start), order);
-		stall_timeout += 10 * HZ;
-	}
-
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, &no_progress_loops))
 		goto retry;
@@ -3721,6 +3699,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/* Avoid allocations with no watermarks from looping endlessly */
+	if (test_thread_flag(TIF_MEMDIE))
+		goto nopage;
+
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
@@ -3728,6 +3710,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	}
 
 nopage:
+	/*
+	 * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
+	 * we always retry
+	 */
+	if (gfp_mask & __GFP_NOFAIL) {
+		/*
+		 * All existing users of the __GFP_NOFAIL are blockable, so warn
+		 * of any new users that actually require GFP_NOWAIT
+		 */
+		if (WARN_ON_ONCE(!can_direct_reclaim))
+			goto fail;
+
+		/*
+		 * PF_MEMALLOC request from this context is rather bizarre
+		 * because we cannot reclaim anything and only can loop waiting
+		 * for somebody to do a work for us
+		 */
+		WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+
+		/*
+		 * non failing costly orders are a hard requirement which we
+		 * are not prepared for much so let's warn about these users
+		 * so that we can identify them and convert them to something
+		 * else.
+		 */
+		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
+
+		cond_resched();
+		goto retry;
+	}
+fail:
 	warn_alloc(gfp_mask,
 			"page allocation failure: order:%u", order);
 got_pg:
-- 
2.10.2

* [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
  2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
  2016-12-16 15:58     ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
@ 2016-12-16 15:58     ` Michal Hocko
  2016-12-16 17:31       ` Johannes Weiner
  2016-12-16 18:47     ` OOM: Better, but still there on Nils Holland
  2 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-16 15:58 UTC (permalink / raw)
  To: Nils Holland
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__alloc_pages_may_oom makes sure to skip the OOM killer depending on
the allocation request. This includes lowmem requests, costly high
order requests and others. For a long time __GFP_NOFAIL acted as an
override for all those rules. This is not documented and it can be quite
surprising as well. E.g. GFP_NOFS requests do not invoke the OOM
killer but GFP_NOFS|__GFP_NOFAIL requests do. So if we try to convert
some of the existing open coded loops around the allocator to nofail
requests (and we have done that in the past), such a change has a
non-trivial side effect which is not obvious. Note that the primary
motivation for skipping the OOM killer is to prevent premature invocation.

The exception has been added by 82553a937f12 ("oom: invoke oom killer
for __GFP_NOFAIL"). The changelog points out that the oom killer has to
be invoked otherwise the request would be looping for ever. But this
argument is rather weak because the OOM killer doesn't really guarantee
any forward progress for those exceptional cases:
	- it will hardly help to form costly order which in turn can
	  result in the system panic because of no oom killable task in
	  the end - I believe we certainly do not want to put the system
	  down just because there is a nasty driver asking for order-9
	  page with GFP_NOFAIL not realizing all the consequences. It is
	  much better this request would loop for ever than the massive
	  system disruption
	- lowmem is also highly unlikely to be freed during OOM killer
	- GFP_NOFS request could trigger while there is still a lot of
	  memory pinned by filesystems.

The premature OOM killer invocation is a real issue, as reported by Nils Holland:
	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
	kworker/u4:5 cpuset=/ mems_allowed=0
	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
	Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
	Workqueue: writeback wb_workfn (flush-btrfs-1)
	 eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
	 eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
	 eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
	Call Trace:
	 [<c142bcce>] dump_stack+0x47/0x69
	 [<c1163332>] dump_header+0x60/0x178
	 [<c1431876>] ? ___ratelimit+0x86/0xe0
	 [<c110795f>] oom_kill_process+0x20f/0x3d0
	 [<c1043895>] ? has_capability_noaudit+0x15/0x20
	 [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
	 [<c1107df9>] out_of_memory+0xd9/0x260
	 [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
	 [<c110414d>] pagecache_get_page+0xad/0x270
	 [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
	 [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
	[...]
	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
	lowmem_reserve[]: 0 0 21292 21292
	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

This is a GFP_NOFS|__GFP_NOFAIL request which invokes the OOM killer
because there is clearly nothing reclaimable in the zone Normal, while
there is a lot of page cache which is most probably pinned by the fs and
which GFP_NOFS cannot reclaim.

This patch simply removes the __GFP_NOFAIL special case in order to have
a clearer semantic without surprising side effects. Instead we do
allow nofail requests to access memory reserves to move forward in both
cases: when the OOM killer is invoked and when it should be suppressed.
In the latter case we are more careful and only allow partial access
because we do not want to risk depleting the reserves entirely. There
are users doing GFP_NOFS|__GFP_NOFAIL heavily (e.g. __getblk_gfp ->
grow_dev_page).

Introduce the __alloc_pages_cpuset_fallback helper, which allows bypassing
allocation constraints for the given gfp mask while it enforces cpusets
whenever possible.

Reported-by: Nils Holland <nholland@tisys.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c   |  2 +-
 mm/page_alloc.c | 97 ++++++++++++++++++++++++++++++++++++---------------------
 2 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ec9f11d4f094..12a6fce85f61 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1013,7 +1013,7 @@ bool out_of_memory(struct oom_control *oc)
 	 * make sure exclude 0 mask - all other users should have at least
 	 * ___GFP_DIRECT_RECLAIM to get here.
 	 */
-	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
+	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
 		return true;
 
 	/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 095e2fa286de..d6bc3e4f1a0c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3057,6 +3057,26 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 }
 
 static inline struct page *
+__alloc_pages_cpuset_fallback(gfp_t gfp_mask, unsigned int order,
+			      unsigned int alloc_flags,
+			      const struct alloc_context *ac)
+{
+	struct page *page;
+
+	page = get_page_from_freelist(gfp_mask, order,
+			alloc_flags|ALLOC_CPUSET, ac);
+	/*
+	 * fallback to ignore cpuset restriction if our nodes
+	 * are depleted
+	 */
+	if (!page)
+		page = get_page_from_freelist(gfp_mask, order,
+				alloc_flags, ac);
+
+	return page;
+}
+
+static inline struct page *
 __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	const struct alloc_context *ac, unsigned long *did_some_progress)
 {
@@ -3091,47 +3111,42 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto out;
 
-	if (!(gfp_mask & __GFP_NOFAIL)) {
-		/* Coredumps can quickly deplete all memory reserves */
-		if (current->flags & PF_DUMPCORE)
-			goto out;
-		/* The OOM killer will not help higher order allocs */
-		if (order > PAGE_ALLOC_COSTLY_ORDER)
-			goto out;
-		/* The OOM killer does not needlessly kill tasks for lowmem */
-		if (ac->high_zoneidx < ZONE_NORMAL)
-			goto out;
-		if (pm_suspended_storage())
-			goto out;
-		/*
-		 * XXX: GFP_NOFS allocations should rather fail than rely on
-		 * other request to make a forward progress.
-		 * We are in an unfortunate situation where out_of_memory cannot
-		 * do much for this context but let's try it to at least get
-		 * access to memory reserved if the current task is killed (see
-		 * out_of_memory). Once filesystems are ready to handle allocation
-		 * failures more gracefully we should just bail out here.
-		 */
+	/* Coredumps can quickly deplete all memory reserves */
+	if (current->flags & PF_DUMPCORE)
+		goto out;
+	/* The OOM killer will not help higher order allocs */
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
+		goto out;
+	/* The OOM killer does not needlessly kill tasks for lowmem */
+	if (ac->high_zoneidx < ZONE_NORMAL)
+		goto out;
+	if (pm_suspended_storage())
+		goto out;
+	/*
+	 * XXX: GFP_NOFS allocations should rather fail than rely on
+	 * other request to make a forward progress.
+	 * We are in an unfortunate situation where out_of_memory cannot
+	 * do much for this context but let's try it to at least get
+	 * access to memory reserved if the current task is killed (see
+	 * out_of_memory). Once filesystems are ready to handle allocation
+	 * failures more gracefully we should just bail out here.
+	 */
+
+	/* The OOM killer may not free memory on a specific node */
+	if (gfp_mask & __GFP_THISNODE)
+		goto out;
 
-		/* The OOM killer may not free memory on a specific node */
-		if (gfp_mask & __GFP_THISNODE)
-			goto out;
-	}
 	/* Exhausted what can be done so it's blamo time */
-	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
+	if (out_of_memory(&oc)) {
 		*did_some_progress = 1;
 
-		if (gfp_mask & __GFP_NOFAIL) {
-			page = get_page_from_freelist(gfp_mask, order,
-					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
-			/*
-			 * fallback to ignore cpuset restriction if our nodes
-			 * are depleted
-			 */
-			if (!page)
-				page = get_page_from_freelist(gfp_mask, order,
+		/*
+		 * Help non-failing allocations by giving them access to memory
+		 * reserves
+		 */
+		if (gfp_mask & __GFP_NOFAIL)
+			page = __alloc_pages_cpuset_fallback(gfp_mask, order,
 					ALLOC_NO_WATERMARKS, ac);
-		}
 	}
 out:
 	mutex_unlock(&oom_lock);
@@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 */
 		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
 
+		/*
+		 * Help non-failing allocations by giving them access to memory
+		 * reserves but do not use ALLOC_NO_WATERMARKS because this
+		 * could deplete whole memory reserves which would just make
+		 * the situation worse
+		 */
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+		if (page)
+			goto got_pg;
+
 		cond_resched();
 		goto retry;
 	}
-- 
2.10.2

* Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
  2016-12-16 15:58     ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
@ 2016-12-16 17:31       ` Johannes Weiner
  2016-12-16 22:12         ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Weiner @ 2016-12-16 17:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs, Michal Hocko

On Fri, Dec 16, 2016 at 04:58:08PM +0100, Michal Hocko wrote:
> @@ -1013,7 +1013,7 @@ bool out_of_memory(struct oom_control *oc)
>  	 * make sure exclude 0 mask - all other users should have at least
>  	 * ___GFP_DIRECT_RECLAIM to get here.
>  	 */
> -	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
> +	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
>  		return true;

This makes sense, we should go back to what we had here. Because it's
not that the reported OOMs are premature - there is genuinely no more
memory reclaimable from the allocating context - but that this class
of allocations should never invoke the OOM killer in the first place.
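
For readers following along, the effect of that one-line change can be
sketched in user space. This is only an illustration, not kernel code:
the SK_* bit values are assumptions chosen to reproduce the
gfp_mask=0x2400840 (GFP_NOFS|__GFP_NOFAIL) value from the original
report, and oom_skipped_before(), oom_skipped_after() and
sketch_self_check() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values; check include/linux/gfp.h for your kernel. */
#define SK_GFP_IO             0x40u
#define SK_GFP_FS             0x80u
#define SK_GFP_NOFAIL         0x800u
#define SK_GFP_DIRECT_RECLAIM 0x400000u
#define SK_GFP_KSWAPD_RECLAIM 0x2000000u

#define SK_GFP_NOFS   (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM | SK_GFP_IO)
#define SK_GFP_KERNEL (SK_GFP_NOFS | SK_GFP_FS)

/* Old condition: a NOFS request skips the OOM killer unless it is NOFAIL. */
bool oom_skipped_before(unsigned int gfp_mask)
{
	return gfp_mask && !(gfp_mask & (SK_GFP_FS | SK_GFP_NOFAIL));
}

/* New condition: every non-zero mask without __GFP_FS skips the OOM killer. */
bool oom_skipped_after(unsigned int gfp_mask)
{
	return gfp_mask && !(gfp_mask & SK_GFP_FS);
}

/* Hypothetical self-check showing how the two conditions differ. */
void sketch_self_check(void)
{
	unsigned int nofs_nofail = SK_GFP_NOFS | SK_GFP_NOFAIL;

	assert(nofs_nofail == 0x2400840u);         /* mask from the report */
	assert(!oom_skipped_before(nofs_nofail));  /* old: OOM killer runs */
	assert(oom_skipped_after(nofs_nofail));    /* new: OOM killer skipped */
	assert(!oom_skipped_before(SK_GFP_KERNEL));
	assert(!oom_skipped_after(SK_GFP_KERNEL)); /* GFP_KERNEL unaffected */
}
```

Under these assumptions, only the GFP_NOFS|__GFP_NOFAIL case changes
behaviour; requests that carry __GFP_FS still reach the OOM killer.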

> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 */
>  		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
>  
> +		/*
> +		 * Help non-failing allocations by giving them access to memory
> +		 * reserves but do not use ALLOC_NO_WATERMARKS because this
> +		 * could deplete whole memory reserves which would just make
> +		 * the situation worse
> +		 */
> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> +		if (page)
> +			goto got_pg;
> +

But this should be a separate patch, IMO.

Do we observe GFP_NOFS lockups when we don't do this? Don't we risk
premature exhaustion of the memory reserves, and isn't it better to wait
for other reclaimers to make some progress instead? Should we give
reserve access to all GFP_NOFS allocations, or just the ones from a
reclaim/cleaning context? All that should go into the changelog of a
separate allocation booster patch, I think.

* Re: OOM: Better, but still there on 4.9
  2016-12-16  7:39 ` Michal Hocko
  2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
@ 2016-12-16 18:15   ` Chris Mason
  2016-12-16 22:14     ` Michal Hocko
  2016-12-16 19:50   ` Chris Mason
  2 siblings, 1 reply; 62+ messages in thread
From: Chris Mason @ 2016-12-16 18:15 UTC (permalink / raw)
  To: Michal Hocko, Nils Holland
  Cc: linux-kernel, linux-mm, David Sterba, linux-btrfs

On 12/16/2016 02:39 AM, Michal Hocko wrote:
> [CC linux-mm and btrfs guys]
>
> On Thu 15-12-16 23:57:04, Nils Holland wrote:
> [...]
>> Of course, none of these are workloads that are new / special in any
>> way - prior to 4.8, I never experienced any issues doing the exact
>> same things.
>>
>> Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
>> Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
>> Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
>> Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
>> Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
>> Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
>> Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
>> Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
>> Dec 15 19:02:18 teela kernel: Call Trace:
>> Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
>> Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
>> Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
>> Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
>> Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
>> Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
>> Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
>> Dec 15 19:02:18 teela kernel:  [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
>> Dec 15 19:02:18 teela kernel:  [<c110414d>] pagecache_get_page+0xad/0x270
>> Dec 15 19:02:18 teela kernel:  [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
>> Dec 15 19:02:18 teela kernel:  [<c132a57f>] btrfs_alloc_tree_block+0x1ef/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130f7c3>] __btrfs_cow_block+0x143/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130fe1a>] btrfs_cow_block+0x13a/0x220
>> Dec 15 19:02:18 teela kernel:  [<c13132f1>] btrfs_search_slot+0x1d1/0x870
>> Dec 15 19:02:18 teela kernel:  [<c132fcdd>] btrfs_lookup_file_extent+0x4d/0x60
>> Dec 15 19:02:18 teela kernel:  [<c1354fe6>] __btrfs_drop_extents+0x176/0x1070
>> Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
>> Dec 15 19:02:18 teela kernel:  [<c133dbb5>] ? start_transaction+0x65/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1150597>] ? __kmalloc+0x147/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1345005>] cow_file_range_inline+0x215/0x6b0
>> Dec 15 19:02:18 teela kernel:  [<c13459fc>] cow_file_range.isra.49+0x55c/0x6d0
>> Dec 15 19:02:18 teela kernel:  [<c1361795>] ? lock_extent_bits+0x75/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1346d51>] run_delalloc_range+0x441/0x470
>> Dec 15 19:02:18 teela kernel:  [<c13626e4>] writepage_delalloc.isra.47+0x144/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1364548>] __extent_writepage+0xd8/0x2b0
>> Dec 15 19:02:18 teela kernel:  [<c1365c4c>] extent_writepages+0x25c/0x380
>> Dec 15 19:02:18 teela kernel:  [<c1342cd0>] ? btrfs_real_readdir+0x610/0x610
>> Dec 15 19:02:18 teela kernel:  [<c133ff0f>] btrfs_writepages+0x1f/0x30
>> Dec 15 19:02:18 teela kernel:  [<c110ff85>] do_writepages+0x15/0x40
>> Dec 15 19:02:18 teela kernel:  [<c1190a95>] __writeback_single_inode+0x35/0x2f0
>> Dec 15 19:02:18 teela kernel:  [<c119112e>] writeback_sb_inodes+0x16e/0x340
>> Dec 15 19:02:18 teela kernel:  [<c119145a>] wb_writeback+0xaa/0x280
>> Dec 15 19:02:18 teela kernel:  [<c1191de8>] wb_workfn+0xd8/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c104fd34>] process_one_work+0x114/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1050b4f>] worker_thread+0x2f/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1050b20>] ? create_worker+0x180/0x180
>> Dec 15 19:02:18 teela kernel:  [<c10552e7>] kthread+0x97/0xb0
>> Dec 15 19:02:18 teela kernel:  [<c1055250>] ? __kthread_parkme+0x60/0x60
>> Dec 15 19:02:18 teela kernel:  [<c19b5cb7>] ret_from_fork+0x1b/0x28
>> Dec 15 19:02:18 teela kernel: Mem-Info:
>> Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 isolated_anon:0
>>                                active_file:274324 inactive_file:281962 isolated_file:0
>
> OK, so there is still some anonymous memory that could be swapped out
> and quite a lot of page cache. This might be harder to reclaim because
> the allocation is a GFP_NOFS request which is limited in its reclaim
> capabilities. It might be possible that those pagecache pages are pinned
> in some way by the filesystem.
>
>>                                unevictable:0 dirty:649 writeback:0 unstable:0
>>                                slab_reclaimable:40662 slab_unreclaimable:17754
>>                                mapped:7382 shmem:202 pagetables:351 bounce:0
>>                                free:206736 free_pcp:332 free_cma:0
>> Dec 15 19:02:18 teela kernel: Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
>> Dec 15 19:02:18 teela kernel: DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
>> Dec 15 19:02:18 teela kernel: Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
>
> And this shows that there is no anonymous memory in the lowmem zone.
> Note that this request cannot use the highmem zone so no swap out would
> help. So if we are not able to reclaim those pages on the file LRU then
> we are out of luck.
>
>> Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
>> Dec 15 19:02:18 teela kernel: HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
>
> That being said, the OOM killer invocation is clearly pointless and
> premature. We normally do not invoke it for GFP_NOFS requests
> exactly for these reasons. But this is GFP_NOFS|__GFP_NOFAIL which
> behaves differently. I am about to change that but my last attempt [1]
> has to be rethought.
>
> Now another thing is that the __GFP_NOFAIL which has this nasty side
> effect was introduced by me in d1b5c5671d01 ("btrfs: Prevent from
> early transaction abort") in 4.3 so I am quite surprised that this has
> shown up only in 4.8. Anyway there might be some other changes in
> btrfs which could make it more subtle.
>
> I believe the right way to go about this is to pursue what I've started
> in [1]. I will try to prepare something for testing today for you. Stay
> tuned. But I would be really happy if somebody from the btrfs camp could
> check the NOFS aspect of this allocation. We have already seen
> allocation stalls from this path quite recently.

Just double checking, are you asking why we're using GFP_NOFS to avoid 
going into btrfs from the btrfs writepages call, or are you asking why 
we aren't allowing highmem?

For why we're not using highmem, it goes back to 2011:

commit a65917156e345946dbde3d7effd28124c6d6a8c2
Btrfs: stop using highmem for extent_buffers

The short answer is that kmap + shared caching pointer between threads 
made it hugely complex.  I gave up and dropped the highmem part.

-chris

* Re: OOM: Better, but still there on
  2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
  2016-12-16 15:58     ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
  2016-12-16 15:58     ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
@ 2016-12-16 18:47     ` Nils Holland
  2016-12-17  0:02       ` Michal Hocko
  2 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-16 18:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri, Dec 16, 2016 at 04:58:06PM +0100, Michal Hocko wrote:
> On Fri 16-12-16 08:39:41, Michal Hocko wrote:
> [...]
> > That being said, the OOM killer invocation is clearly pointless and
> > premature. We normally do not invoke it for GFP_NOFS requests
> > exactly for these reasons. But this is GFP_NOFS|__GFP_NOFAIL which
> > behaves differently. I am about to change that but my last attempt [1]
> > has to be rethought.
> > 
> > Now another thing is that the __GFP_NOFAIL which has this nasty side
> > effect was introduced by me in d1b5c5671d01 ("btrfs: Prevent from
> > early transaction abort") in 4.3 so I am quite surprised that this has
> > shown up only in 4.8. Anyway there might be some other changes in
> > btrfs which could make it more subtle.
> > 
> > I believe the right way to go about this is to pursue what I've started
> > in [1]. I will try to prepare something for testing today for you. Stay
> > tuned. But I would be really happy if somebody from the btrfs camp could
> > check the NOFS aspect of this allocation. We have already seen
> > allocation stalls from this path quite recently.
> 
> Could you try to run with the two following patches?

I tried the two patches you sent, and ... well, things are different
now, but probably still a bit problematic. ;-)

Once again, I freshly booted both of my machines and told Gentoo's
portage to unpack and build the firefox sources. The first machine,
the one from which yesterday's OOM report came, became unresponsive
during the tarball unpack phase and had to be power cycled.
Unfortunately, there's nothing concerning its OOMs in the logs. :-(

The second machine actually finished the unpack phase successfully and
started the build process (which, every now and then, had also worked
with previous problematic kernels). However, after it had been
building for a while and I decided to increase the stress level by
starting X, firefox and a terminal, and unpacking a kernel source
tarball in it, it also started OOMing, this time once more with a
genuine kernel panic. Luckily, this machine also caught something in
the logs, which I'm including below.

Despite the fact that I'm no expert, I can see that there's no more
GFP_NOFS being logged, which seems to be what the patches tried to
achieve. What the still-present OOMs mean remains up for
interpretation by the experts; all I can say is that in the (pre-4.8?)
past, doing all of the things I just did would probably slow down my
machine quite a bit, but I can't remember ever having seen it OOM or
even crash completely.
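
As a cross-check of the "no more GFP_NOFS" observation, the raw
gfp_mask values in the logs can be decoded by hand. The following is a
rough user-space sketch, not kernel code: the bit table is an
assumption that reproduces the symbolic names the kernel printed next
to the masks in this thread (it is not a complete list of flags), and
gfp_decode() is a made-up helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct gfp_name {
	unsigned int bit;
	const char *name;
};

/* Assumed subset of flag bits; see include/linux/gfp.h for your kernel. */
static const struct gfp_name gfp_names[] = {
	{ 0x40u,      "__GFP_IO" },
	{ 0x80u,      "__GFP_FS" },
	{ 0x800u,     "__GFP_NOFAIL" },
	{ 0x8000u,    "__GFP_ZERO" },
	{ 0x100000u,  "__GFP_ACCOUNT" },
	{ 0x200000u,  "__GFP_NOTRACK" },
	{ 0x400000u,  "__GFP_DIRECT_RECLAIM" },
	{ 0x2000000u, "__GFP_KSWAPD_RECLAIM" },
};

/*
 * Write a '|'-separated list of the known flags set in mask into buf.
 * Returns the bits that were not named (0 if the mask was fully decoded
 * and buf was large enough).
 */
unsigned int gfp_decode(unsigned int mask, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(gfp_names) / sizeof(gfp_names[0]); i++) {
		int n;

		if (!(mask & gfp_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", gfp_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop decoding */
		used += (size_t)n;
		mask &= ~gfp_names[i].bit;
	}
	return mask;
}
```

With this table, 0x2400840 from yesterday's report decodes without
__GFP_FS (a NOFS request), whereas 0x27080c0 and 0x25000c0 from
today's logs both have __GFP_FS set.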

Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages freed, 10219 pages still pinned.
Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0
Dec 16 18:56:29 boerne.fritz.box kernel: CPU: 1 PID: 2 Comm: kthreadd Not tainted 4.9.0-gentoo #3
Dec 16 18:56:29 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 16 18:56:29 boerne.fritz.box kernel:  f4105d6c c1433406 f4105e9c c6611280 f4105d9c c1170011 f4105df0 00200296
Dec 16 18:56:29 boerne.fritz.box kernel:  f4105d9c c1438fff f4105da0 edc1bc80 ee32ce00 c6611280 c1ad1899 f4105e9c
Dec 16 18:56:29 boerne.fritz.box kernel:  f4105de0 c1114407 c10513a5 f4105dcc c11140a1 00000001 00000000 00000000
Dec 16 18:56:29 boerne.fritz.box kernel: Call Trace:
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10464f5>] copy_process.part.52+0xd5/0x1410
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1080779>] ? pick_next_task_fair+0x479/0x510
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1062ba0>] ? __kthread_parkme+0x60/0x60
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10479d7>] _do_fork+0xc7/0x360
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1062ba0>] ? __kthread_parkme+0x60/0x60
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1047ca0>] kernel_thread+0x30/0x40
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10637c6>] kthreadd+0x106/0x150
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10636c0>] ? kthread_park+0x50/0x50
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c19422b7>] ret_from_fork+0x1b/0x28
Dec 16 18:56:29 boerne.fritz.box kernel: Mem-Info:
Dec 16 18:56:29 boerne.fritz.box kernel: active_anon:132176 inactive_anon:11640 isolated_anon:0
                                          active_file:295257 inactive_file:389350 isolated_file:20
                                          unevictable:0 dirty:3956 writeback:0 unstable:0
                                          slab_reclaimable:54632 slab_unreclaimable:21963
                                          mapped:36724 shmem:11853 pagetables:914 bounce:0
                                          free:77600 free_pcp:327 free_cma:0
Dec 16 18:56:29 boerne.fritz.box kernel: Node 0 active_anon:528704kB inactive_anon:46560kB active_file:1181028kB inactive_file:1557400kB unevictable:0kB isolated(anon):0kB isolated(file):80kB mapped:146896kB dirty:15824kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 172032kB anon_thp: 47412kB writeback_tmp:0kB unstable:0kB pages_scanned:15066965 all_unreclaimable? yes
Dec 16 18:56:29 boerne.fritz.box kernel: DMA free:3976kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:4788kB inactive_file:0kB unevictable:0kB writepending:160kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:5356kB slab_unreclaimable:1616kB kernel_stack:32kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:29 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:470556kB inactive_file:148kB unevictable:0kB writepending:1616kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB
Dec 16 18:56:29 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 16 18:56:29 boerne.fritz.box kernel: HighMem free:265416kB min:512kB low:39184kB high:77856kB active_anon:528704kB inactive_anon:46560kB active_file:705684kB inactive_file:1557292kB unevictable:0kB writepending:14048kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:776kB local_pcp:660kB free_cma:0kB
Dec 16 18:56:29 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 16 18:56:29 boerne.fritz.box kernel: DMA: 2*4kB (UE) 2*8kB (U) 1*16kB (E) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (E) 1*512kB (E) 1*1024kB (U) 1*2048kB (M) 0*4096kB = 3976kB
Dec 16 18:56:29 boerne.fritz.box kernel: Normal: 32*4kB (ME) 28*8kB (UM) 15*16kB (UM) 141*32kB (UME) 141*64kB (UM) 80*128kB (UM) 19*256kB (UME) 3*512kB (UME) 2*1024kB (ME) 2*2048kB (ME) 1*4096kB (M) = 41008kB
Dec 16 18:56:29 boerne.fritz.box kernel: HighMem: 340*4kB (UME) 339*8kB (UME) 258*16kB (UME) 192*32kB (UME) 69*64kB (UME) 15*128kB (UME) 6*256kB (ME) 5*512kB (UME) 7*1024kB (UME) 4*2048kB (UE) 55*4096kB (UM) = 265416kB
Dec 16 18:56:29 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 16 18:56:29 boerne.fritz.box kernel: 696480 total pagecache pages
Dec 16 18:56:29 boerne.fritz.box kernel: 0 pages in swap cache
Dec 16 18:56:29 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 16 18:56:29 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 16 18:56:29 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 16 18:56:29 boerne.fritz.box kernel: 1006816 pages RAM
Dec 16 18:56:29 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 16 18:56:29 boerne.fritz.box kernel: 16403 pages reserved
Dec 16 18:56:29 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 16 18:56:29 boerne.fritz.box kernel: [ 1874]     0  1874     6166      987       9       3        0             0 systemd-journal
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2497]     0  2497     2965      911       8       3        0         -1000 systemd-udevd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2582]   107  2582     3874      958       8       3        0             0 systemd-timesyn
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2585]   108  2585     1269      883       6       3        0          -900 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2586]     0  2586    22054     3277      20       3        0             0 NetworkManager
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2587]     0  2587     1521      972       7       3        0             0 systemd-logind
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2589]    88  2589     1158      627       6       3        0             0 nullmailer-send
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2612]     0  2612     1510      460       5       3        0             0 fcron
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2665]     0  2665      768      580       5       3        0             0 dhcpcd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2668]     0  2668      639      408       5       3        0             0 vnstatd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2669]     0  2669     1460     1063       6       3        0         -1000 sshd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2670]     0  2670     1235      838       6       3        0             0 login
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2672]     0  2672     1972     1267       7       3        0             0 systemd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2700]     0  2700     2279      586       7       3        0             0 (sd-pam)
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2733]     0  2733     1836      890       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2753]   109  2753    16724     3089      19       3        0             0 polkitd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2776]     0  2776     2153     1349       7       3        0             0 wpa_supplicant
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2941]     0  2941    16268    15095      36       3        0             0 emerge
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2942]     0  2942     1235      833       5       3        0             0 login
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2949]  1000  2949     2033     1378       7       3        0             0 systemd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2973]  1000  2973     2279      589       7       3        0             0 (sd-pam)
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2989]  1000  2989     1836      907       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2997]  1000  2997    25339     2169      17       3        0             0 pulseaudio
Dec 16 18:56:29 boerne.fritz.box kernel: [ 3000]   111  3000     5763      655       9       3        0             0 rtkit-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 3019]  1000  3019     3575     1403      11       3        0             0 gconf-helper
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5626]  1000  5626     1743      709       8       3        0             0 startx
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5647]  1000  5647     1001      579       6       3        0             0 xinit
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5648]  1000  5648    22873     7477      43       3        0             0 X
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5674]  1000  5674    10584     4543      21       3        0             0 awesome
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5718]  1000  5718     1571      610       7       3        0             0 dbus-launch
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5720]  1000  5720     1238      645       6       3        0             0 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5725]  1000  5725     1571      634       7       3        0             0 dbus-launch
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5726]  1000  5726     1238      649       6       3        0             0 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5823]  1000  5823    35683     8366      42       3        0             0 nm-applet
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5825]  1000  5825    21454     7358      31       3        0             0 xfce4-terminal
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5827]  1000  5827    11257     1911      14       3        0             0 at-spi-bus-laun
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5832]  1000  5832     1238      831       6       3        0             0 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5838]  1000  5838     7480     2110      12       3        0             0 at-spi2-registr
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5840]  1000  5840    10179     1459      13       3        0             0 gvfsd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 6181]  1000  6181     1836      883       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [ 7874]  1000  7874     2246     1185       8       3        0             0 ssh
Dec 16 18:56:29 boerne.fritz.box kernel: [12950]  1000 12950   197232    73307     252       3        0             0 firefox
Dec 16 18:56:29 boerne.fritz.box kernel: [13020]   250 13020      549      377       4       3        0             0 sandbox
Dec 16 18:56:29 boerne.fritz.box kernel: [13022]   250 13022     2629     1567       8       3        0             0 ebuild.sh
Dec 16 18:56:29 boerne.fritz.box kernel: [13040]  1000 13040     1836      933       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [13048]   250 13048     3002     1718       8       3        0             0 ebuild.sh
Dec 16 18:56:29 boerne.fritz.box kernel: [13052]   250 13052     1122      732       5       3        0             0 emake
Dec 16 18:56:29 boerne.fritz.box kernel: [13054]   250 13054      921      697       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13118]   250 13118     1048      783       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13181]   250 13181     1043      789       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13208]   250 13208     1095      855       6       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13255]   250 13255      772      555       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13299]   250 13299      913      689       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13493]   250 13493      876      619       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13494]   250 13494    15191    14639      34       3        0             0 python
Dec 16 18:56:29 boerne.fritz.box kernel: [13532]   250 13532      808      594       4       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13593]  1000 13593     1533      624       7       3        0             0 tar
Dec 16 18:56:29 boerne.fritz.box kernel: [13594]  1000 13594    17834    16906      38       3        0             0 xz
Dec 16 18:56:29 boerne.fritz.box kernel: [13604]   250 13604    12439    11843      27       3        0             0 python
Dec 16 18:56:29 boerne.fritz.box kernel: [13651]   250 13651      253        5       1       3        0             0 sh
Dec 16 18:56:29 boerne.fritz.box kernel: Out of memory: Kill process 12950 (firefox) score 38 or sacrifice child
Dec 16 18:56:29 boerne.fritz.box kernel: Killed process 12950 (firefox) total-vm:788928kB, anon-rss:192656kB, file-rss:100548kB, shmem-rss:24kB
Dec 16 18:56:29 boerne.fritz.box kernel: oom_reaper: reaped process 12950 (firefox), now anon-rss:0kB, file-rss:96kB, shmem-rss:24kB
Dec 16 18:56:31 boerne.fritz.box kernel: xfce4-terminal invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 16 18:56:31 boerne.fritz.box kernel: xfce4-terminal cpuset=/ mems_allowed=0
Dec 16 18:56:31 boerne.fritz.box kernel: CPU: 0 PID: 5825 Comm: xfce4-terminal Not tainted 4.9.0-gentoo #3
Dec 16 18:56:31 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 16 18:56:31 boerne.fritz.box kernel:  c6941c18 c1433406 c6941d48 c5972500 c6941c48 c1170011 c6941c9c 00200286
Dec 16 18:56:31 boerne.fritz.box kernel:  c6941c48 c1438fff c6941c4c edc1a940 ee32d400 c5972500 c1ad1899 c6941d48
Dec 16 18:56:31 boerne.fritz.box kernel:  c6941c8c c1114407 c10513a5 c6941c78 c11140a1 00000006 00000000 00000000
Dec 16 18:56:31 boerne.fritz.box kernel: Call Trace:
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1758900>] ? skb_queue_purge+0x30/0x30
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c175dcde>] alloc_skb_with_frags+0xee/0x1a0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1753dba>] sock_alloc_send_pskb+0x19a/0x1c0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1186120>] ? poll_select_copy_remaining+0x120/0x120
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1825880>] ? wait_for_unix_gc+0x20/0x90
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1823fc0>] unix_stream_sendmsg+0x2a0/0x350
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1750b3d>] sock_sendmsg+0x2d/0x40
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1750bb7>] sock_write_iter+0x67/0xc0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1172c42>] do_readv_writev+0x1e2/0x380
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1750b50>] ? sock_sendmsg+0x40/0x40
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1033763>] ? lapic_next_event+0x13/0x20
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10ae675>] ? clockevents_program_event+0x95/0x190
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10a074a>] ? __hrtimer_run_queues+0x20a/0x280
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1173d16>] vfs_writev+0x36/0x60
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1173d85>] do_writev+0x45/0xc0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1173efb>] SyS_writev+0x1b/0x20
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10018ec>] do_fast_syscall_32+0x7c/0x130
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c194232b>] sysenter_past_esp+0x40/0x6a
Dec 16 18:56:31 boerne.fritz.box kernel: Mem-Info:
Dec 16 18:56:31 boerne.fritz.box kernel: active_anon:72795 inactive_anon:7267 isolated_anon:0
                                          active_file:297627 inactive_file:387672 isolated_file:0
                                          unevictable:0 dirty:77 writeback:18 unstable:0
                                          slab_reclaimable:54648 slab_unreclaimable:21983
                                          mapped:17819 shmem:8215 pagetables:662 bounce:8
                                          free:141692 free_pcp:107 free_cma:0
Dec 16 18:56:31 boerne.fritz.box kernel: Node 0 active_anon:291180kB inactive_anon:29068kB active_file:1190508kB inactive_file:1550688kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:71276kB dirty:308kB writeback:72kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 122880kB anon_thp: 32860kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 16 18:56:31 boerne.fritz.box kernel: DMA free:4020kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:4804kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:5356kB slab_unreclaimable:1572kB kernel_stack:32kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 16 18:56:32 boerne.fritz.box kernel: Normal free:41028kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:472164kB inactive_file:108kB unevictable:0kB writepending:112kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213236kB slab_unreclaimable:86360kB kernel_stack:1584kB pagetables:2564kB bounce:32kB free_pcp:180kB local_pcp:24kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem free:521720kB min:512kB low:39184kB high:77856kB active_anon:291180kB inactive_anon:29068kB active_file:713448kB inactive_file:1550556kB unevictable:0kB writepending:76kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:248kB local_pcp:156kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 16 18:56:32 boerne.fritz.box kernel: DMA: 13*4kB (UE) 2*8kB (U) 1*16kB (E) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (E) 1*512kB (E) 1*1024kB (U) 1*2048kB (M) 0*4096kB = 4020kB
Dec 16 18:56:32 boerne.fritz.box kernel: Normal: 37*4kB (UME) 24*8kB (ME) 17*16kB (UME) 137*32kB (UME) 143*64kB (UME) 82*128kB (UM) 18*256kB (UM) 3*512kB (UME) 2*1024kB (ME) 2*2048kB (ME) 1*4096kB (M) = 41028kB
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem: 3230*4kB (ME) 1616*8kB (M) 680*16kB (UM) 398*32kB (UME) 145*64kB (UM) 59*128kB (UM) 25*256kB (ME) 19*512kB (UME) 9*1024kB (UME) 36*2048kB (UME) 87*4096kB (UME) = 521720kB
Dec 16 18:56:32 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 16 18:56:32 boerne.fritz.box kernel: 693537 total pagecache pages
Dec 16 18:56:32 boerne.fritz.box kernel: 0 pages in swap cache
Dec 16 18:56:32 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 16 18:56:32 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: 1006816 pages RAM
Dec 16 18:56:32 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 16 18:56:32 boerne.fritz.box kernel: 16403 pages reserved
Dec 16 18:56:32 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 16 18:56:32 boerne.fritz.box kernel: [ 1874]     0  1874     6166     1007       9       3        0             0 systemd-journal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2497]     0  2497     2965      911       8       3        0         -1000 systemd-udevd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2582]   107  2582     3874      958       8       3        0             0 systemd-timesyn
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2585]   108  2585     1301      885       6       3        0          -900 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2586]     0  2586    22054     3277      20       3        0             0 NetworkManager
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2587]     0  2587     1521      972       7       3        0             0 systemd-logind
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2589]    88  2589     1158      627       6       3        0             0 nullmailer-send
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2612]     0  2612     1510      460       5       3        0             0 fcron
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2665]     0  2665      768      580       5       3        0             0 dhcpcd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2668]     0  2668      639      408       5       3        0             0 vnstatd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2669]     0  2669     1460     1063       6       3        0         -1000 sshd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2670]     0  2670     1235      838       6       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2672]     0  2672     1972     1267       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2700]     0  2700     2279      586       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2733]     0  2733     1836      890       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2753]   109  2753    16724     3089      19       3        0             0 polkitd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2776]     0  2776     2153     1349       7       3        0             0 wpa_supplicant
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2941]     0  2941    16268    15095      36       3        0             0 emerge
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2942]     0  2942     1235      833       5       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2949]  1000  2949     2033     1378       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2973]  1000  2973     2279      589       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2989]  1000  2989     1836      907       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2997]  1000  2997    25339     2169      17       3        0             0 pulseaudio
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3000]   111  3000     5763      655       9       3        0             0 rtkit-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3019]  1000  3019     3575     1403      11       3        0             0 gconf-helper
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5626]  1000  5626     1743      709       8       3        0             0 startx
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5647]  1000  5647     1001      579       6       3        0             0 xinit
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5648]  1000  5648    22392     7078      41       3        0             0 X
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5674]  1000  5674    10584     4543      21       3        0             0 awesome
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5718]  1000  5718     1571      610       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5720]  1000  5720     1238      645       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5725]  1000  5725     1571      634       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5726]  1000  5726     1238      649       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5823]  1000  5823    35683     8366      42       3        0             0 nm-applet
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5825]  1000  5825    21454     7358      31       3        0             0 xfce4-terminal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5827]  1000  5827    11257     1911      14       3        0             0 at-spi-bus-laun
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5832]  1000  5832     1238      831       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5838]  1000  5838     7480     2110      12       3        0             0 at-spi2-registr
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5840]  1000  5840    10179     1459      13       3        0             0 gvfsd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 6181]  1000  6181     1836      883       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 7874]  1000  7874     2246     1185       8       3        0             0 ssh
Dec 16 18:56:32 boerne.fritz.box kernel: [13020]   250 13020      549      377       4       3        0             0 sandbox
Dec 16 18:56:32 boerne.fritz.box kernel: [13022]   250 13022     2629     1567       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13040]  1000 13040     1836      933       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [13048]   250 13048     3002     1718       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13052]   250 13052     1122      732       5       3        0             0 emake
Dec 16 18:56:32 boerne.fritz.box kernel: [13054]   250 13054      921      697       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13118]   250 13118     1048      783       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13181]   250 13181     1043      789       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13208]   250 13208     1095      855       6       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13255]   250 13255      772      555       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13299]   250 13299      913      689       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13493]   250 13493      876      619       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13494]   250 13494    15321    14729      34       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13532]   250 13532      808      594       4       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13593]  1000 13593     1533      624       7       3        0             0 tar
Dec 16 18:56:32 boerne.fritz.box kernel: [13594]  1000 13594    17834    16906      38       3        0             0 xz
Dec 16 18:56:32 boerne.fritz.box kernel: [13604]   250 13604    12599    12029      28       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13658]   250 13658     1549     1104       6       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: Out of memory: Kill process 13594 (xz) score 8 or sacrifice child
Dec 16 18:56:32 boerne.fritz.box kernel: Killed process 13594 (xz) total-vm:71336kB, anon-rss:65668kB, file-rss:1956kB, shmem-rss:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: xfce4-terminal invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 16 18:56:32 boerne.fritz.box kernel: xfce4-terminal cpuset=/ mems_allowed=0
Dec 16 18:56:32 boerne.fritz.box kernel: CPU: 1 PID: 5825 Comm: xfce4-terminal Not tainted 4.9.0-gentoo #3
Dec 16 18:56:32 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 16 18:56:32 boerne.fritz.box kernel:  c6941c18 c1433406 c6941d48 ef25ef00 c6941c48 c1170011 c6941c9c 00200286
Dec 16 18:56:32 boerne.fritz.box kernel:  c6941c48 c1438fff c6941c4c ef267c80 ef233a00 ef25ef00 c1ad1899 c6941d48
Dec 16 18:56:32 boerne.fritz.box kernel:  c6941c8c c1114407 c10513a5 c6941c78 c11140a1 00000006 00000000 00000000
Dec 16 18:56:32 boerne.fritz.box kernel: Call Trace:
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1758900>] ? skb_queue_purge+0x30/0x30
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c175dcde>] alloc_skb_with_frags+0xee/0x1a0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1753dba>] sock_alloc_send_pskb+0x19a/0x1c0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1186120>] ? poll_select_copy_remaining+0x120/0x120
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1825880>] ? wait_for_unix_gc+0x20/0x90
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1823fc0>] unix_stream_sendmsg+0x2a0/0x350
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1750b3d>] sock_sendmsg+0x2d/0x40
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1750bb7>] sock_write_iter+0x67/0xc0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1172c42>] do_readv_writev+0x1e2/0x380
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1750b50>] ? sock_sendmsg+0x40/0x40
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1033763>] ? lapic_next_event+0x13/0x20
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10ae675>] ? clockevents_program_event+0x95/0x190
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10a074a>] ? __hrtimer_run_queues+0x20a/0x280
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1173d16>] vfs_writev+0x36/0x60
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1173d85>] do_writev+0x45/0xc0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1173efb>] SyS_writev+0x1b/0x20
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10018ec>] do_fast_syscall_32+0x7c/0x130
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c194232b>] sysenter_past_esp+0x40/0x6a
Dec 16 18:56:32 boerne.fritz.box kernel: Mem-Info:
Dec 16 18:56:32 boerne.fritz.box kernel: active_anon:56747 inactive_anon:7267 isolated_anon:0
                                          active_file:297677 inactive_file:387697 isolated_file:0
                                          unevictable:0 dirty:151 writeback:18 unstable:0
                                          slab_reclaimable:54648 slab_unreclaimable:21983
                                          mapped:17769 shmem:8215 pagetables:637 bounce:8
                                          free:157498 free_pcp:299 free_cma:0
Dec 16 18:56:32 boerne.fritz.box kernel: Node 0 active_anon:226988kB inactive_anon:29068kB active_file:1190708kB inactive_file:1550788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:71076kB dirty:604kB writeback:72kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 47104kB anon_thp: 32860kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 16 18:56:32 boerne.fritz.box kernel: DMA free:4020kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:4804kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:5356kB slab_unreclaimable:1572kB kernel_stack:32kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 16 18:56:32 boerne.fritz.box kernel: Normal free:40988kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:472436kB inactive_file:144kB unevictable:0kB writepending:312kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213236kB slab_unreclaimable:86360kB kernel_stack:1584kB pagetables:2464kB bounce:32kB free_pcp:116kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem free:584984kB min:512kB low:39184kB high:77856kB active_anon:226988kB inactive_anon:29068kB active_file:713448kB inactive_file:1550556kB unevictable:0kB writepending:224kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1080kB local_pcp:400kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 16 18:56:32 boerne.fritz.box kernel: DMA: 13*4kB (UE) 2*8kB (U) 1*16kB (E) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (E) 1*512kB (E) 1*1024kB (U) 1*2048kB (M) 0*4096kB = 4020kB
Dec 16 18:56:32 boerne.fritz.box kernel: Normal: 36*4kB (ME) 24*8kB (ME) 16*16kB (ME) 138*32kB (UME) 143*64kB (UME) 82*128kB (UM) 18*256kB (UM) 3*512kB (UME) 2*1024kB (ME) 2*2048kB (ME) 1*4096kB (M) = 41040kB
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem: 3430*4kB (UME) 1795*8kB (UME) 750*16kB (UM) 401*32kB (UM) 148*64kB (UME) 56*128kB (UM) 28*256kB (UME) 19*512kB (UME) 9*1024kB (UME) 55*2048kB (UME) 92*4096kB (UME) = 585136kB
Dec 16 18:56:32 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 16 18:56:32 boerne.fritz.box kernel: 693648 total pagecache pages
Dec 16 18:56:32 boerne.fritz.box kernel: 0 pages in swap cache
Dec 16 18:56:32 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 16 18:56:32 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: 1006816 pages RAM
Dec 16 18:56:32 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 16 18:56:32 boerne.fritz.box kernel: 16403 pages reserved
Dec 16 18:56:32 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 16 18:56:32 boerne.fritz.box kernel: [ 1874]     0  1874     6166     1011       9       3        0             0 systemd-journal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2497]     0  2497     2965      911       8       3        0         -1000 systemd-udevd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2582]   107  2582     3874      958       8       3        0             0 systemd-timesyn
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2585]   108  2585     1301      885       6       3        0          -900 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2586]     0  2586    22054     3277      20       3        0             0 NetworkManager
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2587]     0  2587     1521      972       7       3        0             0 systemd-logind
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2589]    88  2589     1158      627       6       3        0             0 nullmailer-send
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2612]     0  2612     1510      460       5       3        0             0 fcron
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2665]     0  2665      768      580       5       3        0             0 dhcpcd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2668]     0  2668      639      408       5       3        0             0 vnstatd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2669]     0  2669     1460     1063       6       3        0         -1000 sshd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2670]     0  2670     1235      838       6       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2672]     0  2672     1972     1267       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2700]     0  2700     2279      586       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2733]     0  2733     1836      890       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2753]   109  2753    16724     3089      19       3        0             0 polkitd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2776]     0  2776     2153     1349       7       3        0             0 wpa_supplicant
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2941]     0  2941    16268    15095      36       3        0             0 emerge
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2942]     0  2942     1235      833       5       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2949]  1000  2949     2033     1378       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2973]  1000  2973     2279      589       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2989]  1000  2989     1836      907       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2997]  1000  2997    25339     2169      17       3        0             0 pulseaudio
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3000]   111  3000     5763      655       9       3        0             0 rtkit-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3019]  1000  3019     3575     1403      11       3        0             0 gconf-helper
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5626]  1000  5626     1743      709       8       3        0             0 startx
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5647]  1000  5647     1001      579       6       3        0             0 xinit
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5648]  1000  5648    22392     7078      41       3        0             0 X
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5674]  1000  5674    10584     4543      21       3        0             0 awesome
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5718]  1000  5718     1571      610       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5720]  1000  5720     1238      645       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5725]  1000  5725     1571      634       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5726]  1000  5726     1238      649       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5823]  1000  5823    35683     8366      42       3        0             0 nm-applet
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5825]  1000  5825    21454     7358      31       3        0             0 xfce4-terminal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5827]  1000  5827    11257     1911      14       3        0             0 at-spi-bus-laun
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5832]  1000  5832     1238      831       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5838]  1000  5838     7480     2110      12       3        0             0 at-spi2-registr
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5840]  1000  5840    10179     1459      13       3        0             0 gvfsd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 6181]  1000  6181     1836      883       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 7874]  1000  7874     2246     1185       8       3        0             0 ssh
Dec 16 18:56:32 boerne.fritz.box kernel: [13020]   250 13020      549      377       4       3        0             0 sandbox
Dec 16 18:56:32 boerne.fritz.box kernel: [13022]   250 13022     2629     1567       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13040]  1000 13040     1836      933       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [13048]   250 13048     3002     1718       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13052]   250 13052     1122      732       5       3        0             0 emake
Dec 16 18:56:32 boerne.fritz.box kernel: [13054]   250 13054      921      697       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13118]   250 13118     1048      783       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13181]   250 13181     1043      789       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13208]   250 13208     1095      855       6       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13255]   250 13255      772      555       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13299]   250 13299      913      689       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13493]   250 13493      876      619       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13494]   250 13494    15321    14775      34       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13532]   250 13532      808      594       4       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13593]  1000 13593     1533      643       7       3        0             0 tar
Dec 16 18:56:32 boerne.fritz.box kernel: [13604]   250 13604    12760    12198      28       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13658]   250 13658     1687     1280       6       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: Out of memory: Kill process 13494 (python) score 7 or sacrifice child
Dec 16 18:56:32 boerne.fritz.box kernel: Killed process 13494 (python) total-vm:61284kB, anon-rss:54128kB, file-rss:4972kB, shmem-rss:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: oom_reaper: reaped process 13494 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
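
The per-zone numbers in the first report above can be cross-checked by hand. The sketch below is only an illustration, not kernel code: the helper names are made up, and the watermark test is deliberately simplified (the real check also takes lowmem_reserve and the allocation order into account). It sums the buddy free lists of the "DMA:" and "Normal:" lines and compares free memory against the min watermark. Read this way, the Normal zone sits just below its min watermark (41028kB free against min:41100kB), while HighMem still has 521720kB free, which a GFP_KERNEL_ACCOUNT allocation presumably cannot use.

```c
#include <stdbool.h>

#define NR_ORDERS 11 /* block sizes 4kB, 8kB, ..., 4096kB as printed in the report */

/* Free block counts per order, copied from the first report above. */
static const unsigned long dma_counts[NR_ORDERS] =
    { 13, 2, 1, 1, 1, 0, 1, 1, 1, 1, 0 };
static const unsigned long normal_counts[NR_ORDERS] =
    { 37, 24, 17, 137, 143, 82, 18, 3, 2, 2, 1 };

/* Sum count * block size over all orders; order 0 is a single 4kB page. */
static unsigned long buddy_free_kb(const unsigned long counts[NR_ORDERS])
{
    unsigned long total_kb = 0;

    for (int order = 0; order < NR_ORDERS; order++)
        total_kb += counts[order] * (4UL << order);
    return total_kb;
}

/* Simplified check: is the zone's free memory below its min watermark? */
static bool below_min_watermark(unsigned long free_kb, unsigned long min_kb)
{
    return free_kb < min_kb;
}
```

With the figures from the report, buddy_free_kb(normal_counts) gives 41028, matching "Normal free:41028kB", and below_min_watermark(41028, 41100) is true, whereas below_min_watermark(521720, 512) for HighMem is false.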

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: OOM: Better, but still there on 4.9
  2016-12-16  7:39 ` Michal Hocko
  2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
  2016-12-16 18:15   ` OOM: Better, but still there on 4.9 Chris Mason
@ 2016-12-16 19:50   ` Chris Mason
  2 siblings, 0 replies; 62+ messages in thread
From: Chris Mason @ 2016-12-16 19:50 UTC (permalink / raw)
  To: Michal Hocko, Nils Holland
  Cc: linux-kernel, linux-mm, David Sterba, linux-btrfs

On 12/16/2016 02:39 AM, Michal Hocko wrote:
> [CC linux-mm and btrfs guys]
>
> On Thu 15-12-16 23:57:04, Nils Holland wrote:
> [...]
>> Of course, none of this are workloads that are new / special in any
>> way - prior to 4.8, I never experienced any issues doing the exact
>> same things.
>>
>> Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
>> Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
>> Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
>> Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
>> Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
>> Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
>> Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
>> Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
>> Dec 15 19:02:18 teela kernel: Call Trace:
>> Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
>> Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
>> Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
>> Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
>> Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
>> Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
>> Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
>> Dec 15 19:02:18 teela kernel:  [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
>> Dec 15 19:02:18 teela kernel:  [<c110414d>] pagecache_get_page+0xad/0x270
>> Dec 15 19:02:18 teela kernel:  [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
>> Dec 15 19:02:18 teela kernel:  [<c132a57f>] btrfs_alloc_tree_block+0x1ef/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130f7c3>] __btrfs_cow_block+0x143/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130fe1a>] btrfs_cow_block+0x13a/0x220
>> Dec 15 19:02:18 teela kernel:  [<c13132f1>] btrfs_search_slot+0x1d1/0x870
>> Dec 15 19:02:18 teela kernel:  [<c132fcdd>] btrfs_lookup_file_extent+0x4d/0x60
>> Dec 15 19:02:18 teela kernel:  [<c1354fe6>] __btrfs_drop_extents+0x176/0x1070
>> Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
>> Dec 15 19:02:18 teela kernel:  [<c133dbb5>] ? start_transaction+0x65/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1150597>] ? __kmalloc+0x147/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1345005>] cow_file_range_inline+0x215/0x6b0
>> Dec 15 19:02:18 teela kernel:  [<c13459fc>] cow_file_range.isra.49+0x55c/0x6d0
>> Dec 15 19:02:18 teela kernel:  [<c1361795>] ? lock_extent_bits+0x75/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1346d51>] run_delalloc_range+0x441/0x470
>> Dec 15 19:02:18 teela kernel:  [<c13626e4>] writepage_delalloc.isra.47+0x144/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1364548>] __extent_writepage+0xd8/0x2b0
>> Dec 15 19:02:18 teela kernel:  [<c1365c4c>] extent_writepages+0x25c/0x380
>> Dec 15 19:02:18 teela kernel:  [<c1342cd0>] ? btrfs_real_readdir+0x610/0x610
>> Dec 15 19:02:18 teela kernel:  [<c133ff0f>] btrfs_writepages+0x1f/0x30
>> Dec 15 19:02:18 teela kernel:  [<c110ff85>] do_writepages+0x15/0x40
>> Dec 15 19:02:18 teela kernel:  [<c1190a95>] __writeback_single_inode+0x35/0x2f0
>> Dec 15 19:02:18 teela kernel:  [<c119112e>] writeback_sb_inodes+0x16e/0x340
>> Dec 15 19:02:18 teela kernel:  [<c119145a>] wb_writeback+0xaa/0x280
>> Dec 15 19:02:18 teela kernel:  [<c1191de8>] wb_workfn+0xd8/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c104fd34>] process_one_work+0x114/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1050b4f>] worker_thread+0x2f/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1050b20>] ? create_worker+0x180/0x180
>> Dec 15 19:02:18 teela kernel:  [<c10552e7>] kthread+0x97/0xb0
>> Dec 15 19:02:18 teela kernel:  [<c1055250>] ? __kthread_parkme+0x60/0x60
>> Dec 15 19:02:18 teela kernel:  [<c19b5cb7>] ret_from_fork+0x1b/0x28
>> Dec 15 19:02:18 teela kernel: Mem-Info:
>> Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 isolated_anon:0
>>                                active_file:274324 inactive_file:281962 isolated_file:0
>
> OK, so there is still some anonymous memory that could be swapped out
> and quite a lot of page cache. This might be harder to reclaim because
> the allocation is a GFP_NOFS request which is limited in its reclaim
> capabilities. It might be possible that those pagecache pages are pinned
> in some way by the filesystem.

Reading harder, it's possible those pagecache pages are all from the
btree inode.  They shouldn't be pinned by btrfs; kswapd should be able
to wander in and free a good chunk.  What btrfs wants to happen is for
this allocation to sit and wait for kswapd to make progress.

-chris

* Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
  2016-12-16 17:31       ` Johannes Weiner
@ 2016-12-16 22:12         ` Michal Hocko
  2016-12-17 11:17           ` Tetsuo Handa
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-16 22:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Nils Holland, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Fri 16-12-16 12:31:51, Johannes Weiner wrote:
> On Fri, Dec 16, 2016 at 04:58:08PM +0100, Michal Hocko wrote:
> > @@ -1013,7 +1013,7 @@ bool out_of_memory(struct oom_control *oc)
> >  	 * make sure exclude 0 mask - all other users should have at least
> >  	 * ___GFP_DIRECT_RECLAIM to get here.
> >  	 */
> > -	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
> > +	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
> >  		return true;
> 
> This makes sense, we should go back to what we had here. Because it's
> not that the reported OOMs are premature - there is genuinely no more
> memory reclaimable from the allocating context - but that this class
> of allocations should never invoke the OOM killer in the first place.

Agreed, at least not with the current implementation. If we had proper
accounting that told us the memory pinned by the fs is not really
available, then we could invoke the OOM killer and be safe.
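
For readers following the diff, the effect of the changed condition can be
modelled in a few lines of Python. This is only a sketch: the flag values
are assumptions recalled from the 4.9-era include/linux/gfp.h, and
skips_oom_killer_old()/skips_oom_killer_new() are hypothetical names, not
kernel functions.

```python
# Assumed bit values (4.9-era include/linux/gfp.h); illustrative only.
GFP_IO = 0x40
GFP_FS = 0x80
GFP_NOFAIL = 0x800
GFP_RECLAIM = 0x400000 | 0x2000000  # direct reclaim | kswapd reclaim

GFP_NOFS = GFP_RECLAIM | GFP_IO
GFP_KERNEL = GFP_RECLAIM | GFP_IO | GFP_FS


def skips_oom_killer_old(gfp_mask: int) -> bool:
    """Condition removed by the patch: bail out only if neither
    __GFP_FS nor __GFP_NOFAIL is set."""
    return bool(gfp_mask) and not (gfp_mask & (GFP_FS | GFP_NOFAIL))


def skips_oom_killer_new(gfp_mask: int) -> bool:
    """Condition added by the patch: any non-zero mask without
    __GFP_FS bails out, regardless of __GFP_NOFAIL."""
    return bool(gfp_mask) and not (gfp_mask & GFP_FS)


# A GFP_NOFS|__GFP_NOFAIL request reached the OOM killer under the old
# check, and returns early (no kill) under the new one.
print(skips_oom_killer_old(GFP_NOFS | GFP_NOFAIL))
print(skips_oom_killer_new(GFP_NOFS | GFP_NOFAIL))
```

In this model only the NOFS|NOFAIL combination changes behaviour; plain
GFP_NOFS and GFP_KERNEL requests are treated the same by both conditions.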

> > @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		 */
> >  		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
> >  
> > +		/*
> > +		 * Help non-failing allocations by giving them access to memory
> > +		 * reserves but do not use ALLOC_NO_WATERMARKS because this
> > +		 * could deplete whole memory reserves which would just make
> > +		 * the situation worse
> > +		 */
> > +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> > +		if (page)
> > +			goto got_pg;
> > +
> 
> But this should be a separate patch, IMO.
> 
> Do we observe GFP_NOFS lockups when we don't do this? 

This is hard to tell, but considering users like grow_dev_page, I believe
we can get stuck making very slow progress. Those allocations could use
some help.

> Don't we risk
> premature exhaustion of the memory reserves, and it's better to wait
> for other reclaimers to make some progress instead?

Waiting for other reclaimers would be preferable, but we should at least
give these some priority, which is what ALLOC_HARDER should help with.

> Should we give
> reserve access to all GFP_NOFS allocations, or just the ones from a
> reclaim/cleaning context?

I would focus only on those which are important enough. Which ones those
are is a harder question. But certainly those with __GFP_NOFAIL are
important enough.

> All that should go into the changelog of a separate allocation booster
> patch, I think.

The reason I did both in the same patch is to address the concern about
potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen
ALLOC_HARDER to give the minimum portion of the reserves so that we do
not risk blocking out other high priority users, but still help a bit
and prevent starvation when other reclaimers are faster to consume the
reclaimed memory.

I can extend the changelog of course, but I believe that having both
changes together makes some sense. NOFS|NOFAIL allocations are not all
that rare and sometimes we really depend on them making further
progress.
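
To make "the minimum portion of the reserves" concrete, here is a rough
Python model of an order-0 watermark check. It assumes, as I understand
the 4.9 __zone_watermark_ok(), that ALLOC_HARDER lowers the min watermark
by a quarter; it ignores lowmem_reserve, highatomic reserves and higher
orders. watermark_ok() is a hypothetical helper, and the figures are
borrowed from a Normal zone report elsewhere in this thread.

```python
def watermark_ok(free_kb: int, min_kb: int, alloc_harder: bool = False) -> bool:
    """Simplified order-0 check: the allocation may proceed only while
    free memory stays above the (possibly reduced) min watermark."""
    if alloc_harder:
        # Assumption: ALLOC_HARDER trims the watermark by one quarter.
        min_kb -= min_kb // 4
    return free_kb > min_kb


free_kb, min_kb = 41008, 41100  # "Normal free:41008kB min:41100kB"
print(watermark_ok(free_kb, min_kb))
print(watermark_ok(free_kb, min_kb, alloc_harder=True))
```

Under these assumptions the request fails the plain min watermark but
passes once the mark drops to 30825kB, while three quarters of the reserve
stays out of reach, unlike with ALLOC_NO_WATERMARKS.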

-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on 4.9
  2016-12-16 18:15   ` OOM: Better, but still there on 4.9 Chris Mason
@ 2016-12-16 22:14     ` Michal Hocko
  2016-12-16 22:47       ` Chris Mason
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-16 22:14 UTC (permalink / raw)
  To: Chris Mason
  Cc: Nils Holland, linux-kernel, linux-mm, David Sterba, linux-btrfs

On Fri 16-12-16 13:15:18, Chris Mason wrote:
> On 12/16/2016 02:39 AM, Michal Hocko wrote:
[...]
> > I believe the right way to go around this is to pursue what I've started
> > in [1]. I will try to prepare something for testing today for you. Stay
> > tuned. But I would be really happy if somebody from the btrfs camp could
> > check the NOFS aspect of this allocation. We have already seen
> > allocation stalls from this path quite recently
> 
> Just double checking, are you asking why we're using GFP_NOFS to avoid going
> into btrfs from the btrfs writepages call, or are you asking why we aren't
> allowing highmem?

I am more interested in the NOFS part. Why can't this be a full
GFP_KERNEL context? What kind of locks would we deadlock on when recursing
into the fs via slab shrinkers?
-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on 4.9
  2016-12-16 22:14     ` Michal Hocko
@ 2016-12-16 22:47       ` Chris Mason
  2016-12-16 23:31         ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Mason @ 2016-12-16 22:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, linux-kernel, linux-mm, David Sterba, linux-btrfs

On 12/16/2016 05:14 PM, Michal Hocko wrote:
> On Fri 16-12-16 13:15:18, Chris Mason wrote:
>> On 12/16/2016 02:39 AM, Michal Hocko wrote:
> [...]
>>> I believe the right way to go around this is to pursue what I've started
>>> in [1]. I will try to prepare something for testing today for you. Stay
>>> tuned. But I would be really happy if somebody from the btrfs camp could
>>> check the NOFS aspect of this allocation. We have already seen
>>> allocation stalls from this path quite recently
>>
>> Just double checking, are you asking why we're using GFP_NOFS to avoid going
>> into btrfs from the btrfs writepages call, or are you asking why we aren't
>> allowing highmem?
>
> I am more interested in the NOFS part. Why cannot this be a full
> GFP_KERNEL context? What kind of locks we would lock up when recursing
> to the fs via slab shrinkers?
>

Since this is our writepages call, any jump into direct reclaim would go 
to writepage, which would end up calling the same set of code to read 
metadata blocks, which would do a GFP_KERNEL allocation and end up back 
in writepage again.

We'd also have issues with blowing through transaction reservations 
since the writepage recursion would have to nest into the running 
transaction.

-chris

* Re: OOM: Better, but still there on 4.9
  2016-12-16 22:47       ` Chris Mason
@ 2016-12-16 23:31         ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-16 23:31 UTC (permalink / raw)
  To: Chris Mason
  Cc: Nils Holland, linux-kernel, linux-mm, David Sterba, linux-btrfs

On Fri 16-12-16 17:47:25, Chris Mason wrote:
> On 12/16/2016 05:14 PM, Michal Hocko wrote:
> > On Fri 16-12-16 13:15:18, Chris Mason wrote:
> > > On 12/16/2016 02:39 AM, Michal Hocko wrote:
> > [...]
> > > > I believe the right way to go around this is to pursue what I've started
> > > > in [1]. I will try to prepare something for testing today for you. Stay
> > > > tuned. But I would be really happy if somebody from the btrfs camp could
> > > > check the NOFS aspect of this allocation. We have already seen
> > > > allocation stalls from this path quite recently
> > > 
> > > Just double checking, are you asking why we're using GFP_NOFS to avoid going
> > > into btrfs from the btrfs writepages call, or are you asking why we aren't
> > > allowing highmem?
> > 
> > I am more interested in the NOFS part. Why cannot this be a full
> > GFP_KERNEL context? What kind of locks we would lock up when recursing
> > to the fs via slab shrinkers?
> > 
> 
> Since this is our writepages call, any jump into direct reclaim would go to
> writepage, which would end up calling the same set of code to read metadata
> blocks, which would do a GFP_KERNEL allocation and end up back in writepage
> again.

But we have not been doing pageout on the page cache from direct reclaim
for a long time. So basically the only way to recurse back into the fs
code is via the slab ([di]cache) shrinkers. Are those a problem as well?

-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-16 18:47     ` OOM: Better, but still there on Nils Holland
@ 2016-12-17  0:02       ` Michal Hocko
  2016-12-17 12:59         ` Nils Holland
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-17  0:02 UTC (permalink / raw)
  To: Nils Holland
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri 16-12-16 19:47:00, Nils Holland wrote:
[...]
> Despite the fact that I'm no expert, I can see that there's no more
> GFP_NOFS being logged, which seems to be what the patches tried to
> achieve. What the still present OOMs mean remains up for
> interpretation by the experts, all I can say is that in the (pre-4.8?)
> past, doing all of the things I just did would probably slow down my
> machine quite a bit, but I can't remember to have ever seen it OOM or
> even crash completely.
> 
> Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages freed, 10219 pages still pinned.
> Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
> Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0
[...]
> Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:470556kB inactive_file:148kB unevictable:0kB writepending:1616kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB

This is a GFP_KERNEL allocation, so again it cannot use the highmem zone.
There is no anonymous memory in this zone, but the allocation
context implies the full reclaim context, so the file LRU should be
reclaimable. For some reason ~470MB of the active file LRU is still
there. This is quite unexpected. It is hard to tell more without
further data. It would be great if you could enable reclaim related
tracepoints:

mount -t tracefs none /debug/trace
echo 1 > /debug/trace/events/vmscan/enable
cat /debug/trace/trace_pipe > trace.log

That should help.
[...]

> Dec 16 18:56:31 boerne.fritz.box kernel: xfce4-terminal invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0

Another allocation a short time later. Killing the task obviously
didn't help because the lowmem memory pressure hasn't been relieved.

[...]
> Dec 16 18:56:32 boerne.fritz.box kernel: Normal free:41028kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:472164kB inactive_file:108kB unevictable:0kB writepending:112kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213236kB slab_unreclaimable:86360kB kernel_stack:1584kB pagetables:2564kB bounce:32kB free_pcp:180kB local_pcp:24kB free_cma:0kB

In fact, we have even more pages on the file LRUs.

[...]

> Dec 16 18:56:32 boerne.fritz.box kernel: xfce4-terminal invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
[...]
> Dec 16 18:56:32 boerne.fritz.box kernel: Normal free:40988kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:472436kB inactive_file:144kB unevictable:0kB writepending:312kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213236kB slab_unreclaimable:86360kB kernel_stack:1584kB pagetables:2464kB bounce:32kB free_pcp:116kB local_pcp:0kB free_cma:0kB

Same here. All of this suggests that the page cache cannot be reclaimed
for some reason. It is hard to tell why, but there is definitely something
bad going on.
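
For anyone who wants to compare these zone reports mechanically, the
following Python sketch pulls the per-zone counters out of a report line.
parse_zone_line() is a hypothetical helper, and the sample line is an
abbreviated copy of the first Normal zone report quoted above.

```python
import re


def parse_zone_line(line: str) -> dict:
    """Return {counter: kB} for every 'name:123kB' pair in a zone line."""
    return {name: int(kb) for name, kb in re.findall(r"(\w+):(\d+)kB", line)}


line = ("Normal free:41008kB min:41100kB low:51372kB high:61644kB "
        "active_anon:0kB inactive_anon:0kB active_file:470556kB "
        "inactive_file:148kB managed:831480kB slab_reclaimable:213172kB")

zone = parse_zone_line(line)
below_min = zone["free"] < zone["min"]                 # zone is under its min watermark
file_share = zone["active_file"] / zone["managed"]     # fraction held by the active file LRU
print(below_min, f"{file_share:.0%}")
```

Run over each of the three reports, this makes it easy to see that free
memory stays just under the min watermark while the active file LRU keeps
holding more than half of the managed lowmem.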
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
  2016-12-16 22:12         ` Michal Hocko
@ 2016-12-17 11:17           ` Tetsuo Handa
  2016-12-18 16:37             ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Tetsuo Handa @ 2016-12-17 11:17 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner
  Cc: Nils Holland, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

Michal Hocko wrote:
> On Fri 16-12-16 12:31:51, Johannes Weiner wrote:
>>> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>>  		 */
>>>  		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
>>>  
>>> +		/*
>>> +		 * Help non-failing allocations by giving them access to memory
>>> +		 * reserves but do not use ALLOC_NO_WATERMARKS because this
>>> +		 * could deplete whole memory reserves which would just make
>>> +		 * the situation worse
>>> +		 */
>>> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
>>> +		if (page)
>>> +			goto got_pg;
>>> +
>>
>> But this should be a separate patch, IMO.
>>
>> Do we observe GFP_NOFS lockups when we don't do this? 
> 
> this is hard to tell but considering users like grow_dev_page we can get
> stuck with a very slow progress I believe. Those allocations could see
> some help.
> 
>> Don't we risk
>> premature exhaustion of the memory reserves, and it's better to wait
>> for other reclaimers to make some progress instead?
> 
> waiting for other reclaimers would be preferable but we should at least
> give these some priority, which is what ALLOC_HARDER should help with.
> 
>> Should we give
>> reserve access to all GFP_NOFS allocations, or just the ones from a
>> reclaim/cleaning context?
> 
> I would focus only for those which are important enough. Which are those
> is a harder question. But certainly those with GFP_NOFAIL are important
> enough.
> 
>> All that should go into the changelog of a separate allocation booster
>> patch, I think.
> 
> The reason I did both in the same patch is to address the concern about
> potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen
> ALLOC_HARDER to give the minimum portion of the reserves so that we do
> not risk other high priority users to be blocked out but still help a
> bit at least and prevent from starvation when other reclaimers are
> faster to consume the reclaimed memory.
> 
> I can extend the changelog of course but I believe that having both
> changes together makes some sense. NOFS|NOFAIL allocations are not all
> that rare and sometimes we really depend on them making a further
> progress.
> 

I feel that allowing access to memory reserves based on __GFP_NOFAIL might not
make sense. My understanding is that the actual I/O operations triggered by I/O
requests from filesystem code are processed by other threads. Even if we grant
access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code,
I think it is possible that memory allocations by the underlying bio code
fail to make further progress unless they are granted memory reserves as well.

Below is a typical trace which I observe in an OOM lockup situation (though
this trace is from an OOM stress test using XFS).

----------------------------------------
[ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 seq=48 gfp=0x2400000(GFP_NOIO) order=0 delay=430400 uninterruptible
[ 1845.187248] kworker/2:1     D12712 14498      2 0x00000080
[ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn
[ 1845.187252] Call Trace:
[ 1845.187253]  ? __schedule+0x23f/0xba0
[ 1845.187254]  schedule+0x38/0x90
[ 1845.187255]  schedule_timeout+0x205/0x4a0
[ 1845.187256]  ? del_timer_sync+0xd0/0xd0
[ 1845.187257]  schedule_timeout_uninterruptible+0x25/0x30
[ 1845.187258]  __alloc_pages_nodemask+0x1035/0x10e0
[ 1845.187259]  ? alloc_request_struct+0x14/0x20
[ 1845.187261]  alloc_pages_current+0x96/0x1b0
[ 1845.187262]  ? bio_alloc_bioset+0x20f/0x2e0
[ 1845.187264]  bio_copy_kern+0xc4/0x180
[ 1845.187265]  blk_rq_map_kern+0x6f/0x120
[ 1845.187268]  __scsi_execute.isra.23+0x12f/0x160
[ 1845.187270]  scsi_execute_req_flags+0x8f/0x100
[ 1845.187271]  sr_check_events+0xba/0x2b0 [sr_mod]
[ 1845.187274]  cdrom_check_events+0x13/0x30 [cdrom]
[ 1845.187275]  sr_block_check_events+0x25/0x30 [sr_mod]
[ 1845.187276]  disk_check_events+0x5b/0x150
[ 1845.187277]  disk_events_workfn+0x17/0x20
[ 1845.187278]  process_one_work+0x1fc/0x750
[ 1845.187279]  ? process_one_work+0x167/0x750
[ 1845.187279]  worker_thread+0x126/0x4a0
[ 1845.187280]  kthread+0x10a/0x140
[ 1845.187281]  ? process_one_work+0x750/0x750
[ 1845.187282]  ? kthread_create_on_node+0x60/0x60
[ 1845.187283]  ret_from_fork+0x2a/0x40
----------------------------------------

I think that this GFP_NOIO allocation request needs to consume more memory reserves
than a GFP_NOFS allocation request in order to make progress.
Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request in order to allow
access to memory reserves, as for GFP_NOFS | __GFP_NOFAIL allocation requests?

* Re: OOM: Better, but still there on
  2016-12-17  0:02       ` Michal Hocko
@ 2016-12-17 12:59         ` Nils Holland
  2016-12-17 14:44           ` Tetsuo Handa
  0 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-17 12:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> On Fri 16-12-16 19:47:00, Nils Holland wrote:
> > 
> > Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages freed, 10219 pages still pinned.
> > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
> > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0
> [...]
> > Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:470556kB inactive_file:148kB unevictable:0kB writepending:1616kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB
> 
> this is a GFP_KERNEL allocation so it cannot use the highmem zone again.
> There is no anonymous memory in this zone but the allocation
> context implies the full reclaim context so the file LRU should be
> reclaimable. For some reason ~470MB of the active file LRU is still
> there. This is quite unexpected. It is harder to tell more without
> further data. It would be great if you could enable reclaim related
> tracepoints:
> 
> mount -t tracefs none /debug/trace
> echo 1 > /debug/trace/events/vmscan/enable
> cat /debug/trace/trace_pipe > trace.log
> 
> should help
> [...]

No problem! I enabled writing the trace data to a file and then tried
to trigger another OOM situation. That worked, this time without a
complete kernel panic, but with only my processes being killed and the
system becoming unresponsive. When that happened, I let it run for
another minute or two so that, in case it was still logging something
to the trace file, it could continue to do so for a while longer. Then I
rebooted with the only thing that still worked, i.e. by means of magic
SysRequest.

The trace file has actually become rather big (around 21 MB). I didn't
dare to cut anything from it because I didn't want to risk deleting
something that might turn out to be important. So, due to the size, I'm
not attaching the trace file to this message, but a compressed copy
(about 536 KB) can be grabbed at:

http://ftp.tisys.org/pub/misc/trace.log.xz

For reference, here's the OOM report that goes along with this
incident and the trace file:

Dec 17 13:31:06 boerne.fritz.box kernel: Purging GPU memory, 145 pages freed, 10287 pages still pinned.
Dec 17 13:31:07 boerne.fritz.box kernel: awesome invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 17 13:31:07 boerne.fritz.box kernel: awesome cpuset=/ mems_allowed=0
Dec 17 13:31:07 boerne.fritz.box kernel: CPU: 1 PID: 5599 Comm: awesome Not tainted 4.9.0-gentoo #3
Dec 17 13:31:07 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c18
Dec 17 13:31:07 boerne.fritz.box kernel:  c1433406
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37d48
Dec 17 13:31:07 boerne.fritz.box kernel:  c5319280
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c48
Dec 17 13:31:07 boerne.fritz.box kernel:  c1170011
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c9c
Dec 17 13:31:07 boerne.fritz.box kernel:  00200286
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c48
Dec 17 13:31:07 boerne.fritz.box kernel:  c1438fff
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c4c
Dec 17 13:31:07 boerne.fritz.box kernel:  c72479c0
Dec 17 13:31:07 boerne.fritz.box kernel:  c60dd200
Dec 17 13:31:07 boerne.fritz.box kernel:  c5319280
Dec 17 13:31:07 boerne.fritz.box kernel:  c1ad1899
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37d48
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c8c
Dec 17 13:31:07 boerne.fritz.box kernel:  c1114407
Dec 17 13:31:07 boerne.fritz.box kernel:  c10513a5
Dec 17 13:31:07 boerne.fritz.box kernel:  c5a37c78
Dec 17 13:31:07 boerne.fritz.box kernel:  c11140a1
Dec 17 13:31:07 boerne.fritz.box kernel:  00000005
Dec 17 13:31:07 boerne.fritz.box kernel:  00000000
Dec 17 13:31:07 boerne.fritz.box kernel:  00000000
Dec 17 13:31:07 boerne.fritz.box kernel: Call Trace:
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1758900>] ? skb_queue_purge+0x30/0x30
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c175dcde>] alloc_skb_with_frags+0xee/0x1a0
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1753dba>] sock_alloc_send_pskb+0x19a/0x1c0
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1825880>] ? wait_for_unix_gc+0x20/0x90
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1823fc0>] unix_stream_sendmsg+0x2a0/0x350
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1750b3d>] sock_sendmsg+0x2d/0x40
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1750bb7>] sock_write_iter+0x67/0xc0
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1172c42>] do_readv_writev+0x1e2/0x380
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1750b50>] ? sock_sendmsg+0x40/0x40
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c10806f2>] ? pick_next_task_fair+0x3f2/0x510
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1033763>] ? lapic_next_event+0x13/0x20
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1173d16>] vfs_writev+0x36/0x60
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1173d85>] do_writev+0x45/0xc0
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c1173efb>] SyS_writev+0x1b/0x20
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c10018ec>] do_fast_syscall_32+0x7c/0x130
Dec 17 13:31:07 boerne.fritz.box kernel:  [<c194232b>] sysenter_past_esp+0x40/0x6a
Dec 17 13:31:07 boerne.fritz.box kernel: Mem-Info:
Dec 17 13:31:07 boerne.fritz.box kernel: active_anon:99962 inactive_anon:10651 isolated_anon:0
                                          active_file:305350 inactive_file:411946 isolated_file:36
                                          unevictable:0 dirty:5961 writeback:0 unstable:0
                                          slab_reclaimable:50496 slab_unreclaimable:21852
                                          mapped:36866 shmem:10990 pagetables:973 bounce:0
                                          free:82280 free_pcp:103 free_cma:0
Dec 17 13:31:07 boerne.fritz.box kernel: Node 0 active_anon:399848kB inactive_anon:42604kB active_file:1221400kB inactive_file:1647784kB unevictable:0kB isolated(anon):0kB isolated(file):144kB mapped:147464kB dirty:23844kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 165888kB anon_thp: 43960kB writeback_tmp:0kB unstable:0kB pages_scanned:56194255 all_unreclaimable? yes
Dec 17 13:31:07 boerne.fritz.box kernel: DMA free:3944kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:6504kB inactive_file:0kB unevictable:0kB writepending:120kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:2712kB slab_unreclaimable:1016kB kernel_stack:360kB pagetables:1132kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 17 13:31:07 boerne.fritz.box kernel: lowmem_reserve[]:
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel:  808
Dec 17 13:31:07 boerne.fritz.box kernel:  3849
Dec 17 13:31:07 boerne.fritz.box kernel:  3849
Dec 17 13:31:07 boerne.fritz.box kernel: Normal free:41056kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:483028kB inactive_file:4kB unevictable:0kB writepending:2056kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:199272kB slab_unreclaimable:86392kB kernel_stack:1656kB pagetables:2760kB bounce:0kB free_pcp:252kB local_pcp:144kB free_cma:0kB
Dec 17 13:31:07 boerne.fritz.box kernel: lowmem_reserve[]:
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel:  24330
Dec 17 13:31:07 boerne.fritz.box kernel:  24330
Dec 17 13:31:07 boerne.fritz.box kernel: HighMem free:284120kB min:512kB low:39184kB high:77856kB active_anon:399848kB inactive_anon:42604kB active_file:731868kB inactive_file:1647684kB unevictable:0kB writepending:21668kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:160kB local_pcp:84kB free_cma:0kB
Dec 17 13:31:07 boerne.fritz.box kernel: lowmem_reserve[]:
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel:  0
Dec 17 13:31:07 boerne.fritz.box kernel: DMA: 
Dec 17 13:31:07 boerne.fritz.box kernel: 4*4kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (U) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*8kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (E) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*16kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (U) 
Dec 17 13:31:07 boerne.fritz.box kernel: 8*32kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UE) 
Dec 17 13:31:07 boerne.fritz.box kernel: 3*64kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UE) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*128kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (U) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*256kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (E) 
Dec 17 13:31:07 boerne.fritz.box kernel: 0*512kB 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*1024kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (E) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*2048kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (M) 
Dec 17 13:31:07 boerne.fritz.box kernel: 0*4096kB 
Dec 17 13:31:07 boerne.fritz.box kernel: = 3944kB
Dec 17 13:31:07 boerne.fritz.box kernel: Normal: 
Dec 17 13:31:07 boerne.fritz.box kernel: 40*4kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UM) 
Dec 17 13:31:07 boerne.fritz.box kernel: 28*8kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 22*16kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 20*32kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (M) 
Dec 17 13:31:07 boerne.fritz.box kernel: 92*64kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UM) 
Dec 17 13:31:07 boerne.fritz.box kernel: 76*128kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 20*256kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 3*512kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UM) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1*1024kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (E) 
Dec 17 13:31:07 boerne.fritz.box kernel: 2*2048kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UM) 
Dec 17 13:31:07 boerne.fritz.box kernel: 3*4096kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (M) 
Dec 17 13:31:07 boerne.fritz.box kernel: = 41056kB
Dec 17 13:31:07 boerne.fritz.box kernel: HighMem: 
Dec 17 13:31:07 boerne.fritz.box kernel: 1452*4kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 1347*8kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 903*16kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 443*32kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 135*64kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 33*128kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 11*256kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (ME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 10*512kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 7*1024kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UME) 
Dec 17 13:31:07 boerne.fritz.box kernel: 3*2048kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UE) 
Dec 17 13:31:07 boerne.fritz.box kernel: 50*4096kB 
Dec 17 13:31:07 boerne.fritz.box kernel: (UM) 
Dec 17 13:31:07 boerne.fritz.box kernel: = 284120kB
Dec 17 13:31:07 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 17 13:31:07 boerne.fritz.box kernel: 728298 total pagecache pages
Dec 17 13:31:07 boerne.fritz.box kernel: 0 pages in swap cache
Dec 17 13:31:07 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 17 13:31:07 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 17 13:31:07 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 17 13:31:07 boerne.fritz.box kernel: 1006816 pages RAM
Dec 17 13:31:07 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 17 13:31:07 boerne.fritz.box kernel: 16403 pages reserved
Dec 17 13:31:07 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 17 13:31:07 boerne.fritz.box kernel: [ 1876]     0  1876     6165      985      10       3        0             0 systemd-journal
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2497]     0  2497     2965      915       6       3        0         -1000 systemd-udevd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2582]   107  2582     3874      902       8       3        0             0 systemd-timesyn
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2585]    88  2585     1158      567       6       3        0             0 nullmailer-send
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2588]   108  2588     1271      848       7       3        0          -900 dbus-daemon
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2590]     0  2590     1510      459       5       3        0             0 fcron
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2594]     0  2594     1521      994       6       3        0             0 systemd-logind
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2595]     0  2595    22001     3143      21       3        0             0 NetworkManager
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2649]     0  2649      768      579       5       3        0             0 dhcpcd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2655]     0  2655      639      416       5       3        0             0 vnstatd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2656]     0  2656     1235      843       6       3        0             0 login
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2657]     0  2657     1460     1047       6       3        0         -1000 sshd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2684]     0  2684     1972     1291       7       3        0             0 systemd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2713]     0  2713     2279      569       7       3        0             0 (sd-pam)
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2728]     0  2728     1836      914       7       3        0             0 bash
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2768]   109  2768    16725     3172      19       3        0             0 polkitd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2798]     0  2798     2157     1375       7       3        0             0 wpa_supplicant
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2864]     0  2864     1743      703       7       3        0             0 start_trace
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2866]     0  2866     1395      390       7       3        0             0 cat
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2867]     0  2867     1370      422       6       3        0             0 tail
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2916]     0  2916     1235      845       6       3        0             0 login
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2917]     0  2917     1836      870       7       3        0             0 bash
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2956]     0  2956    16257    14998      36       3        0             0 emerge
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2963]     0  2963     1235      846       6       3        0             0 login
Dec 17 13:31:07 boerne.fritz.box kernel: [ 2972]     0  2972     1836      906       7       3        0             0 bash
Dec 17 13:31:07 boerne.fritz.box kernel: [ 3021]     0  3021     6058     1745      15       3        0             0 journalctl
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5253]   250  5253      549      356       5       3        0             0 sandbox
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5255]   250  5255     2629     1567       8       3        0             0 ebuild.sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5272]   250  5272     2995     1763       8       3        0             0 ebuild.sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5335]     0  5335     1235      843       6       3        0             0 login
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5343]   250  5343     1123      724       5       3        0             0 emake
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5345]   250  5345      909      661       6       3        0             0 make
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5467]  1000  5467     2033     1374       7       3        0             0 systemd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5483]  1000  5483     6633      597      10       3        0             0 (sd-pam)
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5506]  1000  5506     1836      887       7       3        0             0 bash
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5530]   250  5530     1057      674       4       3        0             0 sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5531]   250  5531     3204     2648      10       3        0             0 python2.7
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5536]  1000  5536    25339     2203      18       3        0             0 pulseaudio
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5537]   111  5537     5763      643       9       3        0             0 rtkit-daemon
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5560]  1000  5560     3575     1420      10       3        0             0 gconf-helper
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5567]  1000  5567     1743      709       7       3        0             0 startx
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5588]  1000  5588     1001      579       5       3        0             0 xinit
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5589]  1000  5589    23142     6927      42       3        0             0 X
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5599]  1000  5599    10592     4532      21       3        0             0 awesome
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5625]  1000  5625     1571      616       7       3        0             0 dbus-launch
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5626]  1000  5626     1238      636       6       3        0             0 dbus-daemon
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5631]  1000  5631     1571      621       7       3        0             0 dbus-launch
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5632]  1000  5632     1238      703       6       3        0             0 dbus-daemon
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5659]   250  5659     3749     3243      11       3        0             0 python
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5671]  1000  5671    31584     7782      39       3        0             0 nm-applet
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5707]  1000  5707    11224     1897      14       3        0             0 at-spi-bus-laun
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5718]  1000  5718     1238      806       6       3        0             0 dbus-daemon
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5725]  1000  5725     7480     2144      12       3        0             0 at-spi2-registr
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5732]  1000  5732    10179     1469      14       3        0             0 gvfsd
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5765]  1000  5765   194951    71017     247       3        0             0 firefox
Dec 17 13:31:07 boerne.fritz.box kernel: [ 5825]   250  5825     1209      839       5       3        0             0 sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 7253]  1000  7253    21521     7455      32       3        0             0 xfce4-terminal
Dec 17 13:31:07 boerne.fritz.box kernel: [ 7359]  1000  7359     1836      891       7       3        0             0 bash
Dec 17 13:31:07 boerne.fritz.box kernel: [ 8641]  1000  8641     1533      593       6       3        0             0 tar
Dec 17 13:31:07 boerne.fritz.box kernel: [ 8642]  1000  8642    17834    16879      38       3        0             0 xz
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9059]   250  9059    10070     2536      13       3        0             0 python
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9063]   250  9063     3155     1923      10       3        0             0 python
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9064]   250  9064     3155     1926      10       3        0             0 python
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9068]   250  9068     1211      826       5       3        0             0 sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9075]   250  9075     3847     3307      11       3        0             0 python
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9417]  1000  9417     1829      901       7       3        0             0 bash
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9459]  1000  9459     2246     1206       9       3        0             0 ssh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9499]   250  9499     1087      710       5       3        0             0 sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9567]   250  9567     1087      532       5       3        0             0 sh
Dec 17 13:31:07 boerne.fritz.box kernel: [ 9570]   250  9570     1088      618       5       3        0             0 sh
Dec 17 13:31:07 boerne.fritz.box kernel: Out of memory: Kill process 5765 (firefox) score 36 or sacrifice child
Dec 17 13:31:07 boerne.fritz.box kernel: Killed process 5765 (firefox) total-vm:779804kB, anon-rss:183712kB, file-rss:100332kB, shmem-rss:24kB
Dec 17 13:31:08 boerne.fritz.box kernel: awesome invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 17 13:31:08 boerne.fritz.box kernel: awesome cpuset=/ mems_allowed=0
Dec 17 13:31:08 boerne.fritz.box kernel: CPU: 0 PID: 5599 Comm: awesome Not tainted 4.9.0-gentoo #3
Dec 17 13:31:08 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 17 13:31:08 boerne.fritz.box kernel:  c5a37c18 c1433406 c5a37d48 c531ca00 c5a37c48 c1170011 c5a37c9c 00000286
Dec 17 13:31:08 boerne.fritz.box kernel:  c5a37c48 c1438fff c5a37c4c c7246c00 e737e800 c531ca00 c1ad1899 c5a37d48
Dec 17 13:31:08 boerne.fritz.box kernel:  c5a37c8c c1114407 001d89cc c5a37c78 c1114000 00000005 00000000 00000000
Dec 17 13:31:08 boerne.fritz.box kernel: Call Trace:
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1114000>] ? oom_badness.part.13+0x10/0x120
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1758900>] ? skb_queue_purge+0x30/0x30
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c175dcde>] alloc_skb_with_frags+0xee/0x1a0
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1753dba>] sock_alloc_send_pskb+0x19a/0x1c0
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1825880>] ? wait_for_unix_gc+0x20/0x90
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1823fc0>] unix_stream_sendmsg+0x2a0/0x350
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1750b3d>] sock_sendmsg+0x2d/0x40
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1750bb7>] sock_write_iter+0x67/0xc0
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1172c42>] do_readv_writev+0x1e2/0x380
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1750b50>] ? sock_sendmsg+0x40/0x40
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1033763>] ? lapic_next_event+0x13/0x20
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c10ae675>] ? clockevents_program_event+0x95/0x190
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c10a074a>] ? __hrtimer_run_queues+0x20a/0x280
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1173d16>] vfs_writev+0x36/0x60
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1173d85>] do_writev+0x45/0xc0
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c1173efb>] SyS_writev+0x1b/0x20
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c10018ec>] do_fast_syscall_32+0x7c/0x130
Dec 17 13:31:08 boerne.fritz.box kernel:  [<c194232b>] sysenter_past_esp+0x40/0x6a
Dec 17 13:31:08 boerne.fritz.box kernel: Mem-Info:
Dec 17 13:31:08 boerne.fritz.box kernel: active_anon:53993 inactive_anon:7042 isolated_anon:0
                                          active_file:310474 inactive_file:411136 isolated_file:0
                                          unevictable:0 dirty:9093 writeback:0 unstable:0
                                          slab_reclaimable:50588 slab_unreclaimable:21858
                                          mapped:18104 shmem:7404 pagetables:732 bounce:0
                                          free:127428 free_pcp:488 free_cma:0
Dec 17 13:31:08 boerne.fritz.box kernel: Node 0 active_anon:215972kB inactive_anon:28168kB active_file:1241896kB inactive_file:1644544kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:72416kB dirty:36372kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 112640kB anon_thp: 29616kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 17 13:31:08 boerne.fritz.box kernel: DMA free:3928kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:6964kB inactive_file:44kB unevictable:0kB writepending:596kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3016kB slab_unreclaimable:1176kB kernel_stack:96kB pagetables:388kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 17 13:31:08 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 17 13:31:08 boerne.fritz.box kernel: Normal free:40944kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:483096kB inactive_file:80kB unevictable:0kB writepending:2060kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:199336kB slab_unreclaimable:86256kB kernel_stack:1632kB pagetables:2540kB bounce:0kB free_pcp:692kB local_pcp:396kB free_cma:0kB
Dec 17 13:31:08 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 17 13:31:08 boerne.fritz.box kernel: HighMem free:464840kB min:512kB low:39184kB high:77856kB active_anon:215972kB inactive_anon:28168kB active_file:751836kB inactive_file:1644320kB unevictable:0kB writepending:33716kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1260kB local_pcp:628kB free_cma:0kB
Dec 17 13:31:08 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 17 13:31:08 boerne.fritz.box kernel: DMA: 6*4kB (U) 14*8kB (U) 15*16kB (U) 7*32kB (U) 2*64kB (U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (E) 1*2048kB (M) 0*4096kB = 3928kB
Dec 17 13:31:08 boerne.fritz.box kernel: Normal: 40*4kB (UM) 30*8kB (UM) 22*16kB (UME) 24*32kB (UM) 92*64kB (UM) 76*128kB (UME) 19*256kB (UM) 3*512kB (UM) 1*1024kB (E) 2*2048kB (UM) 3*4096kB (M) = 40944kB
Dec 17 13:31:08 boerne.fritz.box kernel: HighMem: 14*4kB (UE) 1256*8kB (ME) 869*16kB (UME) 520*32kB (UME) 210*64kB (UME) 93*128kB (UME) 42*256kB (ME) 22*512kB (UME) 12*1024kB (UME) 30*2048kB (UME) 74*4096kB (UM) = 464840kB
Dec 17 13:31:08 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 17 13:31:08 boerne.fritz.box kernel: 729003 total pagecache pages
Dec 17 13:31:08 boerne.fritz.box kernel: 0 pages in swap cache
Dec 17 13:31:08 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 17 13:31:08 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 17 13:31:08 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 17 13:31:08 boerne.fritz.box kernel: 1006816 pages RAM
Dec 17 13:31:08 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 17 13:31:08 boerne.fritz.box kernel: 16403 pages reserved
Dec 17 13:31:08 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 17 13:31:08 boerne.fritz.box kernel: [ 1876]     0  1876     6165     1016      10       3        0             0 systemd-journal
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2497]     0  2497     2965      915       6       3        0         -1000 systemd-udevd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2582]   107  2582     3874      902       8       3        0             0 systemd-timesyn
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2585]    88  2585     1158      567       6       3        0             0 nullmailer-send
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2588]   108  2588     1271      848       7       3        0          -900 dbus-daemon
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2590]     0  2590     1510      459       5       3        0             0 fcron
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2594]     0  2594     1521      994       6       3        0             0 systemd-logind
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2595]     0  2595    22001     3143      21       3        0             0 NetworkManager
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2649]     0  2649      768      579       5       3        0             0 dhcpcd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2655]     0  2655      639      416       5       3        0             0 vnstatd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2656]     0  2656     1235      843       6       3        0             0 login
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2657]     0  2657     1460     1047       6       3        0         -1000 sshd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2684]     0  2684     1972     1291       7       3        0             0 systemd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2713]     0  2713     2279      569       7       3        0             0 (sd-pam)
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2728]     0  2728     1836      914       7       3        0             0 bash
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2768]   109  2768    16725     3172      19       3        0             0 polkitd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2798]     0  2798     2157     1375       7       3        0             0 wpa_supplicant
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2864]     0  2864     1743      703       7       3        0             0 start_trace
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2866]     0  2866     1395      390       7       3        0             0 cat
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2867]     0  2867     1370      422       6       3        0             0 tail
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2916]     0  2916     1235      845       6       3        0             0 login
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2917]     0  2917     1836      870       7       3        0             0 bash
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2956]     0  2956    16257    14998      36       3        0             0 emerge
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2963]     0  2963     1235      846       6       3        0             0 login
Dec 17 13:31:08 boerne.fritz.box kernel: [ 2972]     0  2972     1836      906       7       3        0             0 bash
Dec 17 13:31:08 boerne.fritz.box kernel: [ 3021]     0  3021     6058     1761      15       3        0             0 journalctl
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5253]   250  5253      549      356       5       3        0             0 sandbox
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5255]   250  5255     2629     1567       8       3        0             0 ebuild.sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5272]   250  5272     2995     1763       8       3        0             0 ebuild.sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5335]     0  5335     1235      843       6       3        0             0 login
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5343]   250  5343     1123      724       5       3        0             0 emake
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5345]   250  5345      909      661       6       3        0             0 make
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5467]  1000  5467     2033     1374       7       3        0             0 systemd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5483]  1000  5483     6633      597      10       3        0             0 (sd-pam)
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5506]  1000  5506     1836      887       7       3        0             0 bash
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5530]   250  5530     1057      674       4       3        0             0 sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5531]   250  5531     3204     2648      10       3        0             0 python2.7
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5536]  1000  5536    25339     2203      18       3        0             0 pulseaudio
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5537]   111  5537     5763      643       9       3        0             0 rtkit-daemon
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5560]  1000  5560     3575     1420      10       3        0             0 gconf-helper
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5567]  1000  5567     1743      709       7       3        0             0 startx
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5588]  1000  5588     1001      579       5       3        0             0 xinit
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5589]  1000  5589    23069     6556      42       3        0             0 X
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5599]  1000  5599    10592     4532      21       3        0             0 awesome
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5625]  1000  5625     1571      616       7       3        0             0 dbus-launch
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5626]  1000  5626     1238      636       6       3        0             0 dbus-daemon
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5631]  1000  5631     1571      621       7       3        0             0 dbus-launch
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5632]  1000  5632     1238      703       6       3        0             0 dbus-daemon
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5659]   250  5659     3749     3243      11       3        0             0 python
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5671]  1000  5671    31584     7782      39       3        0             0 nm-applet
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5707]  1000  5707    11224     1897      14       3        0             0 at-spi-bus-laun
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5718]  1000  5718     1238      806       6       3        0             0 dbus-daemon
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5725]  1000  5725     7480     2144      12       3        0             0 at-spi2-registr
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5732]  1000  5732    10179     1469      14       3        0             0 gvfsd
Dec 17 13:31:08 boerne.fritz.box kernel: [ 5825]   250  5825     1209      839       5       3        0             0 sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 7253]  1000  7253    21521     7455      32       3        0             0 xfce4-terminal
Dec 17 13:31:08 boerne.fritz.box kernel: [ 7359]  1000  7359     1836      891       7       3        0             0 bash
Dec 17 13:31:08 boerne.fritz.box kernel: [ 8641]  1000  8641     1533      593       6       3        0             0 tar
Dec 17 13:31:08 boerne.fritz.box kernel: [ 8642]  1000  8642    17834    16879      38       3        0             0 xz
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9059]   250  9059    10070     2536      13       3        0             0 python
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9063]   250  9063     3155     1923      10       3        0             0 python
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9064]   250  9064     3155     1926      10       3        0             0 python
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9068]   250  9068     1211      826       5       3        0             0 sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9075]   250  9075     3847     3307      11       3        0             0 python
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9417]  1000  9417     1829      901       7       3        0             0 bash
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9459]  1000  9459     2246     1206       9       3        0             0 ssh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9499]   250  9499     1087      711       5       3        0             0 sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9607]   250  9607     1211      755       5       3        0             0 sh
Dec 17 13:31:08 boerne.fritz.box kernel: [ 9608]   250  9608     1087      533       5       3        0             0 sh
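
Two figures in the dumps above can be cross-checked with a few lines of
Python. This is only a sketch under stated assumptions: the zone lines
are abbreviated copies of the 13:31:08 Mem-Info output, the page size is
assumed to be 4 kB, and the score formula is a simplified reading of
what oom_badness() appears to do in kernels of this era (it ignores
oom_score_adj and the small discount for root-owned processes, neither
of which applies to firefox here). The helper names are made up for
illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (32-bit x86)

# Abbreviated copies of the zone lines from the 13:31:08 dump.
ZONE_LINES = [
    "DMA free:3928kB min:788kB low:984kB high:1180kB managed:15916kB",
    "Normal free:40944kB min:41100kB low:51372kB high:61644kB managed:831480kB",
    "HighMem free:464840kB min:512kB low:39184kB high:77856kB managed:3114256kB",
]
SWAP_KB = 3781628  # "Total swap" from the dump

ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB .*?managed:(\d+)kB")


def parse_zone(line):
    """Extract zone name plus free/min/managed sizes (in kB) from one line."""
    m = ZONE_RE.match(line)
    if m is None:
        raise ValueError("not a zone line: %r" % line)
    return {
        "zone": m.group(1),
        "free": int(m.group(2)),
        "min": int(m.group(3)),
        "managed": int(m.group(4)),
    }


def zones_below_min(lines):
    """Names of zones whose free memory is under the min watermark."""
    return [z["zone"] for z in map(parse_zone, lines) if z["free"] < z["min"]]


def total_pages(lines, swap_kb):
    """Managed RAM plus swap, in pages (the denominator of the score)."""
    managed_kb = sum(parse_zone(line)["managed"] for line in lines)
    return (managed_kb + swap_kb) // PAGE_KB


def oom_score(rss, nr_ptes, nr_pmds, swapents, pages):
    """Simplified badness score: memory footprint in permille of `pages`."""
    return (rss + nr_ptes + nr_pmds + swapents) * 1000 // pages


if __name__ == "__main__":
    print("below min watermark:", zones_below_min(ZONE_LINES))
    pages = total_pages(ZONE_LINES, SWAP_KB)
    # Values taken from the firefox row (pid 5765) of the 13:31:07 table.
    print("firefox score:", oom_score(71017, 247, 3, 0, pages))
```

With these inputs, only the Normal zone is below its min watermark
(40944 kB free against 41100 kB), even though HighMem still has plenty
free, and the firefox score comes out as 36, matching the "score 36" in
the kill message. A GFP_KERNEL-type allocation such as the one in the
trace cannot be served from HighMem, which is consistent with the OOM
killer firing despite the large amount of free memory overall.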

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: OOM: Better, but still there on
  2016-12-17 12:59         ` Nils Holland
@ 2016-12-17 14:44           ` Tetsuo Handa
  2016-12-17 17:11             ` Nils Holland
                               ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Tetsuo Handa @ 2016-12-17 14:44 UTC (permalink / raw)
  To: Nils Holland, Michal Hocko
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
> 
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

Under an OOM situation, writing to a file on disk is unlikely to work. Maybe
logging via the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
if you are using bash) works better. (I wish we could do it from the kernel
so that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a
UDP socket for logging trace_pipe messages, udplogger at
https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
might be a good fit for logging both outputs, with timestamps, into a single file.
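
For anyone who just wants to see the idea of the receiving side, here is
a minimal sketch in Python. It is not udplogger and makes no claim to
behave like it; it only illustrates collecting UDP datagrams from
several senders and writing them, timestamped, into one file. The port
number, the output file name and the function names are assumptions
chosen for illustration.

```python
import socket
import time


def format_record(payload, sender, now):
    """Prefix every line of a datagram with a local timestamp and the sender."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return "".join(
        "%s %s %s\n" % (stamp, sender, line) for line in text.split("\n")
    )


def serve(bind_addr="0.0.0.0", port=6666, path="oom-capture.log"):
    """Receive datagrams forever and append them to `path`.

    The port and the file name are placeholders; use whatever the
    sending side (netconsole, or the bash /dev/udp redirect) targets.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    with open(path, "a", encoding="utf-8") as out:
        while True:
            payload, (host, _port) = sock.recvfrom(65535)
            out.write(format_record(payload, host, time.time()))
            out.flush()  # keep the file current in case the sender dies
```

Calling serve() on the receiving machine starts the loop. Since the
timestamps are taken on the receiver, they reflect arrival time rather
than the time the message was generated on the machine under test.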

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: OOM: Better, but still there on
  2016-12-17 14:44           ` Tetsuo Handa
@ 2016-12-17 17:11             ` Nils Holland
  2016-12-17 21:06             ` Nils Holland
  2016-12-18  0:28             ` OOM: Better, but still there on Xin Zhou
  2 siblings, 0 replies; 62+ messages in thread
From: Nils Holland @ 2016-12-17 17:11 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> On 2016/12/17 21:59, Nils Holland wrote:
> > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> >> mount -t tracefs none /debug/trace
> >> echo 1 > /debug/trace/events/vmscan/enable
> >> cat /debug/trace/trace_pipe > trace.log
> >>
> >> should help
> >> [...]
> > 
> > No problem! I enabled writing the trace data to a file and then tried
> > to trigger another OOM situation. That worked, this time without a
> > complete kernel panic, but with only my processes being killed and the
> > system becoming unresponsive.
> > [...]
> 
> Under an OOM situation, writing to a file on disk is unlikely to work. Maybe
> logging via the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> if you are using bash) works better. (I wish we could do it from the kernel
> so that /bin/cat is not disturbed by delays due to page faults.)
> 
> If you can configure netconsole for logging OOM killer messages and a
> UDP socket for logging trace_pipe messages, udplogger at
> https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> might be a good fit for logging both outputs, with timestamps, into a single file.

Thanks for the hint, sounds very sane! I'll try to go that route for
the next log / trace I produce. Of course, if Michal says that the
trace file I've already posted, which was logged to a local file, is
useless and that it would have been better to log to a different
machine via the network, I can also repeat the current experiment and
produce a new file at any time. :-)

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: OOM: Better, but still there on
  2016-12-17 14:44           ` Tetsuo Handa
  2016-12-17 17:11             ` Nils Holland
@ 2016-12-17 21:06             ` Nils Holland
  2016-12-18  5:14               ` Tetsuo Handa
  2016-12-19 13:45               ` Michal Hocko
  2016-12-18  0:28             ` OOM: Better, but still there on Xin Zhou
  2 siblings, 2 replies; 62+ messages in thread
From: Nils Holland @ 2016-12-17 21:06 UTC (permalink / raw)
  To: Tetsuo Handa, Michal Hocko
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> On 2016/12/17 21:59, Nils Holland wrote:
> > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> >> mount -t tracefs none /debug/trace
> >> echo 1 > /debug/trace/events/vmscan/enable
> >> cat /debug/trace/trace_pipe > trace.log
> >>
> >> should help
> >> [...]
> > 
> > No problem! I enabled writing the trace data to a file and then tried
> > to trigger another OOM situation. That worked, this time without a
> > complete kernel panic, but with only my processes being killed and the
> > system becoming unresponsive.
> 
> Under an OOM situation, writing to a file on disk is unlikely to work. Maybe
> logging via the network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> if you are using bash) works better. (I wish we could do it from the kernel
> so that /bin/cat is not disturbed by delays due to page faults.)
> 
> If you can configure netconsole for logging OOM killer messages and a
> UDP socket for logging trace_pipe messages, udplogger at
> https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> might be a good fit for logging both outputs, with timestamps, into a single file.

Actually, I decided to give this a try once more on machine #2, i.e.
not the one that produced the previous trace, but the other one.

I logged via netconsole as well as 'cat /debug/trace/trace_pipe' via
the network to another machine running udplogger. After the machine
had been freshly booted and I had set up the logging, unpacking of the
firefox source tarball started. After it had been unpacking for a
while, the first load of trace messages started to appear. Some time
later, OOMs started to appear - I've got quite a lot of them in my
capture file this time.

Unfortunately, the reclaim trace messages stopped a while after the first
OOM messages showed up - most likely my "cat" had been killed at that
point or had become unresponsive. :-/
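
One possible way to at least rule out the "killed" half of that guess
would be to exempt the logging process from the OOM killer through
/proc/<pid>/oom_score_adj, the same knob that shows up as -1000 for sshd
and systemd-udevd in the process tables. The following is only a sketch:
the helper name and the proc_root parameter are made up for
illustration, lowering the value normally requires root, and it would
not help if the process merely stalls on memory allocation instead of
being killed.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write `value` (-1000..1000) to <proc_root>/<pid>/oom_score_adj.

    -1000 exempts the process from OOM killing. `proc_root` exists only
    so the helper can be tried against a scratch directory.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be within -1000..1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write("%d\n" % value)
    return path
```

For example, set_oom_score_adj(os.getpid(), -1000) called early in a
small logging script would cover that script itself; for a plain "cat",
the pid of the running process would have to be passed instead.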

In the end, the machine didn't completely panic, but once nothing new
was being logged via the network, I walked up to the machine and found
it in a state where I could no longer log in to it. The only thing that
still worked was, as always, a magic SysRequest reboot.

The complete log, from machine boot right up to the point where it
wouldn't really do anything anymore, is up again on my web server (~42
MB, 928 KB packed):

http://ftp.tisys.org/pub/misc/teela_2016-12-17.log.xz

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: OOM: Better, but still there on
  2016-12-17 14:44           ` Tetsuo Handa
  2016-12-17 17:11             ` Nils Holland
  2016-12-17 21:06             ` Nils Holland
@ 2016-12-18  0:28             ` Xin Zhou
  2 siblings, 0 replies; 62+ messages in thread
From: Xin Zhou @ 2016-12-18  0:28 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-kernel, linux-btrfs

Hi,
The system is supposed to have a special memory reservation for coredumps and other debug info when it encounters a panic;
the size seems to be configurable.
Thanks,
Xin
 
 

Sent: Saturday, December 17, 2016 at 6:44 AM
From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>
To: "Nils Holland" <nholland@tisys.org>, "Michal Hocko" <mhocko@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Chris Mason" <clm@fb.com>, "David Sterba" <dsterba@suse.cz>, linux-btrfs@vger.kernel.org
Subject: Re: OOM: Better, but still there on
On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
>
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

Under an OOM situation, writing to a file on disk is unlikely to work. Maybe
logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
if you are using bash) works better. (I wish we could do it from the kernel
so that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a
UDP socket for logging trace_pipe messages, udplogger at
https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
might be a good fit for logging both outputs with timestamps into a single file.
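
For illustration, the bash-side capture suggested above could be wrapped roughly as follows. This is only a sketch: the helper names udp_target and stream_trace are hypothetical, $ip and $port are placeholders for the receiving machine, and the /dev/udp/... path is a bash feature rather than a real device node.

```sh
# Sketch only: helper names are hypothetical, not part of any tool in this thread.

# Build the bash pseudo-device path for a UDP destination.
udp_target() {
    printf '/dev/udp/%s/%s\n' "$1" "$2"
}

# Copy everything readable from SRC ($1) to DEST ($2) until SRC ends or cat dies.
# With DEST=/dev/udp/<ip>/<port> this must run under bash, which opens a UDP
# socket for such redirections; other shells treat it as a plain file path.
stream_trace() {
    cat "$1" > "$2"
}

# Usage, as root under bash, with tracefs mounted at /debug/trace:
#   stream_trace /debug/trace/trace_pipe "$(udp_target "$ip" "$port")"
```

The receiving side still needs something listening on that port (udplogger in the suggestion above), and as noted, cat itself can stall on page faults once memory gets tight.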


* Re: OOM: Better, but still there on
  2016-12-17 21:06             ` Nils Holland
@ 2016-12-18  5:14               ` Tetsuo Handa
  2016-12-19 13:45               ` Michal Hocko
  1 sibling, 0 replies; 62+ messages in thread
From: Tetsuo Handa @ 2016-12-18  5:14 UTC (permalink / raw)
  To: nholland, mhocko; +Cc: linux-kernel, linux-mm, clm, dsterba, linux-btrfs

Nils Holland wrote:
> On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote:
> > On 2016/12/17 21:59, Nils Holland wrote:
> > > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
> > >> mount -t tracefs none /debug/trace
> > >> echo 1 > /debug/trace/events/vmscan/enable
> > >> cat /debug/trace/trace_pipe > trace.log
> > >>
> > >> should help
> > >> [...]
> > >
> > > No problem! I enabled writing the trace data to a file and then tried
> > > to trigger another OOM situation. That worked, this time without a
> > > complete kernel panic, but with only my processes being killed and the
> > > system becoming unresponsive.
> >
> > Under an OOM situation, writing to a file on disk is unlikely to work. Maybe
> > logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port"
> > if you are using bash) works better. (I wish we could do it from the kernel
> > so that /bin/cat is not disturbed by delays due to page faults.)
> >
> > If you can configure netconsole for logging OOM killer messages and a
> > UDP socket for logging trace_pipe messages, udplogger at
> > https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
> > might be a good fit for logging both outputs with timestamps into a single file.
>
> Actually, I decided to give this a try once more on machine #2, i.e.
> not the one that produced the previous trace, but the other one.
>
> I logged via netconsole as well as 'cat /debug/trace/trace_pipe' via
> the network to another machine running udplogger. After the machine
> had been freshly booted and I had set up the logging, unpacking of the
> firefox source tarball started. After it had been unpacking for a
> while, the first load of trace messages started to appear. Some time
> later, OOMs started to appear - I've got quite a lot of them in my
> capture file this time.

Thank you for capturing. I think it worked well. Let's wait for Michal.

The first OOM killer invocation was

  2016-12-17 21:36:56 192.168.17.23:6665 [ 1276.828639] Killed process 3894 (xz) total-vm:68640kB, anon-rss:65920kB, file-rss:1696kB, shmem-rss:0kB

and the last OOM killer invocation was

  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800677] Killed process 3070 (screen) total-vm:7440kB, anon-rss:960kB, file-rss:2360kB, shmem-rss:0kB

and trace output was sent until

  2016-12-17 21:37:07 192.168.17.23:48468     kworker/u4:4-3896  [000] ....  1287.202958: mm_shrink_slab_start: super_cache_scan+0x0/0x170 f4436ed4: nid: 0 objects to shrink 86 gfp_flags GFP_NOFS|__GFP_NOFAIL pgs_scanned 32 lru_pgs 406078 cache items 412 delta 0 total_scan 86

which (I hope) should be sufficient for analysis.

>
> Unfortunately, the reclaim trace messages stopped a while after the first
> OOM messages show up - most likely my "cat" had been killed at that
> point or became unresponsive. :-/
>
> In the end, the machine didn't completely panic, but after nothing new
> showed up being logged via the network, I walked up to the
> machine and found it in a state where I couldn't really log in to it
> anymore, but all that worked was, as always, a magic SysRequest reboot.

There is a known issue (since Linux 2.6.32) where all memory allocation requests
get stuck due to a kswapd vs. shrink_inactive_list() livelock which occurs in a
near-OOM situation ( http://lkml.kernel.org/r/20160211225929.GU14668@dastard ).
If we hit it, even "page allocation stalls for " messages do not show up.

Even if we didn't hit it, although agetty and sshd were still alive

  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800614] [ 2800]     0  2800     1152      494       6       3        0             0 agetty
  2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800618] [ 2802]     0  2802     1457     1055       6       3        0         -1000 sshd

memory allocation was being delayed too much

  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034624] btrfs-transacti: page alloction stalls for 93995ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034628] CPU: 1 PID: 1949 Comm: btrfs-transacti Not tainted 4.9.0-gentoo #3
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034630] Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034638]  f162f94c c142bd8e 00000001 00000000 f162f970 c110ad7e c1b58833 02400840
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034645]  f162f978 f162f980 c1b55814 f162f960 00000160 f162fa38 c110b78c 02400840
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034652]  c1b55814 00016f2b 00000000 00400000 00000000 f21d0000 f21d0000 00000001
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034653] Call Trace:
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034660]  [<c142bd8e>] dump_stack+0x47/0x69
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034666]  [<c110ad7e>] warn_alloc+0xce/0xf0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034671]  [<c110b78c>] __alloc_pages_nodemask+0x97c/0xd30
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034678]  [<c1103fbd>] ? find_get_entry+0x1d/0x100
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034681]  [<c1102fc1>] ? add_to_page_cache_lru+0x61/0xc0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034685]  [<c110414d>] pagecache_get_page+0xad/0x270
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034692]  [<c1366556>] alloc_extent_buffer+0x116/0x3e0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034699]  [<c1334ade>] btrfs_find_create_tree_block+0xe/0x10
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034704]  [<c132a62f>] btrfs_alloc_tree_block+0x1ef/0x5f0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034710]  [<c1079050>] ? autoremove_wake_function+0x40/0x40
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034716]  [<c130f873>] __btrfs_cow_block+0x143/0x5f0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034723]  [<c130feca>] btrfs_cow_block+0x13a/0x220
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034727]  [<c13133a1>] btrfs_search_slot+0x1d1/0x870
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034731]  [<c131a74a>] lookup_inline_extent_backref+0x10a/0x6d0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034736]  [<c19b656c>] ? common_interrupt+0x2c/0x34
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034742]  [<c131c959>] __btrfs_free_extent+0x129/0xe80
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034750]  [<c1322160>] __btrfs_run_delayed_refs+0xaf0/0x13e0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034754]  [<c106f759>] ? set_next_entity+0x659/0xec0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034757]  [<c106c351>] ? put_prev_entity+0x21/0xcf0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034801]  [<fa83b2da>] ? xfs_attr3_leaf_add_work+0x25a/0x420 [xfs]
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034808]  [<c13259f1>] btrfs_run_delayed_refs+0x71/0x260
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034813]  [<c10903ef>] ? lock_timer_base+0x5f/0x80
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034818]  [<c133cefb>] btrfs_commit_transaction+0x2b/0xd30
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034821]  [<c133dc65>] ? start_transaction+0x65/0x4b0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034826]  [<c1337f65>] transaction_kthread+0x1b5/0x1d0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034830]  [<c1337db0>] ? btrfs_cleanup_transaction+0x490/0x490
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034833]  [<c10552e7>] kthread+0x97/0xb0
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034837]  [<c1055250>] ? __kthread_parkme+0x60/0x60
  2016-12-17 21:41:03 192.168.17.23:6665 [ 1521.034842]  [<c19b5d77>] ret_from_fork+0x1b/0x28

and therefore memory allocation for the page faults triggered by trying to log in was too slow to wait for.
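
To see how widespread such stalls are in a capture, the warn_alloc lines can be filtered by duration. The following is a minimal sketch, assuming the log format shown above; the function name stalls_over is hypothetical, and the pattern deliberately accepts both "allocation" and the "alloction" spelling seen in this log.

```sh
# List "<milliseconds> <task>" for every stall warning at or above a threshold.
# Usage: stalls_over THRESHOLD_MS < capture.log   (function name is hypothetical)
stalls_over() {
    # The sed step keeps only stall warnings and rewrites them as "<ms> <task>";
    # the awk step drops entries shorter than the threshold.
    sed -n 's/.*\] \(.*\): page alloc[a-z]* stalls for \([0-9][0-9]*\)ms.*/\2 \1/p' |
        awk -v min="$1" '$1 + 0 >= min + 0'
}
```

Something like "xzcat teela_2016-12-17.log.xz | stalls_over 10000" would then list only the stalls of ten seconds or longer.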

>
> The complete log, from machine boot right up to the point where it
> wouldn't really do anything anymore, is up again on my web server (~42
> MB, 928 KB packed):
>
> http://ftp.tisys.org/pub/misc/teela_2016-12-17.log.xz
>
> Greetings
> Nils
>

It might be pointless to check, but is your 4.9.0-gentoo kernel built from the final 4.9.0 source?
The typo "page alloction stalls" was fixed in v4.9-rc5. Maybe some last-minute changes are
missing...

* Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
  2016-12-17 11:17           ` Tetsuo Handa
@ 2016-12-18 16:37             ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-18 16:37 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Johannes Weiner, Nils Holland, linux-kernel, linux-mm,
	Chris Mason, David Sterba, linux-btrfs

On Sat 17-12-16 20:17:07, Tetsuo Handa wrote:
[...]
> I feel that allowing access to memory reserves based on __GFP_NOFAIL might not
> make sense. My understanding is that the actual I/O operations triggered by I/O
> requests from filesystem code are processed by other threads. Even if we grant
> access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code,
> I think that it is possible that memory allocations by the underlying bio code
> fail to make further progress unless memory reserves are granted as well.

The IO layer should rely on mempools to guarantee forward progress.

> Below is a typical trace which I observe under an OOM lockup situation (though
> this trace is from an OOM stress test using XFS).
> 
> ----------------------------------------
> [ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 seq=48 gfp=0x2400000(GFP_NOIO) order=0 delay=430400 uninterruptible
> [ 1845.187248] kworker/2:1     D12712 14498      2 0x00000080
> [ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn
> [ 1845.187252] Call Trace:
> [ 1845.187253]  ? __schedule+0x23f/0xba0
> [ 1845.187254]  schedule+0x38/0x90
> [ 1845.187255]  schedule_timeout+0x205/0x4a0
> [ 1845.187256]  ? del_timer_sync+0xd0/0xd0
> [ 1845.187257]  schedule_timeout_uninterruptible+0x25/0x30
> [ 1845.187258]  __alloc_pages_nodemask+0x1035/0x10e0
> [ 1845.187259]  ? alloc_request_struct+0x14/0x20
> [ 1845.187261]  alloc_pages_current+0x96/0x1b0
> [ 1845.187262]  ? bio_alloc_bioset+0x20f/0x2e0
> [ 1845.187264]  bio_copy_kern+0xc4/0x180
> [ 1845.187265]  blk_rq_map_kern+0x6f/0x120
> [ 1845.187268]  __scsi_execute.isra.23+0x12f/0x160
> [ 1845.187270]  scsi_execute_req_flags+0x8f/0x100
> [ 1845.187271]  sr_check_events+0xba/0x2b0 [sr_mod]
> [ 1845.187274]  cdrom_check_events+0x13/0x30 [cdrom]
> [ 1845.187275]  sr_block_check_events+0x25/0x30 [sr_mod]
> [ 1845.187276]  disk_check_events+0x5b/0x150
> [ 1845.187277]  disk_events_workfn+0x17/0x20
> [ 1845.187278]  process_one_work+0x1fc/0x750
> [ 1845.187279]  ? process_one_work+0x167/0x750
> [ 1845.187279]  worker_thread+0x126/0x4a0
> [ 1845.187280]  kthread+0x10a/0x140
> [ 1845.187281]  ? process_one_work+0x750/0x750
> [ 1845.187282]  ? kthread_create_on_node+0x60/0x60
> [ 1845.187283]  ret_from_fork+0x2a/0x40
> ----------------------------------------
> 
> I think that this GFP_NOIO allocation request needs to consume more memory reserves
> than GFP_NOFS allocation request to make progress. 

AFAIU, this is an allocation path which doesn't block forward progress
of regular IO. It is merely a check whether there is a new medium in
the CDROM (aka regular polling of the device). I really fail to see any
reason why this one should get any access to memory reserves at all.

I actually do not see any reason why it should be NOIO in the first
place, but I am not very familiar with this code so there might be some
reasons for that. The fact that it might stall under heavy memory
pressure is sad, but who actually cares?

> Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request
> in order to allow access to memory reserves as well as GFP_NOFS |
> __GFP_NOFAIL allocation request?

Why?

-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-17 21:06             ` Nils Holland
  2016-12-18  5:14               ` Tetsuo Handa
@ 2016-12-19 13:45               ` Michal Hocko
  2016-12-20  2:08                 ` Nils Holland
  1 sibling, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-19 13:45 UTC (permalink / raw)
  To: Nils Holland
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Sat 17-12-16 22:06:47, Nils Holland wrote:
[...]
> Unfortunately, the reclaim trace messages stopped a while after the first
> OOM messages show up - most likely my "cat" had been killed at that
> point or became unresponsive. :-/

The latter is more probable because I do not see the OOM killer killing
any cat process, and the first bash was killed about 10s after the first
OOM.

2016-12-17 21:36:56 192.168.17.23:6665 [ 1276.828639] Killed process 3894 (xz) total-vm:68640kB, anon-rss:65920kB, file-rss:1696kB, shmem-rss:0kB
2016-12-17 21:36:57 192.168.17.23:6665 [ 1277.598271] Killed process 3864 (sandbox) total-vm:2192kB, anon-rss:128kB, file-rss:1400kB, shmem-rss:0kB
2016-12-17 21:36:57 192.168.17.23:6665 [ 1278.222416] Killed process 3086 (emerge) total-vm:65064kB, anon-rss:52768kB, file-rss:7216kB, shmem-rss:0kB
2016-12-17 21:36:58 192.168.17.23:6665 [ 1278.846902] Killed process 2705 (NetworkManager) total-vm:104376kB, anon-rss:4172kB, file-rss:10516kB, shmem-rss:0kB
2016-12-17 21:36:59 192.168.17.23:6665 [ 1279.862150] Killed process 2823 (polkitd) total-vm:65536kB, anon-rss:2192kB, file-rss:8656kB, shmem-rss:0kB
2016-12-17 21:37:00 192.168.17.23:6665 [ 1280.496988] Killed process 3885 (ebuild.sh) total-vm:10640kB, anon-rss:3340kB, file-rss:2244kB, shmem-rss:0kB
2016-12-17 21:37:04 192.168.17.23:6665 [ 1285.126052] Killed process 2824 (wpa_supplicant) total-vm:8580kB, anon-rss:540kB, file-rss:5092kB, shmem-rss:0kB
2016-12-17 21:37:05 192.168.17.23:6665 [ 1286.124687] Killed process 2943 (bash) total-vm:7320kB, anon-rss:368kB, file-rss:3240kB, shmem-rss:0kB
2016-12-17 21:37:07 192.168.17.23:6665 [ 1287.974353] Killed process 2878 (sshd) total-vm:10524kB, anon-rss:700kB, file-rss:4908kB, shmem-rss:4kB
2016-12-17 21:37:16 192.168.17.23:6665 [ 1296.953350] Killed process 4048 (ebuild.sh) total-vm:10640kB, anon-rss:3352kB, file-rss:1892kB, shmem-rss:0kB
2016-12-17 21:37:24 192.168.17.23:6665 [ 1304.398944] Killed process 1980 (systemd-journal) total-vm:24640kB, anon-rss:332kB, file-rss:4608kB, shmem-rss:4kB
2016-12-17 21:37:25 192.168.17.23:6665 [ 1305.934472] Killed process 2918 ((sd-pam)) total-vm:9152kB, anon-rss:964kB, file-rss:1536kB, shmem-rss:0kB
2016-12-17 21:37:28 192.168.17.23:6665 [ 1308.878775] Killed process 2888 (systemd) total-vm:7856kB, anon-rss:528kB, file-rss:4388kB, shmem-rss:0kB
2016-12-17 21:37:34 192.168.17.23:6665 [ 1314.268177] Killed process 2711 (rsyslogd) total-vm:25200kB, anon-rss:1084kB, file-rss:2908kB, shmem-rss:0kB
2016-12-17 21:37:39 192.168.17.23:6665 [ 1319.634561] Killed process 2704 (systemd-logind) total-vm:5980kB, anon-rss:340kB, file-rss:3568kB, shmem-rss:0kB
2016-12-17 21:37:43 192.168.17.23:6665 [ 1323.488894] Killed process 3103 (htop) total-vm:7532kB, anon-rss:1024kB, file-rss:2872kB, shmem-rss:0kB
2016-12-17 21:38:42 192.168.17.23:6665 [ 1379.556282] Killed process 2701 (systemd-timesyn) total-vm:15480kB, anon-rss:356kB, file-rss:3292kB, shmem-rss:0kB
2016-12-17 21:39:05 192.168.17.23:6665 [ 1403.130435] Killed process 3082 (bash) total-vm:7324kB, anon-rss:380kB, file-rss:3324kB, shmem-rss:0kB
2016-12-17 21:39:17 192.168.17.23:6665 [ 1417.600367] Killed process 3077 (start_trace) total-vm:6948kB, anon-rss:184kB, file-rss:2524kB, shmem-rss:0kB
2016-12-17 21:39:24 192.168.17.23:6665 [ 1423.955452] Killed process 3073 (bash) total-vm:7324kB, anon-rss:380kB, file-rss:3284kB, shmem-rss:0kB
2016-12-17 21:39:27 192.168.17.23:6665 [ 1425.338670] Killed process 3099 (bash) total-vm:7324kB, anon-rss:376kB, file-rss:3176kB, shmem-rss:0kB
2016-12-17 21:39:27 192.168.17.23:6665 [ 1426.800677] Killed process 3070 (screen) total-vm:7440kB, anon-rss:960kB, file-rss:2360kB, shmem-rss:0kB
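
The gap between the first OOM kill and the first kill of a given task can be read off the kernel timestamps in a list like the one above. A small sketch, assuming the "[ seconds.micros] Killed process PID (comm)" format shown here; the function name kill_delay is hypothetical.

```sh
# Seconds between the first "Killed process" line and the first kill of COMM.
# Usage: kill_delay COMM < capture.log   (function name is hypothetical)
kill_delay() {
    awk -v name="$1" '
    /Killed process/ {
        # Kernel uptime stamp: the number right before the closing "]".
        if (!match($0, /[0-9]+\.[0-9]+\]/)) next
        ts = substr($0, RSTART, RLENGTH - 1) + 0
        if (!seen) { first = ts; seen = 1 }
        if (index($0, "(" name ")")) { printf "%.1f\n", ts - first; exit }
    }'
}
```

For bash in the list above this works out to 1286.124687 - 1276.828639, i.e. a little over 9 seconds.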
 
> In the end, the machine didn't completely panic, but after nothing new
> showed up being logged via the network, I walked up to the
> machine and found it in a state where I couldn't really log in to it
> anymore, but all that worked was, as always, a magic SysRequest reboot.
> 
> The complete log, from machine boot right up to the point where it
> wouldn't really do anything anymore, is up again on my web server (~42
> MB, 928 KB packed):
> 
> http://ftp.tisys.org/pub/misc/teela_2016-12-17.log.xz

$ xzgrep invoked teela_2016-12-17.log.xz | sed 's@.*gfp_mask=0x[0-9a-f]*(\(.*\)), .*@\1@' | sort | uniq -c
      2 GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK
      1 GFP_KERNEL|__GFP_NOTRACK
      6 GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_NOTRACK
      1 GFP_KERNEL|__GFP_NOWARN|__GFP_REPEAT|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_NOTRACK
      2 GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK
      2 GFP_TEMPORARY
      5 GFP_TEMPORARY|__GFP_NOTRACK
      3 GFP_USER|__GFP_COLD

so all of them are lowmem requests, which is in line with your previous
report. This basically means that only zone Normal is usable, as I've
already mentioned before. In general, lowmem problems are inherent to
32-bit kernels, but in this case we still have a _lot of_ page cache to
reclaim so we shouldn't really blow up.

Normal free:41260kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532676kB inactive_file:100kB unevictable:0kB writepending:124kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:157428kB slab_unreclaimable:68940kB kernel_stack:1160kB pagetables:1336kB bounce:0kB free_pcp:484kB local_pcp:240kB free_cma:0kB

and this looks very similar to your previous report as well. No
anonymous pages and the whole file LRU sitting in the active list so
there is nothing immediately reclaimable. This is very weird because
we should rotate the active list to the inactive one if the latter is low,
which it obviously is here, and this seems to be the case in other cases
as well (inactive_is_low.sh is a simple and dirty script to subtract
Highmem active/inactive counters from the node ones).

$ xzgrep -f zones teela_2016-12-17.log.xz | sh inactive_is_low.sh
total_active 1094600 active 541424 total_inactive 1117512 inactive 104 ratio 1 low 1
total_active 1094744 active 541568 total_inactive 1117524 inactive 116 ratio 1 low 1
total_active 1094864 active 541564 total_inactive 1117512 inactive 108 ratio 1 low 1
total_active 1095188 active 541564 total_inactive 1117220 inactive 116 ratio 1 low 1
total_active 1097520 active 541596 total_inactive 1115048 inactive 120 ratio 1 low 1
total_active 1097836 active 541612 total_inactive 1114764 inactive 136 ratio 1 low 1
total_active 1098692 active 542384 total_inactive 1114688 inactive 100 ratio 1 low 1
total_active 1098964 active 542504 total_inactive 1114480 inactive 24 ratio 1 low 1
total_active 1099108 active 542620 total_inactive 1114544 inactive 92 ratio 1 low 1
total_active 1099180 active 542548 total_inactive 1114564 inactive 236 ratio 1 low 1
[...]
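
The script itself is not included here, but the "ratio" and "low" columns can be reproduced from the lowmem "active" and "inactive" values. The sketch below assumes the values are in kB and that the check follows the 4.9-era inactive_list_is_low() heuristic (ratio 1 below 1 GiB of LRU, otherwise the integer square root of 10 times the size in GiB); the function name is hypothetical and this is not the original inactive_is_low.sh.

```sh
# Usage: inactive_is_low ACTIVE_KB INACTIVE_KB
# Assumed heuristic (4.9-era mm/vmscan.c): the inactive list counts as low
# when inactive * ratio < active.
inactive_is_low() {
    awk -v active="$1" -v inactive="$2" 'BEGIN {
        gb = int((active + inactive) / (1024 * 1024))   # kB -> whole GiB
        ratio = gb ? int(sqrt(10 * gb)) : 1
        low = (inactive * ratio < active) ? 1 : 0
        printf "active %d inactive %d ratio %d low %d\n", active, inactive, ratio, low
    }'
}
```

With the first row above, "inactive_is_low 541424 104" gives ratio 1 and low 1, matching the listed output.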

Unfortunately shrink_active_list doesn't have any tracepoint so we do
not know whether we managed to rotate those pages. If they are referenced
quickly enough we might just keep refaulting them... Could you try to apply
the following diff on top of what you have currently? It should add some more
tracepoint data which might tell us more. We can reduce the amount of
tracing data by enabling only mm_vmscan_lru_isolate,
mm_vmscan_lru_shrink_inactive and mm_vmscan_lru_shrink_active.
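
Enabling only those three events could look roughly like this. A sketch under assumptions: tracefs is mounted at /debug/trace as earlier in this thread, the per-event "enable" files follow the usual events/<group>/<event>/enable layout, mm_vmscan_lru_shrink_active only exists once the diff below is applied, and the function name is hypothetical.

```sh
# Usage: enable_vmscan_events TRACEFS_ROOT EVENT...
enable_vmscan_events() {
    root=$1
    shift
    # Switch the whole vmscan group off first so only the listed events fire.
    echo 0 > "$root/events/vmscan/enable"
    for ev in "$@"; do
        echo 1 > "$root/events/vmscan/$ev/enable"
    done
}

# As root:
#   enable_vmscan_events /debug/trace mm_vmscan_lru_isolate \
#       mm_vmscan_lru_shrink_inactive mm_vmscan_lru_shrink_active
```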
---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index bfe53d95c25b..2ba3e6dea6ef 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -519,7 +519,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_hot_cold_page(struct page *page, bool cold);
-extern void free_hot_cold_page_list(struct list_head *list, bool cold);
+extern int free_hot_cold_page_list(struct list_head *list, bool cold);
 
 struct page_frag_cache;
 extern void __page_frag_drain(struct page *page, unsigned int order,
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c88fd0934e7e..7966915cf663 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -365,14 +365,27 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
+		unsigned long nr_dirty, unsigned long nr_writeback,
+		unsigned long nr_congested, unsigned long nr_immediate,
+		unsigned long nr_activate, unsigned long nr_ref_keep,
+		unsigned long nr_unmap_fail,
 		int priority, int file),
 
-	TP_ARGS(nid, nr_scanned, nr_reclaimed, priority, file),
+	TP_ARGS(nid, nr_scanned, nr_reclaimed, nr_dirty, nr_writeback,
+		nr_congested, nr_immediate, nr_activate, nr_ref_keep,
+		nr_unmap_fail, priority, file),
 
 	TP_STRUCT__entry(
 		__field(int, nid)
 		__field(unsigned long, nr_scanned)
 		__field(unsigned long, nr_reclaimed)
+		__field(unsigned long, nr_dirty)
+		__field(unsigned long, nr_writeback)
+		__field(unsigned long, nr_congested)
+		__field(unsigned long, nr_immediate)
+		__field(unsigned long, nr_activate)
+		__field(unsigned long, nr_ref_keep)
+		__field(unsigned long, nr_unmap_fail)
 		__field(int, priority)
 		__field(int, reclaim_flags)
 	),
@@ -381,17 +394,64 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__entry->nid = nid;
 		__entry->nr_scanned = nr_scanned;
 		__entry->nr_reclaimed = nr_reclaimed;
+		__entry->nr_dirty = nr_dirty;
+		__entry->nr_writeback = nr_writeback;
+		__entry->nr_congested = nr_congested;
+		__entry->nr_immediate = nr_immediate;
+		__entry->nr_activate = nr_activate;
+		__entry->nr_ref_keep = nr_ref_keep;
+		__entry->nr_unmap_fail = nr_unmap_fail;
 		__entry->priority = priority;
 		__entry->reclaim_flags = trace_shrink_flags(file);
 	),
 
-	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld priority=%d flags=%s",
+	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate=%ld nr_ref_keep=%ld nr_unmap_fail=%ld priority=%d flags=%s",
 		__entry->nid,
 		__entry->nr_scanned, __entry->nr_reclaimed,
-		__entry->priority,
+		__entry->nr_dirty, __entry->nr_writeback,
+		__entry->nr_congested, __entry->nr_immediate,
+		__entry->nr_activate, __entry->nr_ref_keep,
+		__entry->nr_unmap_fail, __entry->priority,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
+TRACE_EVENT(mm_vmscan_lru_shrink_active,
+
+	TP_PROTO(int nid, unsigned long nr_scanned, unsigned long nr_freed,
+		unsigned long nr_unevictable, unsigned long nr_deactivated,
+		unsigned long nr_rotated, int priority, int file),
+
+	TP_ARGS(nid, nr_scanned, nr_freed, nr_unevictable, nr_deactivated, nr_rotated, priority, file),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(unsigned long, nr_scanned)
+		__field(unsigned long, nr_freed)
+		__field(unsigned long, nr_unevictable)
+		__field(unsigned long, nr_deactivated)
+		__field(unsigned long, nr_rotated)
+		__field(int, priority)
+		__field(int, reclaim_flags)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->nr_scanned = nr_scanned;
+		__entry->nr_freed = nr_freed;
+		__entry->nr_unevictable = nr_unevictable;
+		__entry->nr_deactivated = nr_deactivated;
+		__entry->nr_rotated = nr_rotated;
+		__entry->priority = priority;
+		__entry->reclaim_flags = trace_shrink_flags(file);
+	),
+
+	TP_printk("nid=%d nr_scanned=%ld nr_freed=%ld nr_unevictable=%ld nr_deactivated=%ld nr_rotated=%ld priority=%d flags=%s",
+		__entry->nid,
+		__entry->nr_scanned, __entry->nr_freed, __entry->nr_unevictable,
+		__entry->nr_deactivated, __entry->nr_rotated,
+		__entry->priority,
+		show_reclaim_flags(__entry->reclaim_flags))
+);
 #endif /* _TRACE_VMSCAN_H */
 
 /* This part must be outside protection */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e701be6b930a..a8a103a5f7f0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2490,14 +2490,18 @@ void free_hot_cold_page(struct page *page, bool cold)
 /*
  * Free a list of 0-order pages
  */
-void free_hot_cold_page_list(struct list_head *list, bool cold)
+int free_hot_cold_page_list(struct list_head *list, bool cold)
 {
 	struct page *page, *next;
+	int ret = 0;
 
 	list_for_each_entry_safe(page, next, list, lru) {
 		trace_mm_page_free_batched(page, cold);
 		free_hot_cold_page(page, cold);
+		ret++;
 	}
+
+	return ret;
 }
 
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4ea6b610f20e..4d7febde9e72 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -902,6 +902,17 @@ static void page_check_dirty_writeback(struct page *page,
 		mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }
 
+struct reclaim_stat {
+	unsigned nr_dirty;
+	unsigned nr_unqueued_dirty;
+	unsigned nr_congested;
+	unsigned nr_writeback;
+	unsigned nr_immediate;
+	unsigned nr_activate;
+	unsigned nr_ref_keep;
+	unsigned nr_unmap_fail;
+};
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -909,22 +920,21 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct pglist_data *pgdat,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
-				      unsigned long *ret_nr_dirty,
-				      unsigned long *ret_nr_unqueued_dirty,
-				      unsigned long *ret_nr_congested,
-				      unsigned long *ret_nr_writeback,
-				      unsigned long *ret_nr_immediate,
+				      struct reclaim_stat *stat,
 				      bool force_reclaim)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
 	int pgactivate = 0;
-	unsigned long nr_unqueued_dirty = 0;
-	unsigned long nr_dirty = 0;
-	unsigned long nr_congested = 0;
-	unsigned long nr_reclaimed = 0;
-	unsigned long nr_writeback = 0;
-	unsigned long nr_immediate = 0;
+	unsigned nr_unqueued_dirty = 0;
+	unsigned nr_dirty = 0;
+	unsigned nr_congested = 0;
+	unsigned nr_reclaimed = 0;
+	unsigned nr_writeback = 0;
+	unsigned nr_immediate = 0;
+	unsigned nr_activate = 0;
+	unsigned nr_ref_keep = 0;
+	unsigned nr_unmap_fail = 0;
 
 	cond_resched();
 
@@ -1063,6 +1073,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		case PAGEREF_ACTIVATE:
 			goto activate_locked;
 		case PAGEREF_KEEP:
+			nr_ref_keep++;
 			goto keep_locked;
 		case PAGEREF_RECLAIM:
 		case PAGEREF_RECLAIM_CLEAN:
@@ -1100,6 +1111,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
 				(ttu_flags | TTU_BATCH_FLUSH))) {
 			case SWAP_FAIL:
+				nr_unmap_fail++;
 				goto activate_locked;
 			case SWAP_AGAIN:
 				goto keep_locked;
@@ -1252,6 +1264,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		VM_BUG_ON_PAGE(PageActive(page), page);
 		SetPageActive(page);
 		pgactivate++;
+		nr_activate++;
 keep_locked:
 		unlock_page(page);
 keep:
@@ -1266,11 +1279,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
 
-	*ret_nr_dirty += nr_dirty;
-	*ret_nr_congested += nr_congested;
-	*ret_nr_unqueued_dirty += nr_unqueued_dirty;
-	*ret_nr_writeback += nr_writeback;
-	*ret_nr_immediate += nr_immediate;
+	if (stat) {
+		stat->nr_dirty = nr_dirty;
+		stat->nr_congested = nr_congested;
+		stat->nr_unqueued_dirty = nr_unqueued_dirty;
+		stat->nr_writeback = nr_writeback;
+		stat->nr_immediate = nr_immediate;
+		stat->nr_activate = nr_activate;
+		stat->nr_ref_keep = nr_ref_keep;
+		stat->nr_unmap_fail = nr_unmap_fail;
+	}
 	return nr_reclaimed;
 }
 
@@ -1282,7 +1300,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 		.priority = DEF_PRIORITY,
 		.may_unmap = 1,
 	};
-	unsigned long ret, dummy1, dummy2, dummy3, dummy4, dummy5;
+	unsigned long ret;
 	struct page *page, *next;
 	LIST_HEAD(clean_pages);
 
@@ -1295,8 +1313,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	}
 
 	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
-			TTU_UNMAP|TTU_IGNORE_ACCESS,
-			&dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true);
+			TTU_UNMAP|TTU_IGNORE_ACCESS, NULL, true);
 	list_splice(&clean_pages, page_list);
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
 	return ret;
@@ -1696,11 +1713,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	unsigned long nr_scanned;
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_taken;
-	unsigned long nr_dirty = 0;
-	unsigned long nr_congested = 0;
-	unsigned long nr_unqueued_dirty = 0;
-	unsigned long nr_writeback = 0;
-	unsigned long nr_immediate = 0;
+	struct reclaim_stat stat = {};
 	isolate_mode_t isolate_mode = 0;
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -1745,9 +1758,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		return 0;
 
 	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, TTU_UNMAP,
-				&nr_dirty, &nr_unqueued_dirty, &nr_congested,
-				&nr_writeback, &nr_immediate,
-				false);
+				&stat, false);
 
 	spin_lock_irq(&pgdat->lru_lock);
 
@@ -1781,7 +1792,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * of pages under pages flagged for immediate reclaim and stall if any
 	 * are encountered in the nr_immediate check below.
 	 */
-	if (nr_writeback && nr_writeback == nr_taken)
+	if (stat.nr_writeback && stat.nr_writeback == nr_taken)
 		set_bit(PGDAT_WRITEBACK, &pgdat->flags);
 
 	/*
@@ -1793,7 +1804,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * Tag a zone as congested if all the dirty pages scanned were
 		 * backed by a congested BDI and wait_iff_congested will stall.
 		 */
-		if (nr_dirty && nr_dirty == nr_congested)
+		if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
 			set_bit(PGDAT_CONGESTED, &pgdat->flags);
 
 		/*
@@ -1802,7 +1813,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * the pgdat PGDAT_DIRTY and kswapd will start writing pages from
 		 * reclaim context.
 		 */
-		if (nr_unqueued_dirty == nr_taken)
+		if (stat.nr_unqueued_dirty == nr_taken)
 			set_bit(PGDAT_DIRTY, &pgdat->flags);
 
 		/*
@@ -1811,7 +1822,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (nr_immediate && current_may_throttle())
+		if (stat.nr_immediate && current_may_throttle())
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
@@ -1826,6 +1837,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			nr_scanned, nr_reclaimed,
+			stat.nr_dirty,  stat.nr_writeback,
+			stat.nr_congested, stat.nr_immediate,
+			stat.nr_activate, stat.nr_ref_keep, stat.nr_unmap_fail,
 			sc->priority, file);
 	return nr_reclaimed;
 }
@@ -1846,9 +1860,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
  *
  * The downside is that we have to touch page->_refcount against each page.
  * But we had to alter page->flags anyway.
+ *
+ * Returns the number of pages moved to the given lru.
  */
 
-static void move_active_pages_to_lru(struct lruvec *lruvec,
+static int move_active_pages_to_lru(struct lruvec *lruvec,
 				     struct list_head *list,
 				     struct list_head *pages_to_free,
 				     enum lru_list lru)
@@ -1857,6 +1873,7 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 	unsigned long pgmoved = 0;
 	struct page *page;
 	int nr_pages;
+	int nr_moved = 0;
 
 	while (!list_empty(list)) {
 		page = lru_to_page(list);
@@ -1882,11 +1899,15 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 				spin_lock_irq(&pgdat->lru_lock);
 			} else
 				list_add(&page->lru, pages_to_free);
+		} else {
+			nr_moved++;
 		}
 	}
 
 	if (!is_active_lru(lru))
 		__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	return nr_moved;
 }
 
 static void shrink_active_list(unsigned long nr_to_scan,
@@ -1902,7 +1923,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
-	unsigned long nr_rotated = 0;
+	unsigned long nr_rotated = 0, nr_unevictable = 0;
+	unsigned long nr_freed, nr_deactivate, nr_activate;
 	isolate_mode_t isolate_mode = 0;
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -1935,6 +1957,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 		if (unlikely(!page_evictable(page))) {
 			putback_lru_page(page);
+			nr_unevictable++;
 			continue;
 		}
 
@@ -1980,13 +2003,16 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
 
-	move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
-	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
+	nr_activate = move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
+	nr_deactivate = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&l_hold);
-	free_hot_cold_page_list(&l_hold, true);
+	nr_freed = free_hot_cold_page_list(&l_hold, true);
+	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_scanned, nr_freed,
+			nr_unevictable, nr_deactivate, nr_rotated,
+			sc->priority, file);
 }
 
 /*
-- 
Michal Hocko
SUSE Labs


* Re: OOM: Better, but still there on
  2016-12-19 13:45               ` Michal Hocko
@ 2016-12-20  2:08                 ` Nils Holland
  2016-12-21  7:36                   ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-20  2:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Mon, Dec 19, 2016 at 02:45:34PM +0100, Michal Hocko wrote:

> Unfortunately shrink_active_list doesn't have any tracepoint so we do
> not know whether we managed to rotate those pages. If they are referenced
> quickly enough we might just keep refaulting them... Could you try to apply
> the following diff on top of what you have currently. It should add some more
> tracepoint data which might tell us more. We can reduce the amount of
> tracing data by enabling only mm_vmscan_lru_isolate,
> mm_vmscan_lru_shrink_inactive and mm_vmscan_lru_shrink_active.

So, the results are in! I applied your patch and rebuilt the kernel,
then I rebooted the machine, set up tracing so that only the three
events you mentioned were being traced, and captured the output over
the network.

Things went a bit differently this time: The trace events started to
appear after a while and a whole lot of them were generated, but
suddenly they stopped. A short while later, we get

[ 1661.485568] btrfs-transacti: page alloction stalls for 611058ms, order:0, mode:0x2420048(GFP_NOFS|__GFP_HARDWALL|__GFP_MOVABLE)

along with a backtrace and memory information, and then there was
silence. When I walked up to the machine, it had completely died; it
wouldn't turn on its screen on key press any more, blindly trying to
reboot via SysRequest had no effect, but the caps lock LED also wasn't
blinking, like it normally does when a kernel panic occurs. It's a good
question what state it was in. The OOM reaper didn't seem to kick in
and kill processes this time.

The complete capture is up at:

http://ftp.tisys.org/pub/misc/teela_2016-12-20.log.xz

Greetings
Nils


* Re: OOM: Better, but still there on
  2016-12-20  2:08                 ` Nils Holland
@ 2016-12-21  7:36                   ` Michal Hocko
  2016-12-21 11:00                     ` Tetsuo Handa
  2016-12-22 10:10                     ` Nils Holland
  0 siblings, 2 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-21  7:36 UTC (permalink / raw)
  To: Nils Holland
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

TL;DR
there is another version of the debugging patch. Just revert the
previous one and apply this one instead. It's still not clear what
is going on but I suspect either some misaccounting or unexpected
pages on the LRU lists. I have added one more tracepoint, so please
also enable mm_vmscan_inactive_list_is_low.

Hopefully the additional data will tell us more.

On Tue 20-12-16 03:08:29, Nils Holland wrote:
> On Mon, Dec 19, 2016 at 02:45:34PM +0100, Michal Hocko wrote:
> 
> > Unfortunately shrink_active_list doesn't have any tracepoint so we do
> > not know whether we managed to rotate those pages. If they are referenced
> > quickly enough we might just keep refaulting them... Could you try to apply
> > the following diff on top of what you have currently. It should add some more
> > tracepoint data which might tell us more. We can reduce the amount of
> > tracing data by enabling only mm_vmscan_lru_isolate,
> > mm_vmscan_lru_shrink_inactive and mm_vmscan_lru_shrink_active.
> 
> So, the results are in! I applied your patch and rebuilt the kernel,
> then I rebooted the machine, set up tracing so that only the three
> events you mentioned were being traced, and captured the output over
> the network.
> 
> Things went a bit differently this time: The trace events started to
> appear after a while and a whole lot of them were generated, but
> suddenly they stopped. A short while later, we get

It is possible that you are hitting multiple issues, so it would be
great to focus on one at a time. The underlying problem might be the
same or similar in the end, but this is hard to tell now. Could you try to
reproduce and provide data for the OOM killer situation as well?
 
> [ 1661.485568] btrfs-transacti: page alloction stalls for 611058ms, order:0, mode:0x2420048(GFP_NOFS|__GFP_HARDWALL|__GFP_MOVABLE)
> 
> along with a backtrace and memory information, and then there was
> silence.

> When I walked up to the machine, it had completely died; it
> wouldn't turn on its screen on key press any more, blindly trying to
> reboot via SysRequest had no effect, but the caps lock LED also wasn't
> blinking, like it normally does when a kernel panic occurs. It's a good
> question what state it was in. The OOM reaper didn't seem to kick in
> and kill processes this time.
> 
> The complete capture is up at:
> 
> http://ftp.tisys.org/pub/misc/teela_2016-12-20.log.xz

This is the stall report:
[ 1661.485568] btrfs-transacti: page alloction stalls for 611058ms, order:0, mode:0x2420048(GFP_NOFS|__GFP_HARDWALL|__GFP_MOVABLE)
[ 1661.485859] CPU: 1 PID: 1950 Comm: btrfs-transacti Not tainted 4.9.0-gentoo #4

pid 1950 has been trying to allocate for a _long_ time. Considering that this
is the only stall report, this means that reclaim took really long, so we
didn't get to the page allocator for that long. It sounds really crazy!

$ xzgrep -w 1950 teela_2016-12-20.log.xz | grep mm_vmscan_lru_shrink_inactive | sed 's@.*nr_reclaimed=\([0-9]*\).*@\1@' | sort | uniq -c
    509 0
      1 1
      1 10
      5 11
      1 12
      1 14
      1 16
      2 19
      5 2
      1 22
      2 23
      1 25
      3 28
      2 3
      1 4
      4 5

It barely managed to reclaim anything, even though it tried a lot. It
had a hard time actually isolating anything:

$ xzgrep -w 1950 teela_2016-12-20.log.xz | grep mm_vmscan_lru_isolate: | sed 's@.*nr_taken=@@' | sort | uniq -c
   8284 0 file=1
      8 11 file=1
      4 14 file=1
      1 1 file=1
      7 23 file=1
      1 25 file=1
      9 2 file=1
    501 32 file=1
      1 3 file=1
      7 5 file=1
      1 6 file=1

A typical mm_vmscan_lru_isolate event looks as follows:

btrfs-transacti-1950  [001] d...  1368.508008: mm_vmscan_lru_isolate: isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=266727 nr_taken=0 file=1

so it seems the whole inactive LRU has been scanned, but we couldn't
isolate a single page. There are two possibilities here. Either we skip
them all because they are from the highmem zone or we fail to
__isolate_lru_page them. The counters will not tell us because nr_scanned
includes skipped pages. I have updated the debugging patch to make this
distinction. I suspect we are skipping all of them...
The latter option would be really surprising because the only way to fail
__isolate_lru_page with the 0 isolate_mode is if get_page_unless_zero(page)
fails, which would mean we would have pages with 0 reference count on the
LRU list.
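
To illustrate why the old counters cannot tell these two cases apart,
here is a minimal user-space sketch. It is not the kernel's
isolate_lru_pages: struct fake_page, model_isolate and the zone enum are
made-up names and the loop is heavily simplified. The only rule taken
from the patch is that skipped pages are added to the reported
nr_scanned in full when the list is exhausted and at a quarter otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real zone indices and struct page. */
enum zone_id { ZONE_DMA_ID, ZONE_NORMAL_ID, ZONE_HIGHMEM_ID };

struct fake_page {
	enum zone_id zone;
	int refcount;		/* 0 models a failing get_page_unless_zero() */
};

struct isolate_counts {
	unsigned long scan;		/* pages actually tried for isolation */
	unsigned long skipped;		/* pages from zones above reclaim_idx */
	unsigned long taken;		/* pages successfully isolated */
	unsigned long old_nr_scanned;	/* scan plus the skipped share, as traced before the patch */
};

/* Simplified model of one isolation pass over an LRU list. */
static struct isolate_counts model_isolate(const struct fake_page *pages,
					   size_t n, enum zone_id reclaim_idx,
					   unsigned long nr_to_scan)
{
	struct isolate_counts c = { 0, 0, 0, 0 };
	size_t i;

	assert(pages != NULL || n == 0);

	for (i = 0; i < n && c.scan < nr_to_scan && c.taken < nr_to_scan; i++) {
		if (pages[i].zone > reclaim_idx) {
			c.skipped++;	/* not eligible for this allocation */
			continue;
		}
		c.scan++;
		if (pages[i].refcount > 0)
			c.taken++;	/* isolation succeeded */
	}

	/* Exhausted list: count skipped pages in full, else only a quarter. */
	c.old_nr_scanned = c.scan + (i == n ? c.skipped : c.skipped >> 2);
	return c;
}
```

Feeding the model eight highmem pages with a lowmem-constrained request,
or eight lowmem pages whose refcount is 0, gives the same old_nr_scanned
and taken == 0 in both cases. Only the separate skipped counter differs,
which is what the new nr_skipped field is meant to expose.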

The stall message is from a later time so the situation might have
changed but
[ 1661.490170] Node 0 	active_anon:139296kB	inactive_anon:432kB	active_file:1088996kB	inactive_file:1114524kB
[ 1661.490745] DMA 	active_anon:0kB 	inactive_anon:0kB	active_file:9540kB	inactive_file:0kB
[ 1661.491528] Normal 	active_anon:0kB 	inactive_anon:0kB	active_file:530560kB	inactive_file:452kB
[ 1661.513077] HighMem 	active_anon:139296kB 	inactive_anon:432kB	active_file:548896kB	inactive_file:1114068kB

suggests our inactive file LRU is low:
file total_active 1088996 active 540100 total_inactive 1114524 inactive 456 ratio 1 low 1

and we should be rotating active pages. But

$ xzgrep -w 1950 teela_2016-12-20.log.xz | grep mm_vmscan_lru_shrink_active
$

Now inactive_list_is_low is racy, but I doubt it would consistently
race and give us a wrong answer. I also do not see how it could miss an
imbalance in the lowmem zones that is hidden by the highmem zone (assuming
those counters are OK).
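
As a sanity check of the "low 1" result above, the comparison can be
reproduced in user space. This is a minimal sketch, not the kernel
function: struct lru_kb, eligible_kb and inactive_is_low are illustrative
names, sizes are in kB as printed in the stall report rather than in
pages, and the ratio is passed in instead of being derived from the LRU
size. It also assumes that HighMem is the only zone above the
allocation's classzone.

```c
#include <assert.h>
#include <stdbool.h>

/* One LRU list size in kB, split the way the stall report prints it. */
struct lru_kb {
	unsigned long total;		/* whole node */
	unsigned long ineligible;	/* zones the allocation cannot use (HighMem here) */
};

/* Size of the list once ineligible zones are subtracted. */
static unsigned long eligible_kb(struct lru_kb l)
{
	assert(l.ineligible <= l.total);
	return l.total - l.ineligible;
}

/* Same comparison as the final return statement of inactive_list_is_low. */
static bool inactive_is_low(struct lru_kb inactive, struct lru_kb active,
			    unsigned long ratio)
{
	return eligible_kb(inactive) * ratio < eligible_kb(active);
}
```

With the file LRU numbers from the report (inactive 1114524 total and
1114068 in HighMem, active 1088996 total and 548896 in HighMem) this
yields 456 eligible inactive against 540100 eligible active, so with
ratio 1 the list counts as low, matching the line quoted above.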

That being said, the numbers do not make much sense to me, to be honest.
Could you try with the updated tracing patch please?
---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4175dca4ac39..61aa9b49e86d 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -503,7 +503,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_hot_cold_page(struct page *page, bool cold);
-extern void free_hot_cold_page_list(struct list_head *list, bool cold);
+extern int free_hot_cold_page_list(struct list_head *list, bool cold);
 
 struct page_frag_cache;
 extern void __page_frag_drain(struct page *page, unsigned int order,
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c88fd0934e7e..cbd2fff521f0 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -275,20 +275,22 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
 		int order,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
+		unsigned long nr_skipped,
 		unsigned long nr_taken,
 		isolate_mode_t isolate_mode,
-		int file),
+		int lru),
 
-	TP_ARGS(classzone_idx, order, nr_requested, nr_scanned, nr_taken, isolate_mode, file),
+	TP_ARGS(classzone_idx, order, nr_requested, nr_scanned, nr_skipped, nr_taken, isolate_mode, lru),
 
 	TP_STRUCT__entry(
 		__field(int, classzone_idx)
 		__field(int, order)
 		__field(unsigned long, nr_requested)
 		__field(unsigned long, nr_scanned)
+		__field(unsigned long, nr_skipped)
 		__field(unsigned long, nr_taken)
 		__field(isolate_mode_t, isolate_mode)
-		__field(int, file)
+		__field(int, lru)
 	),
 
 	TP_fast_assign(
@@ -296,19 +298,21 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
 		__entry->order = order;
 		__entry->nr_requested = nr_requested;
 		__entry->nr_scanned = nr_scanned;
+		__entry->nr_skipped = nr_skipped;
 		__entry->nr_taken = nr_taken;
 		__entry->isolate_mode = isolate_mode;
-		__entry->file = file;
+		__entry->lru = lru;
 	),
 
-	TP_printk("isolate_mode=%d classzone=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu file=%d",
+	TP_printk("isolate_mode=%d classzone=%d order=%d nr_requested=%lu nr_scanned=%lu nr_skipped=%lu nr_taken=%lu lru=%d",
 		__entry->isolate_mode,
 		__entry->classzone_idx,
 		__entry->order,
 		__entry->nr_requested,
 		__entry->nr_scanned,
+		__entry->nr_skipped,
 		__entry->nr_taken,
-		__entry->file)
+		__entry->lru)
 );
 
 DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
@@ -317,11 +321,12 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
 		int order,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
+		unsigned long nr_skipped,
 		unsigned long nr_taken,
 		isolate_mode_t isolate_mode,
-		int file),
+		int lru),
 
-	TP_ARGS(classzone_idx, order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
+	TP_ARGS(classzone_idx, order, nr_requested, nr_scanned, nr_skipped, nr_taken, isolate_mode, lru)
 
 );
 
@@ -331,11 +336,12 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
 		int order,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
+		unsigned long nr_skipped,
 		unsigned long nr_taken,
 		isolate_mode_t isolate_mode,
-		int file),
+		int lru),
 
-	TP_ARGS(classzone_idx, order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
+	TP_ARGS(classzone_idx, order, nr_requested, nr_scanned, nr_skipped, nr_taken, isolate_mode, lru)
 
 );
 
@@ -365,14 +371,27 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
+		unsigned long nr_dirty, unsigned long nr_writeback,
+		unsigned long nr_congested, unsigned long nr_immediate,
+		unsigned long nr_activate, unsigned long nr_ref_keep,
+		unsigned long nr_unmap_fail,
 		int priority, int file),
 
-	TP_ARGS(nid, nr_scanned, nr_reclaimed, priority, file),
+	TP_ARGS(nid, nr_scanned, nr_reclaimed, nr_dirty, nr_writeback,
+		nr_congested, nr_immediate, nr_activate, nr_ref_keep,
+		nr_unmap_fail, priority, file),
 
 	TP_STRUCT__entry(
 		__field(int, nid)
 		__field(unsigned long, nr_scanned)
 		__field(unsigned long, nr_reclaimed)
+		__field(unsigned long, nr_dirty)
+		__field(unsigned long, nr_writeback)
+		__field(unsigned long, nr_congested)
+		__field(unsigned long, nr_immediate)
+		__field(unsigned long, nr_activate)
+		__field(unsigned long, nr_ref_keep)
+		__field(unsigned long, nr_unmap_fail)
 		__field(int, priority)
 		__field(int, reclaim_flags)
 	),
@@ -381,17 +400,100 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__entry->nid = nid;
 		__entry->nr_scanned = nr_scanned;
 		__entry->nr_reclaimed = nr_reclaimed;
+		__entry->nr_dirty = nr_dirty;
+		__entry->nr_writeback = nr_writeback;
+		__entry->nr_congested = nr_congested;
+		__entry->nr_immediate = nr_immediate;
+		__entry->nr_activate = nr_activate;
+		__entry->nr_ref_keep = nr_ref_keep;
+		__entry->nr_unmap_fail = nr_unmap_fail;
 		__entry->priority = priority;
 		__entry->reclaim_flags = trace_shrink_flags(file);
 	),
 
-	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld priority=%d flags=%s",
+	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate=%ld nr_ref_keep=%ld nr_unmap_fail=%ld priority=%d flags=%s",
 		__entry->nid,
 		__entry->nr_scanned, __entry->nr_reclaimed,
+		__entry->nr_dirty, __entry->nr_writeback,
+		__entry->nr_congested, __entry->nr_immediate,
+		__entry->nr_activate, __entry->nr_ref_keep,
+		__entry->nr_unmap_fail, __entry->priority,
+		show_reclaim_flags(__entry->reclaim_flags))
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_active,
+
+	TP_PROTO(int nid, unsigned long nr_scanned, unsigned long nr_freed,
+		unsigned long nr_unevictable, unsigned long nr_deactivated,
+		unsigned long nr_rotated, int priority, int file),
+
+	TP_ARGS(nid, nr_scanned, nr_freed, nr_unevictable, nr_deactivated, nr_rotated, priority, file),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(unsigned long, nr_scanned)
+		__field(unsigned long, nr_freed)
+		__field(unsigned long, nr_unevictable)
+		__field(unsigned long, nr_deactivated)
+		__field(unsigned long, nr_rotated)
+		__field(int, priority)
+		__field(int, reclaim_flags)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->nr_scanned = nr_scanned;
+		__entry->nr_freed = nr_freed;
+		__entry->nr_unevictable = nr_unevictable;
+		__entry->nr_deactivated = nr_deactivated;
+		__entry->nr_rotated = nr_rotated;
+		__entry->priority = priority;
+		__entry->reclaim_flags = trace_shrink_flags(file);
+	),
+
+	TP_printk("nid=%d nr_scanned=%ld nr_freed=%ld nr_unevictable=%ld nr_deactivated=%ld nr_rotated=%ld priority=%d flags=%s",
+		__entry->nid,
+		__entry->nr_scanned, __entry->nr_freed, __entry->nr_unevictable,
+		__entry->nr_deactivated, __entry->nr_rotated,
 		__entry->priority,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
+TRACE_EVENT(mm_vmscan_inactive_list_is_low,
+
+	TP_PROTO(int nid, unsigned long total_inactive, unsigned long inactive,
+		unsigned long total_active, unsigned long active,
+		unsigned long ratio, int file),
+
+	TP_ARGS(nid, total_inactive, inactive, total_active, active, ratio, file),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(unsigned long, total_inactive)
+		__field(unsigned long, inactive)
+		__field(unsigned long, total_active)
+		__field(unsigned long, active)
+		__field(unsigned long, ratio)
+		__field(int, reclaim_flags)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->total_inactive = total_inactive;
+		__entry->inactive = inactive;
+		__entry->total_active = total_active;
+		__entry->active = active;
+		__entry->ratio = ratio;
+		__entry->reclaim_flags = trace_shrink_flags(file);
+	),
+
+	TP_printk("nid=%d total_inactive=%ld inactive=%ld total_active=%ld active=%ld ratio=%ld flags=%s",
+		__entry->nid,
+		__entry->total_inactive, __entry->inactive,
+		__entry->total_active, __entry->active,
+		__entry->ratio,
+		show_reclaim_flags(__entry->reclaim_flags))
+);
 #endif /* _TRACE_VMSCAN_H */
 
 /* This part must be outside protection */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1c24112308d6..77d204660857 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2487,14 +2487,18 @@ void free_hot_cold_page(struct page *page, bool cold)
 /*
  * Free a list of 0-order pages
  */
-void free_hot_cold_page_list(struct list_head *list, bool cold)
+int free_hot_cold_page_list(struct list_head *list, bool cold)
 {
 	struct page *page, *next;
+	int ret = 0;
 
 	list_for_each_entry_safe(page, next, list, lru) {
 		trace_mm_page_free_batched(page, cold);
 		free_hot_cold_page(page, cold);
+		ret++;
 	}
+
+	return ret;
 }
 
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4abf08861d2..0c4707571762 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -902,6 +902,17 @@ static void page_check_dirty_writeback(struct page *page,
 		mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }
 
+struct reclaim_stat {
+	unsigned nr_dirty;
+	unsigned nr_unqueued_dirty;
+	unsigned nr_congested;
+	unsigned nr_writeback;
+	unsigned nr_immediate;
+	unsigned nr_activate;
+	unsigned nr_ref_keep;
+	unsigned nr_unmap_fail;
+};
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -909,22 +920,20 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct pglist_data *pgdat,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
-				      unsigned long *ret_nr_dirty,
-				      unsigned long *ret_nr_unqueued_dirty,
-				      unsigned long *ret_nr_congested,
-				      unsigned long *ret_nr_writeback,
-				      unsigned long *ret_nr_immediate,
+				      struct reclaim_stat *stat,
 				      bool force_reclaim)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
 	int pgactivate = 0;
-	unsigned long nr_unqueued_dirty = 0;
-	unsigned long nr_dirty = 0;
-	unsigned long nr_congested = 0;
-	unsigned long nr_reclaimed = 0;
-	unsigned long nr_writeback = 0;
-	unsigned long nr_immediate = 0;
+	unsigned nr_unqueued_dirty = 0;
+	unsigned nr_dirty = 0;
+	unsigned nr_congested = 0;
+	unsigned nr_reclaimed = 0;
+	unsigned nr_writeback = 0;
+	unsigned nr_immediate = 0;
+	unsigned nr_ref_keep = 0;
+	unsigned nr_unmap_fail = 0;
 
 	cond_resched();
 
@@ -1063,6 +1072,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		case PAGEREF_ACTIVATE:
 			goto activate_locked;
 		case PAGEREF_KEEP:
+			nr_ref_keep++;
 			goto keep_locked;
 		case PAGEREF_RECLAIM:
 		case PAGEREF_RECLAIM_CLEAN:
@@ -1100,6 +1110,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
 				(ttu_flags | TTU_BATCH_FLUSH))) {
 			case SWAP_FAIL:
+				nr_unmap_fail++;
 				goto activate_locked;
 			case SWAP_AGAIN:
 				goto keep_locked;
@@ -1266,11 +1277,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
 
-	*ret_nr_dirty += nr_dirty;
-	*ret_nr_congested += nr_congested;
-	*ret_nr_unqueued_dirty += nr_unqueued_dirty;
-	*ret_nr_writeback += nr_writeback;
-	*ret_nr_immediate += nr_immediate;
+	if (stat) {
+		stat->nr_dirty = nr_dirty;
+		stat->nr_congested = nr_congested;
+		stat->nr_unqueued_dirty = nr_unqueued_dirty;
+		stat->nr_writeback = nr_writeback;
+		stat->nr_immediate = nr_immediate;
+		stat->nr_activate = pgactivate;
+		stat->nr_ref_keep = nr_ref_keep;
+		stat->nr_unmap_fail = nr_unmap_fail;
+	}
 	return nr_reclaimed;
 }
 
@@ -1282,7 +1298,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 		.priority = DEF_PRIORITY,
 		.may_unmap = 1,
 	};
-	unsigned long ret, dummy1, dummy2, dummy3, dummy4, dummy5;
+	unsigned long ret;
 	struct page *page, *next;
 	LIST_HEAD(clean_pages);
 
@@ -1295,8 +1311,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	}
 
 	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
-			TTU_UNMAP|TTU_IGNORE_ACCESS,
-			&dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true);
+			TTU_UNMAP|TTU_IGNORE_ACCESS, NULL, true);
 	list_splice(&clean_pages, page_list);
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
 	return ret;
@@ -1428,6 +1443,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	unsigned long nr_taken = 0;
 	unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
 	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
+	unsigned long skipped = 0, total_skipped = 0;
 	unsigned long scan, nr_pages;
 	LIST_HEAD(pages_skipped);
 
@@ -1479,14 +1495,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	 */
 	if (!list_empty(&pages_skipped)) {
 		int zid;
-		unsigned long total_skipped = 0;
 
 		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
 			if (!nr_skipped[zid])
 				continue;
 
 			__count_zid_vm_events(PGSCAN_SKIP, zid, nr_skipped[zid]);
-			total_skipped += nr_skipped[zid];
+			skipped += nr_skipped[zid];
 		}
 
 		/*
@@ -1494,13 +1509,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		 * close to unreclaimable. If the LRU list is empty, account
 		 * skipped pages as a full scan.
 		 */
-		scan += list_empty(src) ? total_skipped : total_skipped >> 2;
+		total_skipped = list_empty(src) ? skipped : skipped >> 2;
 
 		list_splice(&pages_skipped, src);
 	}
-	*nr_scanned = scan;
+	*nr_scanned = scan + total_skipped;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, scan,
-				    nr_taken, mode, is_file_lru(lru));
+				    skipped, nr_taken, mode, is_file_lru(lru));
 	update_lru_sizes(lruvec, lru, nr_zone_taken, nr_taken);
 	return nr_taken;
 }
@@ -1696,11 +1711,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	unsigned long nr_scanned;
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_taken;
-	unsigned long nr_dirty = 0;
-	unsigned long nr_congested = 0;
-	unsigned long nr_unqueued_dirty = 0;
-	unsigned long nr_writeback = 0;
-	unsigned long nr_immediate = 0;
+	struct reclaim_stat stat = {};
 	isolate_mode_t isolate_mode = 0;
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -1745,9 +1756,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		return 0;
 
 	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, TTU_UNMAP,
-				&nr_dirty, &nr_unqueued_dirty, &nr_congested,
-				&nr_writeback, &nr_immediate,
-				false);
+				&stat, false);
 
 	spin_lock_irq(&pgdat->lru_lock);
 
@@ -1781,7 +1790,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * of pages under pages flagged for immediate reclaim and stall if any
 	 * are encountered in the nr_immediate check below.
 	 */
-	if (nr_writeback && nr_writeback == nr_taken)
+	if (stat.nr_writeback && stat.nr_writeback == nr_taken)
 		set_bit(PGDAT_WRITEBACK, &pgdat->flags);
 
 	/*
@@ -1793,7 +1802,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * Tag a zone as congested if all the dirty pages scanned were
 		 * backed by a congested BDI and wait_iff_congested will stall.
 		 */
-		if (nr_dirty && nr_dirty == nr_congested)
+		if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
 			set_bit(PGDAT_CONGESTED, &pgdat->flags);
 
 		/*
@@ -1802,7 +1811,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * the pgdat PGDAT_DIRTY and kswapd will start writing pages from
 		 * reclaim context.
 		 */
-		if (nr_unqueued_dirty == nr_taken)
+		if (stat.nr_unqueued_dirty == nr_taken)
 			set_bit(PGDAT_DIRTY, &pgdat->flags);
 
 		/*
@@ -1811,7 +1820,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (nr_immediate && current_may_throttle())
+		if (stat.nr_immediate && current_may_throttle())
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
@@ -1826,6 +1835,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			nr_scanned, nr_reclaimed,
+			stat.nr_dirty,  stat.nr_writeback,
+			stat.nr_congested, stat.nr_immediate,
+			stat.nr_activate, stat.nr_ref_keep, stat.nr_unmap_fail,
 			sc->priority, file);
 	return nr_reclaimed;
 }
@@ -1846,9 +1858,11 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
  *
  * The downside is that we have to touch page->_refcount against each page.
  * But we had to alter page->flags anyway.
+ *
+ * Returns the number of pages moved to the given lru.
  */
 
-static void move_active_pages_to_lru(struct lruvec *lruvec,
+static int move_active_pages_to_lru(struct lruvec *lruvec,
 				     struct list_head *list,
 				     struct list_head *pages_to_free,
 				     enum lru_list lru)
@@ -1857,6 +1871,7 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 	unsigned long pgmoved = 0;
 	struct page *page;
 	int nr_pages;
+	int nr_moved = 0;
 
 	while (!list_empty(list)) {
 		page = lru_to_page(list);
@@ -1882,11 +1897,15 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 				spin_lock_irq(&pgdat->lru_lock);
 			} else
 				list_add(&page->lru, pages_to_free);
+		} else {
+			nr_moved++;
 		}
 	}
 
 	if (!is_active_lru(lru))
 		__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	return nr_moved;
 }
 
 static void shrink_active_list(unsigned long nr_to_scan,
@@ -1902,7 +1921,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
-	unsigned long nr_rotated = 0;
+	unsigned long nr_rotated = 0, nr_unevictable = 0;
+	unsigned long nr_freed, nr_deactivate, nr_activate;
 	isolate_mode_t isolate_mode = 0;
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -1935,6 +1955,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 		if (unlikely(!page_evictable(page))) {
 			putback_lru_page(page);
+			nr_unevictable++;
 			continue;
 		}
 
@@ -1980,13 +2001,16 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
 
-	move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
-	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
+	nr_activate = move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
+	nr_deactivate = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&l_hold);
-	free_hot_cold_page_list(&l_hold, true);
+	nr_freed = free_hot_cold_page_list(&l_hold, true);
+	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_scanned, nr_freed,
+			nr_unevictable, nr_deactivate, nr_rotated,
+			sc->priority, file);
 }
 
 /*
@@ -2019,8 +2043,8 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 						struct scan_control *sc)
 {
 	unsigned long inactive_ratio;
-	unsigned long inactive;
-	unsigned long active;
+	unsigned long total_inactive, inactive;
+	unsigned long total_active, active;
 	unsigned long gb;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	int zid;
@@ -2032,8 +2056,8 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	if (!file && !total_swap_pages)
 		return false;
 
-	inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
-	active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
+	total_inactive = inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
+	total_active = active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
 
 	/*
 	 * For zone-constrained allocations, it is necessary to check if
@@ -2062,6 +2086,9 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	else
 		inactive_ratio = 1;
 
+	trace_mm_vmscan_inactive_list_is_low(pgdat->node_id,
+			total_inactive, inactive,
+			total_active, active, inactive_ratio, file);
 	return inactive * inactive_ratio < active;
 }
 
-- 
Michal Hocko
SUSE Labs


* Re: OOM: Better, but still there on
  2016-12-21  7:36                   ` Michal Hocko
@ 2016-12-21 11:00                     ` Tetsuo Handa
  2016-12-21 11:16                       ` Michal Hocko
  2016-12-22 10:10                     ` Nils Holland
  1 sibling, 1 reply; 62+ messages in thread
From: Tetsuo Handa @ 2016-12-21 11:00 UTC (permalink / raw)
  To: mhocko, nholland; +Cc: linux-kernel, linux-mm, clm, dsterba, linux-btrfs

Michal Hocko wrote:
> TL;DR
> there is another version of the debugging patch. Just revert the
> previous one and apply this one instead. It's still not clear what
> is going on but I suspect either some misaccounting or unexpected
> pages on the LRU lists. I have added one more tracepoint, so please
> also enable mm_vmscan_inactive_list_is_low.
> 
> Hopefully the additional data will tell us more.
> 
> On Tue 20-12-16 03:08:29, Nils Holland wrote:
> > On Mon, Dec 19, 2016 at 02:45:34PM +0100, Michal Hocko wrote:
> > 
> > > Unfortunately shrink_active_list doesn't have any tracepoint so we do
> > > not know whether we managed to rotate those pages. If they are referenced
> > > quickly enough we might just keep refaulting them... Could you try to apply
> > > the following diff on top of what you have currently? It should add some more
> > > tracepoint data which might tell us more. We can reduce the amount of
> > > tracing data by enabling only mm_vmscan_lru_isolate,
> > > mm_vmscan_lru_shrink_inactive and mm_vmscan_lru_shrink_active.
> > 
> > So, the results are in! I applied your patch and rebuilt the kernel,
> > then I rebooted the machine, set up tracing so that only the three
> > events you mentioned were being traced, and captured the output over
> > the network.
> > 
> > Things went a bit differently this time: The trace events started to
> > appear after a while and a whole lot of them were generated, but
> > suddenly they stopped. A short while later, we get

"cat /debug/trace/trace_pipe > /dev/udp/$ip/$port" stops reporting if
/bin/cat is disturbed by a page fault and/or the memory allocation needed for
sending UDP packets. Since netconsole can send UDP packets without involving
memory allocation, printk() is preferable to tracing under OOM.

> 
> It is possible that you are hitting multiple issues so it would be
> great to focus on one at a time. The underlying problem might be
> the same/similar in the end but this is hard to tell now. Could you try to
> reproduce and provide data for the OOM killer situation as well?
>  
> > [ 1661.485568] btrfs-transacti: page alloction stalls for 611058ms, order:0, mode:0x2420048(GFP_NOFS|__GFP_HARDWALL|__GFP_MOVABLE)
> > 
> > along with a backtrace and memory information, and then there was
> > silence.
> 
> > When I walked up to the machine, it had completely died; it
> > wouldn't turn on its screen on key press any more, blindly trying to
> > reboot via SysRequest had no effect, but the caps lock LED also wasn't
> > blinking, like it normally does when a kernel panic occurs. Good
> > question what state it was in. The OOM reaper didn't really seem to
> > kick in and kill processes this time, it seems.
> > 
> > The complete capture is up at:
> > 
> > http://ftp.tisys.org/pub/misc/teela_2016-12-20.log.xz
> 
> This is the stall report:
> [ 1661.485568] btrfs-transacti: page alloction stalls for 611058ms, order:0, mode:0x2420048(GFP_NOFS|__GFP_HARDWALL|__GFP_MOVABLE)
> [ 1661.485859] CPU: 1 PID: 1950 Comm: btrfs-transacti Not tainted 4.9.0-gentoo #4
> 
> pid 1950 is trying to allocate for a _long_ time. Considering that this
> is the only stall report, this means that reclaim took really long so we
> didn't get to the page allocator for that long. It sounds really crazy!

warn_alloc() reports only if !__GFP_NOWARN.

We can report where they were looping using kmallocwd at
http://lkml.kernel.org/r/1478416501-10104-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
(and extend it, using SystemTap, to call printk() to report the values which your
trace hooks would report, only while memory allocations are stalling and without the
delay caused by page faults and/or the memory allocation needed for sending UDP packets).

But if trying to reboot via SysRq-b did not work, I think that the system
was in a hard lockup state. That would be a different problem.

By the way, Michal, I find it strange that your analysis does not seem
to refer to the implications of an "x86_32 kernel". Maybe
you already referred to x86_32 with "they are from the highmem zone", though.

* Re: OOM: Better, but still there on
  2016-12-21 11:00                     ` Tetsuo Handa
@ 2016-12-21 11:16                       ` Michal Hocko
  2016-12-21 14:04                         ` Chris Mason
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-21 11:16 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: nholland, linux-kernel, linux-mm, clm, dsterba, linux-btrfs

On Wed 21-12-16 20:00:38, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > TL;DR
> > there is another version of the debugging patch. Just revert the
> > previous one and apply this one instead. It's still not clear what
> > is going on but I suspect either some misaccounting or unexpected
> > pages on the LRU lists. I have added one more tracepoint, so please
> > enable also mm_vmscan_inactive_list_is_low.
> > 
> > Hopefully the additional data will tell us more.
> > 
> > On Tue 20-12-16 03:08:29, Nils Holland wrote:
[...]
> > > http://ftp.tisys.org/pub/misc/teela_2016-12-20.log.xz
> > 
> > This is the stall report:
> > [ 1661.485568] btrfs-transacti: page alloction stalls for 611058ms, order:0, mode:0x2420048(GFP_NOFS|__GFP_HARDWALL|__GFP_MOVABLE)
> > [ 1661.485859] CPU: 1 PID: 1950 Comm: btrfs-transacti Not tainted 4.9.0-gentoo #4
> > 
> > pid 1950 is trying to allocate for a _long_ time. Considering that this
> > is the only stall report, this means that reclaim took really long so we
> > didn't get to the page allocator for that long. It sounds really crazy!
> 
> warn_alloc() reports only if !__GFP_NOWARN.

Yes, and the above allocation is clearly a !__GFP_NOWARN allocation, which is
reported after 611s! If there are no prior/lost warn_alloc() reports then it
implies we have spent _that_ much time in reclaim. Considering the
tracing data we cannot really rule that out. All the reclaimers would
fight over the lru_lock and considering we are scanning the whole LRU
this will take some time.

[...]

> By the way, Michal, I find it strange that your analysis does not seem
> to refer to the implications of an "x86_32 kernel". Maybe
> you already referred to x86_32 with "they are from the highmem zone", though.

Yes, highmem, as well as all those scanning anomalies, is a 32b kernel
specific thing. I believe I have already mentioned that the 32b kernel
suffers from some inherent issues but I would like to understand what is
going on here before blaming the 32b.

One thing to note here, when we are talking about the 32b kernel: things
changed in 4.8 when we moved from zone based to node based
reclaim (see b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a
per-node basis") and associated patches). It is possible that the
reporter is hitting some pathological path which needs fixing but it
might also be related to something else. So I would rather not
blame 32b yet...

-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-21 11:16                       ` Michal Hocko
@ 2016-12-21 14:04                         ` Chris Mason
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Mason @ 2016-12-21 14:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, nholland, linux-kernel, linux-mm, dsterba, linux-btrfs

On Wed, Dec 21, 2016 at 12:16:53PM +0100, Michal Hocko wrote:
>On Wed 21-12-16 20:00:38, Tetsuo Handa wrote:
>
>One thing to note here, when we are talking about the 32b kernel: things
>changed in 4.8 when we moved from zone based to node based
>reclaim (see b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a
>per-node basis") and associated patches). It is possible that the
>reporter is hitting some pathological path which needs fixing but it
>might also be related to something else. So I would rather not
>blame 32b yet...

It might be interesting to put tracing on releasepage and see if btrfs 
is pinning pages around.  I can't see how 32bit kernels would be 
different, but maybe we're hitting a weird corner.

-chris

* Re: OOM: Better, but still there on
  2016-12-21  7:36                   ` Michal Hocko
  2016-12-21 11:00                     ` Tetsuo Handa
@ 2016-12-22 10:10                     ` Nils Holland
  2016-12-22 10:27                       ` Michal Hocko
  2016-12-22 19:17                       ` Michal Hocko
  1 sibling, 2 replies; 62+ messages in thread
From: Nils Holland @ 2016-12-22 10:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Wed, Dec 21, 2016 at 08:36:59AM +0100, Michal Hocko wrote:
> TL;DR
> there is another version of the debugging patch. Just revert the
> previous one and apply this one instead. It's still not clear what
> is going on but I suspect either some misaccounting or unexpected
> pages on the LRU lists. I have added one more tracepoint, so please
> enable also mm_vmscan_inactive_list_is_low.

Right, I did just that and can provide a new log. I was also able, in
this case, to reproduce the OOM issues again and not just the "page
allocation stalls" that were the only thing visible in the previous
log. However, the log comes from machine #2 again today, as I'm
unfortunately forced to try this via VPN from work to home today, so I
have exactly one attempt per machine before it goes down and locks up
(and I can only restart it later tonight). Machine #1 failed to
produce good looking results during its one attempt, but what machine #2
produced seems to be exactly what we've been trying to track down, and so
its log is now up at:

http://ftp.tisys.org/pub/misc/boerne_2016-12-22.log.xz

Greetings
Nils

* Re: OOM: Better, but still there on
  2016-12-22 10:10                     ` Nils Holland
@ 2016-12-22 10:27                       ` Michal Hocko
  2016-12-22 10:35                         ` Nils Holland
  2016-12-22 19:17                       ` Michal Hocko
  1 sibling, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-22 10:27 UTC (permalink / raw)
  To: Nils Holland
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu 22-12-16 11:10:29, Nils Holland wrote:
> On Wed, Dec 21, 2016 at 08:36:59AM +0100, Michal Hocko wrote:
> > TL;DR
> > there is another version of the debugging patch. Just revert the
> > previous one and apply this one instead. It's still not clear what
> > is going on but I suspect either some misaccounting or unexpected
> > pages on the LRU lists. I have added one more tracepoint, so please
> > enable also mm_vmscan_inactive_list_is_low.
> 
> Right, I did just that and can provide a new log. I was also able, in
> this case, to reproduce the OOM issues again and not just the "page
> allocation stalls" that were the only thing visible in the previous
> log.

Thanks a lot for testing! I will have a look later today.

> However, the log comes from machine #2 again today, as I'm
> unfortunately forced to try this via VPN from work to home today, so I
> have exactly one attempt per machine before it goes down and locks up
> (and I can only restart it later tonight).

This is really surprising to me. Are you sure that you have sysrq
configured properly? At least sysrq+b shouldn't depend on any memory
allocations and should allow you to reboot immediately. A sysrq+m right
before the reboot might turn out to be helpful as well.
-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-22 10:27                       ` Michal Hocko
@ 2016-12-22 10:35                         ` Nils Holland
  2016-12-22 10:46                           ` Tetsuo Handa
  0 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-22 10:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu, Dec 22, 2016 at 11:27:25AM +0100, Michal Hocko wrote:
> On Thu 22-12-16 11:10:29, Nils Holland wrote:
> 
> > However, the log comes from machine #2 again today, as I'm
> > unfortunately forced to try this via VPN from work to home today, so I
> > have exactly one attempt per machine before it goes down and locks up
> > (and I can only restart it later tonight).
> 
> This is really surprising to me. Are you sure that you have sysrq
> configured properly? At least sysrq+b shouldn't depend on any memory
> allocations and should allow you to reboot immediately. A sysrq+m right
> before the reboot might turn out to be helpful as well.

Well, the issue is that I could only do everything via ssh today and
don't have any physical access to the machines. In fact, both seem to
have suffered a genuine kernel panic, which is also visible in the
last few lines of the log I provided today. So, basically, both
machines are now sitting at my home in panic state and I'll only be
able to resurrect them when I'm physically there again tonight. But
that was expected; I could have waited with the test until I'm at
home, which makes things easier, but I thought the sooner I can
provide a log for you to look at, the better. ;-)

Greetings
Nils

* Re: OOM: Better, but still there on
  2016-12-22 10:35                         ` Nils Holland
@ 2016-12-22 10:46                           ` Tetsuo Handa
  0 siblings, 0 replies; 62+ messages in thread
From: Tetsuo Handa @ 2016-12-22 10:46 UTC (permalink / raw)
  To: nholland, mhocko; +Cc: linux-kernel, linux-mm, clm, dsterba, linux-btrfs

Nils Holland wrote:
> Well, the issue is that I could only do everything via ssh today and
> don't have any physical access to the machines. In fact, both seem to
> have suffered a genuine kernel panic, which is also visible in the
> last few lines of the log I provided today. So, basically, both
> machines are now sitting at my home in panic state and I'll only be
> able to resurrect them when I'm physically there again tonight.

# echo 10 > /proc/sys/kernel/panic

* Re: OOM: Better, but still there on
  2016-12-22 10:10                     ` Nils Holland
  2016-12-22 10:27                       ` Michal Hocko
@ 2016-12-22 19:17                       ` Michal Hocko
  2016-12-22 21:46                         ` Nils Holland
  1 sibling, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-22 19:17 UTC (permalink / raw)
  To: Nils Holland
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

TL;DR I still do not see what is going on here and it still smells like
multiple issues. Please apply the patch below on _top_ of what you had.

On Thu 22-12-16 11:10:29, Nils Holland wrote:
[...]
> http://ftp.tisys.org/pub/misc/boerne_2016-12-22.log.xz

It took me a while to realize that tracepoint and printk messages are
not sorted by the timestamp. Some massaging has fixed that:
$ xzcat boerne_2016-12-22.log.xz | sed -e 's@.*192.168.17.32:6665 \[[[:space:]]*\([0-9\.]\+\)\] @\1 @' -e 's@.*192.168.17.32:53062[[:space:]]*\([^[:space:]]\+\)[[:space:]].*[[:space:]]\([0-9\.]\+\):@\2 \1@' | sort -k1 -n -s

461.757468 kswapd0-32 mm_vmscan_lru_isolate: isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=1
461.757501 kswapd0-32 mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
461.757504 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=11852 inactive=0 total_active=118195 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
461.757508 kswapd0-32 mm_vmscan_lru_isolate: isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=1
461.757535 kswapd0-32 mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
461.757537 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=11820 inactive=0 total_active=118195 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
461.757543 kswapd0-32 mm_vmscan_lru_isolate: isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=32 nr_skipped=0 nr_taken=32 lru=1
461.757584 kswapd0-32 mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
461.757588 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=11788 inactive=0 total_active=118195 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
[...]
482.722379 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=9939 inactive=0 total_active=120208 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722379 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=9939 inactive=0 total_active=120208 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722379 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=89 inactive=0 total_active=1301 active=0 ratio=1 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
482.722385 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722386 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
482.722391 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722391 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
482.722396 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=1 inactive=0 total_active=21 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722396 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=131 active=0 ratio=1 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
482.722397 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=1 inactive=0 total_active=21 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722397 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=131 active=0 ratio=1 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
482.722401 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=450730 inactive=0 total_active=206026 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
484.144971 collect2 invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=0, oom_score_adj=0
[...]
484.146871 Node 0 active_anon:100688kB inactive_anon:380kB active_file:1296560kB inactive_file:1848044kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:32180kB dirty:20896kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 40960kB anon_thp: 776kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
484.147097 DMA free:4004kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:8016kB inactive_file:12kB unevictable:0kB writepending:68kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:2652kB slab_unreclaimable:1224kB kernel_stack:8kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
484.147319 lowmem_reserve[]: 0 808 3849 3849
484.147387 Normal free:41016kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:464688kB inactive_file:48kB unevictable:0kB writepending:2684kB present:897016kB managed:831472kB mlocked:0kB slab_reclaimable:215812kB slab_unreclaimable:90092kB kernel_stack:1336kB pagetables:1436kB bounce:0kB free_pcp:372kB local_pcp:176kB free_cma:0kB
484.149971 lowmem_reserve[]: 0 0 24330 24330
484.152390 HighMem free:332648kB min:512kB low:39184kB high:77856kB active_anon:100688kB inactive_anon:380kB active_file:823856kB inactive_file:1847984kB unevictable:0kB writepending:18144kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:836kB local_pcp:156kB free_cma:0kB

Unfortunately, LOST EVENT entries are not logged with a timestamp, but there
are many lost events between 10:55:31 and 10:55:33, which corresponds to the
above time range in timestamps:
$ xzgrep "10:55:3[1-3].*LOST" boerne_2016-12-22.log.xz | awk '{sum+=$6}END{print sum}'
5616415

so we do not have a good picture again :/ One thing is highly suspicious
though. I really doubt the _whole_ pagecache went down to zero and then up
in such a short time:
482.722379 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=89 inactive=0 total_active=1301 active=0 ratio=1 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
482.722397 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=1 inactive=0 total_active=21 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
482.722401 cat-2974 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=450730 inactive=0 total_active=206026 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC

File inactive 450730 resp. active 206026 roughly match the global
counters in the oom report so I would trust this to be more realistic. I
simply do not see any large source of the LRU isolation. Maybe those
pages have been truncated and new ones allocated. The time window is
really short though but who knows...
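
To make the inactive=0 active=0 ratio=1 fields in the trace lines above
easier to read, here is a minimal userspace sketch of the final comparison
in inactive_list_is_low. It assumes 4 KiB pages, and it assumes the ratio is
int_sqrt(10 * gb) whenever gb is non-zero, which is recalled from the 4.9
mainline source and is not shown in the patches in this thread. The names
isqrt, inactive_ratio and inactive_is_low are hypothetical and exist only
for illustration.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86 */

/* Integer square root, standing in for the kernel's int_sqrt(). */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Ratio derived from the eligible list sizes (both given in pages). */
static unsigned long inactive_ratio(unsigned long inactive,
				    unsigned long active)
{
	/* Combined list size rounded down to whole gigabytes. */
	unsigned long gb = (inactive + active) >> (30 - PAGE_SHIFT);

	return gb ? isqrt(10 * gb) : 1;
}

/* Mirrors "return inactive * inactive_ratio < active;" from the patch. */
static bool inactive_is_low(unsigned long inactive, unsigned long active)
{
	return inactive * inactive_ratio(inactive, active) < active;
}
```

With the eligible counts from the trace (0 and 0), gb is 0, the ratio is 1,
and 0 * 1 < 0 is false, so the inactive list is not reported as low. With
the node-wide totals 450730 and 206026, gb would be 2 and the ratio 4 under
the same assumptions.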

Another possibility would be misaccounting, but I do not see anything
that would use __mod_zone_page_state and __mod_node_page_state on the LRU
in a way that handles node vs. zone counters inconsistently. Everything
seems to go via __update_lru_size.

Another thing to check would be the per-cpu counters usage. The
following patch should use the more precise numbers. I am also not
sure about the lockless nature of inactive_list_is_low so the patch
below adds the lru_lock there.
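
The difference between a plain counter read and a _snapshot read can be
sketched in userspace as follows. This is a simplified model under stated
assumptions, not the kernel's vmstat code: it assumes each counter consists
of one folded global value plus small per-CPU deltas that have not been
folded in yet, and that the snapshot variant adds those pending deltas
before returning. struct counter, counter_read, counter_read_snapshot and
NR_CPUS = 4 are hypothetical names and values chosen for the example.

```c
#define NR_CPUS 4	/* hypothetical CPU count for this model */

struct counter {
	long global;			/* folded value seen by the cheap read */
	signed char diff[NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/* Cheap read: ignores pending per-CPU deltas, clamps negatives to 0. */
static unsigned long counter_read(const struct counter *c)
{
	long x = c->global;

	return x < 0 ? 0 : (unsigned long)x;
}

/* Snapshot read: adds every CPU's pending delta before clamping. */
static unsigned long counter_read_snapshot(const struct counter *c)
{
	long x = c->global;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		x += c->diff[cpu];

	return x < 0 ? 0 : (unsigned long)x;
}
```

In this model the two reads can disagree by at most the sum of the pending
deltas, so a swing from several hundred thousand pages down to zero would
not be explained by per-CPU drift alone if the deltas are small.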

The only clear thing is that mm_vmscan_lru_isolate indeed skipped
through the whole list without finding a single suitable page
when it couldn't isolate any pages. So the failure is not due to
get_page_unless_zero.
$ xzgrep "mm_vmscan_lru_isolate.*nr_taken=0" boerne_2016-12-22.log.xz | sed 's@.*nr_scanned=\([0-9]*\).*@\1@' | sort | uniq -c
   7941 0

I am not able to draw any conclusion now. I am suspecting get_scan_count
as well. Let's see whether the patch below makes any difference and if
not I will dig into g_s_c some more. I will think about it some more,
maybe somebody else will notice something so I am sending this half
baked analysis.

---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cb82913b62bb..8727b68a8e70 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -239,7 +239,7 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru)
 	if (!mem_cgroup_disabled())
 		return mem_cgroup_get_lru_size(lruvec, lru);
 
-	return node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
+	return node_page_state_snapshot(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
 }
 
 /*
@@ -2056,6 +2056,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	if (!file && !total_swap_pages)
 		return false;
 
+	spin_lock_irq(&pgdat->lru_lock);
 	total_inactive = inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
 	total_active = active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
 
@@ -2071,14 +2072,15 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 		if (!managed_zone(zone))
 			continue;
 
-		inactive_zone = zone_page_state(zone,
+		inactive_zone = zone_page_state_snapshot(zone,
 				NR_ZONE_LRU_BASE + (file * LRU_FILE));
-		active_zone = zone_page_state(zone,
+		active_zone = zone_page_state_snapshot(zone,
 				NR_ZONE_LRU_BASE + (file * LRU_FILE) + LRU_ACTIVE);
 
 		inactive -= min(inactive, inactive_zone);
 		active -= min(active, active_zone);
 	}
+	spin_unlock_irq(&pgdat->lru_lock);
 
 	gb = (inactive + active) >> (30 - PAGE_SHIFT);
 	if (gb)
-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-22 19:17                       ` Michal Hocko
@ 2016-12-22 21:46                         ` Nils Holland
  2016-12-23 10:51                           ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-22 21:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu, Dec 22, 2016 at 08:17:19PM +0100, Michal Hocko wrote:
> TL;DR I still do not see what is going on here and it still smells like
> multiple issues. Please apply the patch below on _top_ of what you had.

I've run the usual procedure again with the new patch on top and the
log is now up at:

http://ftp.tisys.org/pub/misc/boerne_2016-12-22_2.log.xz

As a little side note: It is likely, but I cannot completely say for
sure yet, that this issue is rather easy to reproduce. When I had some
time today at work, I set up a fresh Debian Sid installation in a VM
(32 bit PAE kernel, 4 GB RAM, btrfs as root fs). I used some late 4.9rc(8?)
kernel supplied by Debian - they don't seem to have 4.9 final yet and I
didn't get around to building and using a custom 4.9 final kernel, probably
even with your patches. But the 4.9rc kernel there seemed to behave very much
the same as the 4.9 kernel on my real 32 bit machines does: All I had
to do was unpack a few big tarballs - firefox, libreoffice and the
kernel are my favorites - and the machine would start OOMing.

This might suggest - although I have to admit, again, that this is
inconclusive, as I've not used a final 4.9 kernel - that you could
very easily reproduce the issue yourself by just setting up a 32 bit
system with a btrfs filesystem and then unpacking a few huge tarballs.
Of course, I'm more than happy to continue giving any patches sent to
me a spin, but I thought I'd still mention this in case it makes
things easier for you. :-)

Greetings
Nils

* Re: OOM: Better, but still there on
  2016-12-22 21:46                         ` Nils Holland
@ 2016-12-23 10:51                           ` Michal Hocko
  2016-12-23 12:18                             ` Nils Holland
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-23 10:51 UTC (permalink / raw)
  To: Nils Holland
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

TL;DR
drop the last patch, check whether memory cgroup is enabled and retest
with cgroup_disable=memory to see whether this is memcg related and if
it is _not_ then try to test with the patch below

On Thu 22-12-16 22:46:11, Nils Holland wrote:
> On Thu, Dec 22, 2016 at 08:17:19PM +0100, Michal Hocko wrote:
> > TL;DR I still do not see what is going on here and it still smells like
> > multiple issues. Please apply the patch below on _top_ of what you had.
> 
> I've run the usual procedure again with the new patch on top and the
> log is now up at:
> 
> http://ftp.tisys.org/pub/misc/boerne_2016-12-22_2.log.xz

OK, so there are still large page cache fluctuations even with the
locking applied:
472.042409 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=450451 inactive=0 total_active=210056 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
472.042442 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=0 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
472.042451 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=0 inactive=0 total_active=12 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
472.042484 kswapd0-32 mm_vmscan_inactive_list_is_low: nid=0 total_inactive=11944 inactive=0 total_active=117286 active=0 ratio=1 flags=RECLAIM_WB_FILE|RECLAIM_WB

One thing that didn't occur to me previously was that this might be an
effect of the memory cgroups. Do you have memory cgroups enabled? If
yes then rerunning with cgroup_disable=memory would be interesting
as well.

Anyway, now I am looking at get_scan_count which determines how many pages
we should scan on each LRU list. The problem I can see there is that
it doesn't reflect eligible zones (or at least it doesn't do that
consistently). So it might happen we simply decide to scan the whole LRU
list (when we get down to prio 0 because we cannot make any progress)
and then _slowly_ scan through it in SWAP_CLUSTER_MAX chunks each
time. This can take a lot of time and who knows what might have happened
if there are many such reclaimers in parallel.
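
The arithmetic behind that concern can be sketched in userspace as follows.
This is an illustration only: it assumes the scan target is the list size
shifted right by the reclaim priority (as in the "scan = size >>
sc->priority" line of get_scan_count) and that the list is walked in
batches of SWAP_CLUSTER_MAX = 32 pages, which matches nr_requested=32 in
the traces. scan_target, scan_batches and eligible_size are hypothetical
helper names, and the per-zone split passed to eligible_size is an assumed
input, not data from the logs.

```c
#define SWAP_CLUSTER_MAX 32UL

/* Pages requested from one LRU list at a given reclaim priority. */
static unsigned long scan_target(unsigned long lru_size, int priority)
{
	return lru_size >> priority;
}

/* Number of SWAP_CLUSTER_MAX-sized batches needed to cover the target. */
static unsigned long scan_batches(unsigned long nr_to_scan)
{
	return (nr_to_scan + SWAP_CLUSTER_MAX - 1) / SWAP_CLUSTER_MAX;
}

/*
 * List size minus the pages that sit in zones above zone_idx, i.e. the
 * pages a zone-constrained request can actually use.
 */
static unsigned long eligible_size(unsigned long lru_size,
				   const unsigned long *zone_sizes,
				   int nr_zones, int zone_idx)
{
	int zid;

	for (zid = zone_idx + 1; zid < nr_zones; zid++) {
		unsigned long size = zone_sizes[zid];

		/* Never subtract more than what is left. */
		lru_size -= size < lru_size ? size : lru_size;
	}
	return lru_size;
}
```

Taking total_inactive=450451 from the trace above, priority 12 gives a
target of 109 pages, while priority 0 gives all 450451 pages, or 14077
batches of 32. If all of those pages sat in the highest zone and the
request were limited to the zone below it, eligible_size would return 0,
so the zone-aware target would be 0 as well.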

[...]

> This might suggest - although I have to admit, again, that this is
> inconclusive, as I've not used a final 4.9 kernel - that you could
> very easily reproduce the issue yourself by just setting up a 32 bit
> system with a btrfs filesystem and then unpacking a few huge tarballs.
> Of course, I'm more than happy to continue giving any patches sent to
> me a spin, but I thought I'd still mention this in case it makes
> things easier for you. :-)

I would appreciate it if you stuck with your setup, so as not to pull new
unknowns into the picture.
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cb82913b62bb..533bb591b0be 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -243,6 +243,35 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru)
 }
 
 /*
+ * Return the number of pages on the given lru which are eligible for the
+ * given zone_idx
+ */
+static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
+		enum lru_list lru, int zone_idx)
+{
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+	unsigned long lru_size;
+	int zid;
+
+	if (!mem_cgroup_disabled())
+		return mem_cgroup_get_lru_size(lruvec, lru);
+
+	lru_size = lruvec_lru_size(lruvec, lru);
+	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+		unsigned long size;
+
+		if (!managed_zone(zone))
+			continue;
+
+		size = zone_page_state(zone, NR_ZONE_LRU_BASE + lru);
+		lru_size -= min(size, lru_size);
+	}
+
+	return lru_size;
+}
+
+/*
  * Add a shrinker callback to be called from the vm.
  */
 int register_shrinker(struct shrinker *shrinker)
@@ -2228,7 +2257,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	 * system is under heavy pressure.
 	 */
 	if (!inactive_list_is_low(lruvec, true, sc) &&
-	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
+	    lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2295,7 +2324,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			unsigned long size;
 			unsigned long scan;
 
-			size = lruvec_lru_size(lruvec, lru);
+			size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx);
 			scan = size >> sc->priority;
 
 			if (!scan && pass && force_scan)
-- 
Michal Hocko
SUSE Labs

* Re: OOM: Better, but still there on
  2016-12-23 10:51                           ` Michal Hocko
@ 2016-12-23 12:18                             ` Nils Holland
  2016-12-23 12:57                               ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-23 12:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Fri, Dec 23, 2016 at 11:51:57AM +0100, Michal Hocko wrote:
> TL;DR
> drop the last patch, check whether memory cgroup is enabled and retest
> with cgroup_disable=memory to see whether this is memcg related and if
> it is _not_ then try to test with the patch below

Right, it seems we might be looking in the right direction! So I
removed the previous patch from my kernel and verified if memory
cgroup was enabled, and indeed, it was. So I booted with
cgroup_disable=memory and ran my ordinary test again ... and in fact,
no ooms! I could have the firefox sources building and unpack half a
dozen big tarballs, which would previously with 99% certainty already
trigger an OOM upon unpacking the first tarball. Also, the system
seemed to run noticeably "nicer", in the sense that the other processes
I had running (like htop) would not get delayed / hung. The new patch
you sent has, as per your instructions, NOT been applied.

I've provided a log of this run, it's available at:

http://ftp.tisys.org/pub/misc/boerne_2016-12-23.log.xz

As no OOMs or other bad situations occurred, no memory information was
forcibly logged. However, about three times I triggered a memory info
manually via SysReq, because I guess that might be interesting for you
to look at.

I'd like to run the same test on my second machine as well just to
make sure that cgroup_disable=memory has an effect there too. I
should be able to do that later tonight and will report back as soon
as I know more!

> I would appreciate to stick with your setup to not pull new unknowns into
> the picture.

No problem! It's just likely that I won't be able to test during the
following days until Dec 27th, but after that I should be back to
normal and thus be able to run further tests in a timely fashion. :-)

Greetings
Nils

* Re: OOM: Better, but still there on
  2016-12-23 12:18                             ` Nils Holland
@ 2016-12-23 12:57                               ` Michal Hocko
  2016-12-23 14:47                                 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-23 12:57 UTC (permalink / raw)
  To: Nils Holland
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Fri 23-12-16 13:18:51, Nils Holland wrote:
> On Fri, Dec 23, 2016 at 11:51:57AM +0100, Michal Hocko wrote:
> > TL;DR
> > drop the last patch, check whether memory cgroup is enabled and retest
> > with cgroup_disable=memory to see whether this is memcg related and if
> > it is _not_ then try to test with the patch below
> 
> Right, it seems we might be looking in the right direction! So I
> removed the previous patch from my kernel and verified if memory
> cgroup was enabled, and indeed, it was. So I booted with
> cgroup_disable=memory and ran my ordinary test again ... and in fact,
> no ooms!

OK, thanks for confirmation. I could have figured that earlier. The
pagecache differences in such a short time should have raised the red
flag and pointed towards memcgs...

[...]
> > I would appreciate to stick with your setup to not pull new unknowns into
> > the picture.
> 
> No problem! It's just likely that I won't be able to test during the
> following days until Dec 27th, but after that I should be back to
> normal and thus be able to run further tests in a timely fashion. :-)

No problem at all. I will try to cook up a patch in the meantime.
-- 
Michal Hocko
SUSE Labs

* [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-23 12:57                               ` Michal Hocko
@ 2016-12-23 14:47                                 ` Michal Hocko
  2016-12-23 22:26                                   ` Nils Holland
  2016-12-25 22:25                                   ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot
  0 siblings, 2 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-23 14:47 UTC (permalink / raw)
  To: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov
  Cc: Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

[Add Mel, Johannes and Vladimir - the email thread started here
http://lkml.kernel.org/r/20161215225702.GA27944@boerne.fritz.box
Long story short, the zone->node reclaim change has broken active
list aging for lowmem requests when memory cgroups are enabled. More
details below.]

On Fri 23-12-16 13:57:28, Michal Hocko wrote:
> On Fri 23-12-16 13:18:51, Nils Holland wrote:
> > On Fri, Dec 23, 2016 at 11:51:57AM +0100, Michal Hocko wrote:
> > > TL;DR
> > > drop the last patch, check whether memory cgroup is enabled and retest
> > > with cgroup_disable=memory to see whether this is memcg related and if
> > > it is _not_ then try to test with the patch below
> > 
> > Right, it seems we might be looking in the right direction! So I
> > removed the previous patch from my kernel and verified if memory
> > cgroup was enabled, and indeed, it was. So I booted with
> > cgroup_disable=memory and ran my ordinary test again ... and in fact,
> > no ooms!
> 
> OK, thanks for confirmation. I could have figured that earlier. The
> pagecache differences in such a short time should have raised the red
> flag and pointed towards memcgs...
> 
> [...]
> > > I would appreciate to stick with your setup to not pull new unknowns into
> > > the picture.
> > 
> > No problem! It's just likely that I won't be able to test during the
> > following days until Dec 27th, but after that I should be back to
> > normal and thus be able to run further tests in a timely fashion. :-)
> 
> No problem at all. I will try to cook up a patch in the meantime.

So here is my attempt. It is only compile tested, so be careful, it might
eat your kittens or do more harm. I would appreciate it if others had a
look to see whether this is sane. There are probably other places which
would need some tweaks. I think that get_scan_count needs some tweaks
as well because we should only consider eligible zones when counting the
number of pages to scan. This would be for a separate patch which I will
send later. I just want to fix this one first.

Nils, even though this is still highly experimental, could you give it a
try please?
---
From a66fd89d43e9fd8ca9afa7e6c7252ab73d22b686 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 23 Dec 2016 15:11:54 +0100
Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
 memcg is enabled

Nils Holland has reported unexpected OOM killer invocations with 32b
kernels, starting with 4.8:

	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
	kworker/u4:5 cpuset=/ mems_allowed=0
	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
	[...]
	Mem-Info:
	active_anon:58685 inactive_anon:90 isolated_anon:0
	 active_file:274324 inactive_file:281962 isolated_file:0
	 unevictable:0 dirty:649 writeback:0 unstable:0
	 slab_reclaimable:40662 slab_unreclaimable:17754
	 mapped:7382 shmem:202 pagetables:351 bounce:0
	 free:206736 free_pcp:332 free_cma:0
	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
	lowmem_reserve[]: 0 813 3474 3474
	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
	lowmem_reserve[]: 0 0 21292 21292
	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

The OOM killer is clearly premature because there is still a lot of
page cache in zone Normal which should satisfy this lowmem request.
Further debugging has shown that reclaim cannot make any forward
progress because the page cache is hidden in the active list, which
doesn't get rotated because inactive_list_is_low is not memcg aware.
It simply subtracts per-zone highmem counters from the respective
memcg's lru sizes, which doesn't make any sense. We can simply end up
always seeing the resulting active and inactive counts as 0 and return
false. This issue is not limited to 32b kernels but in practice the
effect on systems without CONFIG_HIGHMEM would be much harder to notice
because we do not invoke the OOM killer for allocation requests
targeting < ZONE_NORMAL.

Fix the issue by tracking per-zone lru page counts in mem_cgroup_per_node
and subtracting per-memcg highmem counts when memcg is enabled. Introduce
the helper lruvec_zone_lru_size, which redirects to either the zone
counters or mem_cgroup_get_zone_lru_size as appropriate.

Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
Cc: stable # 4.8+
Reported-by: Nils Holland <nholland@tisys.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/memcontrol.h | 26 +++++++++++++++++++++++---
 include/linux/mm_inline.h  |  2 +-
 mm/memcontrol.c            | 11 ++++++-----
 mm/vmscan.c                | 26 ++++++++++++++++----------
 4 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 61d20c17f3b7..002cb08b0f3e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -120,7 +120,7 @@ struct mem_cgroup_reclaim_iter {
  */
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
-	unsigned long		lru_size[NR_LRU_LISTS];
+	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter	iter[DEF_PRIORITY + 1];
 
@@ -432,7 +432,7 @@ static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
-		int nr_pages);
+		int zid, int nr_pages);
 
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 					   int nid, unsigned int lru_mask);
@@ -441,9 +441,23 @@ static inline
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
 	struct mem_cgroup_per_node *mz;
+	unsigned long nr_pages = 0;
+	int zid;
 
 	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	return mz->lru_size[lru];
+	for (zid = 0; zid < MAX_NR_ZONES; zid++)
+		nr_pages += mz->lru_zone_size[zid][lru];
+	return nr_pages;
+}
+
+static inline
+unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
+					   int zone_idx)
+{
+	struct mem_cgroup_per_node *mz;
+
+	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	return mz->lru_zone_size[zone_idx][lru];
 }
 
 void mem_cgroup_handle_over_high(void);
@@ -671,6 +685,12 @@ mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
 	return 0;
 }
+static inline
+unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
+					   int zone_idx)
+{
+	return 0;
+}
 
 static inline unsigned long
 mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 71613e8a720f..41d376e7116d 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -39,7 +39,7 @@ static __always_inline void update_lru_size(struct lruvec *lruvec,
 {
 	__update_lru_size(lruvec, lru, zid, nr_pages);
 #ifdef CONFIG_MEMCG
-	mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
+	mem_cgroup_update_lru_size(lruvec, lru, zid, nr_pages);
 #endif
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91dfc7c5ce8f..f4e9c4d49df3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -625,8 +625,8 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 					   int nid, unsigned int lru_mask)
 {
+	struct lruvec *lruvec = mem_cgroup_lruvec(NODE_DATA(nid), memcg);
 	unsigned long nr = 0;
-	struct mem_cgroup_per_node *mz;
 	enum lru_list lru;
 
 	VM_BUG_ON((unsigned)nid >= nr_node_ids);
@@ -634,8 +634,7 @@ unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 	for_each_lru(lru) {
 		if (!(BIT(lru) & lru_mask))
 			continue;
-		mz = mem_cgroup_nodeinfo(memcg, nid);
-		nr += mz->lru_size[lru];
+		nr += mem_cgroup_get_lru_size(lruvec, lru);
 	}
 	return nr;
 }
@@ -1002,6 +1001,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
  * mem_cgroup_update_lru_size - account for adding or removing an lru page
  * @lruvec: mem_cgroup per zone lru vector
  * @lru: index of lru list the page is sitting on
+ * @zid: zone id of the accounted pages
  * @nr_pages: positive when adding or negative when removing
  *
  * This function must be called under lru_lock, just before a page is added
@@ -1009,7 +1009,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
  * so as to allow it to check that lru_size 0 is consistent with list_empty).
  */
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
-				int nr_pages)
+				int zid, int nr_pages)
 {
 	struct mem_cgroup_per_node *mz;
 	unsigned long *lru_size;
@@ -1020,7 +1020,7 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 		return;
 
 	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	lru_size = mz->lru_size + lru;
+	lru_size = &mz->lru_zone_size[zid][lru];
 	empty = list_empty(lruvec->lists + lru);
 
 	if (nr_pages < 0)
@@ -1036,6 +1036,7 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 
 	if (nr_pages > 0)
 		*lru_size += nr_pages;
+	mz->lru_zone_size[zid][lru] += nr_pages;
 }
 
 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4abf08861d2..c98b1a585992 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -242,6 +242,15 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru)
 	return node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
 }
 
+unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
+{
+	if (!mem_cgroup_disabled())
+		return mem_cgroup_get_zone_lru_size(lruvec, lru, zone_idx);
+
+	return zone_page_state(&lruvec_pgdat(lruvec)->node_zones[zone_idx],
+			       NR_ZONE_LRU_BASE + lru);
+}
+
 /*
  * Add a shrinker callback to be called from the vm.
  */
@@ -1382,8 +1391,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
  * be complete before mem_cgroup_update_lru_size due to a santity check.
  */
 static __always_inline void update_lru_sizes(struct lruvec *lruvec,
-			enum lru_list lru, unsigned long *nr_zone_taken,
-			unsigned long nr_taken)
+			enum lru_list lru, unsigned long *nr_zone_taken)
 {
 	int zid;
 
@@ -1392,11 +1400,11 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
 			continue;
 
 		__update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);
-	}
-
 #ifdef CONFIG_MEMCG
-	mem_cgroup_update_lru_size(lruvec, lru, -nr_taken);
+		mem_cgroup_update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);
 #endif
+	}
+
 }
 
 /*
@@ -1501,7 +1509,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	*nr_scanned = scan;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, scan,
 				    nr_taken, mode, is_file_lru(lru));
-	update_lru_sizes(lruvec, lru, nr_zone_taken, nr_taken);
+	update_lru_sizes(lruvec, lru, nr_zone_taken);
 	return nr_taken;
 }
 
@@ -2047,10 +2055,8 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 		if (!managed_zone(zone))
 			continue;
 
-		inactive_zone = zone_page_state(zone,
-				NR_ZONE_LRU_BASE + (file * LRU_FILE));
-		active_zone = zone_page_state(zone,
-				NR_ZONE_LRU_BASE + (file * LRU_FILE) + LRU_ACTIVE);
+		inactive_zone = lruvec_zone_lru_size(lruvec, file * LRU_FILE, zid);
+		active_zone = lruvec_zone_lru_size(lruvec, (file * LRU_FILE) + LRU_ACTIVE, zid);
 
 		inactive -= min(inactive, inactive_zone);
 		active -= min(active, active_zone);
-- 
2.10.2


-- 
Michal Hocko
SUSE Labs
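
The aging problem described in the commit message above can be sketched in
user space. This is a simplified model, not the kernel function: the
subtraction loop mirrors the "inactive -= min(inactive, inactive_zone)"
lines in the inactive_list_is_low hunk, while the final comparison
"inactive < active" (an inactive ratio of 1) is an assumption about code
the diff does not show. The node-wide zone counters are the file LRU
figures from the OOM report converted to 4 kB pages; the per-memcg split
and all function names are hypothetical.

```c
#include <stdbool.h>

#define NR_ZONES	3	/* DMA, Normal, HighMem */
#define ZONE_NORMAL_IDX	1

/* Hypothetical file LRU pages charged to one memcg, split by zone. */
static const unsigned long memcg_inactive[NR_ZONES] = { 0, 10, 90 };
static const unsigned long memcg_active[NR_ZONES]   = { 0, 4000, 1000 };

/* Node-wide zone counters from the OOM report (all memcgs together). */
static const unsigned long node_inactive[NR_ZONES] = { 0, 11, 281951 };
static const unsigned long node_active[NR_ZONES]   = { 1829, 133187, 139308 };

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long sum_zones(const unsigned long *per_zone)
{
	unsigned long sum = 0;
	int zid;

	for (zid = 0; zid < NR_ZONES; zid++)
		sum += per_zone[zid];
	return sum;
}

/*
 * Start from the memcg's list sizes and subtract the counts of the zones
 * above reclaim_idx, taken from the arrays passed in. Returns true when
 * the active list should be aged.
 */
static bool toy_inactive_list_is_low(const unsigned long *inactive_sub,
				     const unsigned long *active_sub,
				     int reclaim_idx)
{
	unsigned long inactive = sum_zones(memcg_inactive);
	unsigned long active = sum_zones(memcg_active);
	int zid;

	for (zid = reclaim_idx + 1; zid < NR_ZONES; zid++) {
		inactive -= min_ul(inactive, inactive_sub[zid]);
		active -= min_ul(active, active_sub[zid]);
	}
	/* Assumption: inactive ratio of 1, i.e. small lists. */
	return inactive < active;
}

/* Before the patch: node-wide highmem counters are subtracted. */
bool demo_before_patch(void)
{
	return toy_inactive_list_is_low(node_inactive, node_active,
					ZONE_NORMAL_IDX);
}

/* After the patch: the memcg's own highmem counts are subtracted. */
bool demo_after_patch(void)
{
	return toy_inactive_list_is_low(memcg_inactive, memcg_active,
					ZONE_NORMAL_IDX);
}
```

In the first case both counts are clamped to 0, so the function returns
false and the 4000 active lowmem pages are never aged. In the second case
10 inactive and 4000 active pages remain and the function returns true.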

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-23 14:47                                 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko
@ 2016-12-23 22:26                                   ` Nils Holland
  2016-12-26 12:48                                     ` Michal Hocko
  2016-12-25 22:25                                   ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot
  1 sibling, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-23 22:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> 
> Nils, even though this is still highly experimental, could you give it a
> try please?

Yes, no problem! So I kept the very first patch you sent but had to
revert the latest version of the debugging patch (the one in
which you added the "mm_vmscan_inactive_list_is_low" event) because
otherwise the patch you just sent wouldn't apply. Then I rebooted with
memory cgroups enabled again, and the first thing that strikes the eye
is that I get this during boot:

[    1.568174] ------------[ cut here ]------------
[    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
[    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
[    1.568754] Modules linked in:
[    1.568922] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-gentoo #6
[    1.569052] Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
[    1.571750]  f44e5b84 c142bdee f44e5bc8 c1b5ade0 f44e5bb4 c103ab1d c1b583e4 f44e5be4
[    1.572262]  00000001 c1b5ade0 00000408 c11603d8 00000408 00000000 c1b5af73 00000001
[    1.572774]  f44e5bd0 c103ab76 00000009 00000000 f44e5bc8 c1b583e4 f44e5be4 f44e5c18
[    1.573285] Call Trace:
[    1.573419]  [<c142bdee>] dump_stack+0x47/0x69
[    1.573551]  [<c103ab1d>] __warn+0xed/0x110
[    1.573681]  [<c11603d8>] ? mem_cgroup_update_lru_size+0x118/0x130
[    1.573812]  [<c103ab76>] warn_slowpath_fmt+0x36/0x40
[    1.573942]  [<c11603d8>] mem_cgroup_update_lru_size+0x118/0x130
[    1.574076]  [<c1111467>] __pagevec_lru_add_fn+0xd7/0x1b0
[    1.574206]  [<c1111390>] ? perf_trace_mm_lru_insertion+0x150/0x150
[    1.574336]  [<c111239d>] pagevec_lru_move_fn+0x4d/0x80
[    1.574465]  [<c1111390>] ? perf_trace_mm_lru_insertion+0x150/0x150
[    1.574595]  [<c11127e5>] __lru_cache_add+0x45/0x60
[    1.574724]  [<c1112848>] lru_cache_add+0x8/0x10
[    1.574852]  [<c1102fc1>] add_to_page_cache_lru+0x61/0xc0
[    1.574982]  [<c110418e>] pagecache_get_page+0xee/0x270
[    1.575111]  [<c11060f0>] grab_cache_page_write_begin+0x20/0x40
[    1.575243]  [<c118b955>] simple_write_begin+0x25/0xd0
[    1.575372]  [<c11061b8>] generic_perform_write+0xa8/0x1a0
[    1.575503]  [<c1106447>] __generic_file_write_iter+0x197/0x1f0
[    1.575634]  [<c110663f>] generic_file_write_iter+0x19f/0x2b0
[    1.575766]  [<c11669c1>] __vfs_write+0xd1/0x140
[    1.575897]  [<c1166bc5>] vfs_write+0x95/0x1b0
[    1.576026]  [<c1166daf>] SyS_write+0x3f/0x90
[    1.576157]  [<c1ce4474>] xwrite+0x1c/0x4b
[    1.576285]  [<c1ce44c5>] do_copy+0x22/0xac
[    1.576413]  [<c1ce42c3>] write_buffer+0x1d/0x2c
[    1.576540]  [<c1ce42f0>] flush_buffer+0x1e/0x70
[    1.576670]  [<c1d0eae8>] unxz+0x149/0x211
[    1.576798]  [<c1d0e99f>] ? unlzo+0x359/0x359
[    1.576926]  [<c1ce4946>] unpack_to_rootfs+0x14f/0x246
[    1.577054]  [<c1ce42d2>] ? write_buffer+0x2c/0x2c
[    1.577183]  [<c1ce4216>] ? initrd_load+0x3b/0x3b
[    1.577312]  [<c1ce4b20>] ? maybe_link.part.3+0xe3/0xe3
[    1.577443]  [<c1ce4b67>] populate_rootfs+0x47/0x8f
[    1.577573]  [<c1000456>] do_one_initcall+0x36/0x150
[    1.577701]  [<c1ce351e>] ? repair_env_string+0x12/0x54
[    1.577832]  [<c1054ded>] ? parse_args+0x25d/0x400
[    1.577962]  [<c1ce3baf>] ? kernel_init_freeable+0x101/0x19e
[    1.578092]  [<c1ce3bcf>] kernel_init_freeable+0x121/0x19e
[    1.578222]  [<c19b0700>] ? rest_init+0x60/0x60
[    1.578350]  [<c19b070b>] kernel_init+0xb/0x100
[    1.578480]  [<c1060c7c>] ? schedule_tail+0xc/0x50
[    1.578608]  [<c19b0700>] ? rest_init+0x60/0x60
[    1.578737]  [<c19b5db7>] ret_from_fork+0x1b/0x28
[    1.578871] ---[ end trace cf6f1adac9dfe60e ]---

The machine then continued to boot just normally, however, so I
started my ordinary tests. And in fact, they were working just fine,
i.e. no OOMing anymore, even during heavy tarball unpacking.

Would it make sense to capture more trace data for you at this point?
As I'm on the go, I don't currently have a second machine for
capturing over the network, but since we're not having OOMs or other
issues now, capturing to file should probably work just fine.

I'll keep the patch applied and see if I notice anything else that
doesn't look normal during day to day usage, especially during my
ordinary Gentoo updates, which consist of a lot of fetching /
unpacking / building, and in the recent past had been very problematic
(in fact, that was where the problem first struck me and the "heavy
tarball unpacking" test was then just what I distilled it down to
in order to manually reproduce this with the least time and effort
possible).

Greetings
Nils

* [lkp-developer] [mm, memcg]  d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size
  2016-12-23 14:47                                 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko
  2016-12-23 22:26                                   ` Nils Holland
@ 2016-12-25 22:25                                   ` kernel test robot
  2016-12-26 12:26                                     ` Michal Hocko
  1 sibling, 1 reply; 62+ messages in thread
From: kernel test robot @ 2016-12-25 22:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs, lkp

[-- Attachment #1: Type: text/plain, Size: 2163 bytes --]


FYI, we noticed the following commit:

commit: d18e2b2aca0396849f588241e134787a829c707d ("mm, memcg: fix (Re: OOM: Better, but still there on)")
url: https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-memcg-fix-Re-OOM-Better-but-still-there-on/20161223-225057
base: git://git.cmpxchg.org/linux-mmotm.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 360M

caused below changes:


+--------------------------------------------------------+------------+------------+
|                                                        | c7d85b880b | d18e2b2aca |
+--------------------------------------------------------+------------+------------+
| boot_successes                                         | 8          | 0          |
| boot_failures                                          | 0          | 2          |
| WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size | 0          | 2          |
| kernel_BUG_at_mm/memcontrol.c                          | 0          | 2          |
| invalid_opcode:#[##]DEBUG_PAGEALLOC                    | 0          | 2          |
| Kernel_panic-not_syncing:Fatal_exception               | 0          | 2          |
+--------------------------------------------------------+------------+------------+



[   95.226364] init: tty6 main process (990) killed by TERM signal
[   95.314020] init: plymouth-upstart-bridge main process (1039) terminated with status 1
[   97.588568] ------------[ cut here ]------------
[   97.594364] WARNING: CPU: 0 PID: 1055 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0xdd/0x12b
[   97.606654] mem_cgroup_update_lru_size(40297f00, 0, -1): lru_size 1 but empty
[   97.615140] Modules linked in:
[   97.618834] CPU: 0 PID: 1055 Comm: killall5 Not tainted 4.9.0-mm1-00095-gd18e2b2 #82
[   97.628008] Call Trace:
[   97.631025]  dump_stack+0x16/0x18
[   97.635107]  __warn+0xaf/0xc6
[   97.638729]  ? mem_cgroup_update_lru_size+0xdd/0x12b


To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> job-script  # job-script is attached in this email



Thanks,
Xiaolong
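
One possible reading of the two warnings in this thread ("lru_size 0 but
not empty" on the 32-bit notebook, "lru_size 1 but empty" on the
single-zone qemu guest) is sketched below as a toy model. It assumes that
the sanity check in mem_cgroup_update_lru_size compares "list empty"
against "size is zero", which the diff shows only in part. With the RFC
patch the size is per zone while the list is shared by all zones, and the
patch adds a second "+= nr_pages" at the end of the function. All names
are made up for illustration, and the model does not reproduce the
kernel's reaction to a failed check.

```c
#include <stdbool.h>

#define NR_ZONES 3

enum lru_check { LRU_OK, LRU_SIZE_BUT_NOT_EMPTY, LRU_SIZE_BUT_EMPTY };

struct toy_lruvec {
	long zone_size[NR_ZONES];	/* models lru_zone_size[zid][lru] */
	long on_list;			/* pages on the one shared list */
};

/* Accounting as in the RFC patch, including its extra final line. */
static enum lru_check toy_update_lru_size(struct toy_lruvec *v, int zid,
					  int nr_pages)
{
	long *lru_size = &v->zone_size[zid];
	bool empty = (v->on_list == 0);
	enum lru_check ret = LRU_OK;

	if (nr_pages < 0)
		*lru_size += nr_pages;

	/* Assumed check: an empty list must go with a zero size. */
	if (*lru_size < 0 || empty != !*lru_size)
		ret = empty ? LRU_SIZE_BUT_EMPTY : LRU_SIZE_BUT_NOT_EMPTY;

	if (nr_pages > 0)
		*lru_size += nr_pages;
	*lru_size += nr_pages;		/* line added by the RFC patch */
	return ret;
}

/* Account first, then link the page ("just before a page is added"). */
static enum lru_check toy_add_page(struct toy_lruvec *v, int zid)
{
	enum lru_check ret = toy_update_lru_size(v, zid, 1);

	v->on_list++;
	return ret;
}

/* Unlink first, then account. */
static enum lru_check toy_del_page(struct toy_lruvec *v, int zid)
{
	v->on_list--;
	return toy_update_lru_size(v, zid, -1);
}

/* A HighMem page is on the list, then the first Normal page arrives. */
enum lru_check demo_second_zone_add(void)
{
	struct toy_lruvec v = { { 0, 0, 0 }, 0 };

	toy_add_page(&v, 2);
	return toy_add_page(&v, 1);
}

/* Single zone: add one page, then remove it again. */
enum lru_check demo_add_then_remove(void)
{
	struct toy_lruvec v = { { 0, 0, 0 }, 0 };

	toy_add_page(&v, 0);
	return toy_del_page(&v, 0);
}
```

In this model the first scenario trips the check because the Normal
counter is 0 while the shared list already holds a HighMem page. The
second trips it because the page was counted twice on the way in, so one
count is left over when the list becomes empty.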

[-- Attachment #2: config-4.9.0-mm1-00095-gd18e2b2 --]
[-- Type: text/plain, Size: 85513 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.9.0-mm1 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
CONFIG_KERNEL_LZO=y
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_NMI_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CPUSETS is not set
CONFIG_CGROUP_DEVICE=y
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CHECKPOINT_RESTORE=y
# CONFIG_NAMESPACES is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
# CONFIG_RD_LZMA is not set
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_RD_LZ4 is not set
CONFIG_INITRAMFS_COMPRESSION=".gz"
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_POSIX_TIMERS=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_ABSOLUTE_PERCPU is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
# CONFIG_PCSPKR_PLATFORM is not set
# CONFIG_BASE_FULL is not set
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
# CONFIG_SHMEM is not set
CONFIG_AIO=y
# CONFIG_ADVISE_SYSCALLS is not set
# CONFIG_USERFAULTFD is not set
CONFIG_PCI_QUIRKS=y
# CONFIG_MEMBARRIER is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_DEBUG_PERF_USE_VMALLOC=y
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_SLAB_FREELIST_RANDOM=y
# CONFIG_SYSTEM_DATA_VERIFICATION is not set
# CONFIG_PROFILING is not set
CONFIG_TRACEPOINTS=y
CONFIG_KEXEC_CORE=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
# CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_NMI=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_CC_STACKPROTECTOR_NONE=y
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_MODULES_USE_ELF_REL=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=8
CONFIG_HAVE_COPY_THREAD_TLS=y
# CONFIG_HAVE_ARCH_HASH is not set
CONFIG_ISA_BUS_API=y
CONFIG_CLONE_BACKWARDS=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
# CONFIG_CPU_NO_EFFICIENT_FFS is not set
# CONFIG_HAVE_ARCH_VMAP_STACK is not set

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=1
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS=y
CONFIG_MODULE_COMPRESS_GZIP=y
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
# CONFIG_BLOCK is not set
CONFIG_ASN1=m
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_FREEZER=y

#
# Processor type and features
#
# CONFIG_ZONE_DMA is not set
# CONFIG_SMP is not set
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_FAST_FEATURE_TESTS=y
# CONFIG_X86_MPPARSE is not set
# CONFIG_GOLDFISH is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
# CONFIG_X86_INTEL_QUARK is not set
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
CONFIG_IOSF_MBI_DEBUG=y
# CONFIG_X86_RDC321X is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_X86_32_IRIS=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_DEBUG=y
CONFIG_KVM_GUEST=y
# CONFIG_KVM_DEBUG_FS is not set
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
CONFIG_NO_BOOTMEM=y
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MELAN is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_INTERNODE_CACHE_SHIFT=5
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_PROCESSOR_SELECT=y
# CONFIG_CPU_SUP_INTEL is not set
# CONFIG_CPU_SUP_CYRIX_32 is not set
# CONFIG_CPU_SUP_AMD is not set
# CONFIG_CPU_SUP_CENTAUR is not set
CONFIG_CPU_SUP_TRANSMETA_32=y
# CONFIG_CPU_SUP_UMC_32 is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_DMI is not set
CONFIG_NR_CPUS=1
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_UP_LATE_INIT=y
CONFIG_X86_UP_APIC=y
# CONFIG_X86_UP_IOAPIC is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_ANCIENT_MCE=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
CONFIG_X86_THERMAL_VECTOR=y

#
# Performance monitoring
#
# CONFIG_X86_LEGACY_VM86 is not set
# CONFIG_VM86 is not set
CONFIG_TOSHIBA=y
CONFIG_I8K=m
# CONFIG_X86_REBOOTFIXUPS is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_VMSPLIT_3G is not set
# CONFIG_VMSPLIT_3G_OPT is not set
# CONFIG_VMSPLIT_2G is not set
# CONFIG_VMSPLIT_2G_OPT is not set
CONFIG_VMSPLIT_1G=y
CONFIG_PAGE_OFFSET=0x40000000
# CONFIG_X86_PAE is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_MEMORY_ISOLATION=y
# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_VIRT_TO_BUS=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_MEMORY_FAILURE is not set
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
CONFIG_NEED_PER_CPU_KM=y
# CONFIG_CLEANCACHE is not set
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
CONFIG_CMA_DEBUGFS=y
CONFIG_CMA_AREAS=7
# CONFIG_ZPOOL is not set
CONFIG_ZBUD=m
# CONFIG_ZSMALLOC is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
CONFIG_X86_RESERVE_LOW=64
# CONFIG_MTRR is not set
CONFIG_ARCH_RANDOM=y
# CONFIG_X86_SMAP is not set
# CONFIG_EFI is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
# CONFIG_MODIFY_LDT_SYSCALL is not set

#
# Power management and ACPI options
#
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_SUSPEND_SKIP_SYNC is not set
CONFIG_PM_SLEEP=y
# CONFIG_PM_AUTOSLEEP is not set
CONFIG_PM_WAKELOCKS=y
CONFIG_PM_WAKELOCKS_LIMIT=100
# CONFIG_PM_WAKELOCKS_GC is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
CONFIG_PM_TEST_SUSPEND=y
CONFIG_PM_SLEEP_DEBUG=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS_POWER is not set
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
# CONFIG_ACPI_EC_DEBUGFS is not set
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_VIDEO is not set
CONFIG_ACPI_FAN=y
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_PROCESSOR=y
# CONFIG_ACPI_IPMI is not set
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set
CONFIG_ACPI_HOTPLUG_IOAPIC=y
# CONFIG_ACPI_SBS is not set
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_CUSTOM_METHOD is not set
# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
# CONFIG_ACPI_APEI is not set
# CONFIG_DPTF_POWER is not set
# CONFIG_ACPI_EXTLOG is not set
# CONFIG_PMIC_OPREGION is not set
# CONFIG_ACPI_CONFIGFS is not set
CONFIG_SFI=y
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
# CONFIG_CPU_IDLE_GOV_MENU is not set
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCI_CNB20LE_QUIRK is not set
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_MSI is not set
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set
CONFIG_PCI_LABEL=y
# CONFIG_HOTPLUG_PCI is not set

#
# PCI host controller drivers
#
CONFIG_ISA_BUS=y
CONFIG_ISA_DMA_API=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_SCx200 is not set
# CONFIG_OLPC is not set
# CONFIG_ALIX is not set
# CONFIG_NET5501 is not set
CONFIG_PCCARD=m
CONFIG_PCMCIA=m
# CONFIG_PCMCIA_LOAD_CIS is not set
CONFIG_CARDBUS=y

#
# PC-card bridges
#
# CONFIG_YENTA is not set
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
# CONFIG_I82365 is not set
CONFIG_TCIC=m
CONFIG_PCMCIA_PROBE=y
CONFIG_PCCARD_NONSTATIC=y
# CONFIG_RAPIDIO is not set
CONFIG_X86_SYSFB=y

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_HAVE_AOUT=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
CONFIG_COMPAT_32=y
CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_PMC_ATOM=y
CONFIG_NET=y

#
# Networking options
#
# CONFIG_PACKET is not set
CONFIG_UNIX=y
# CONFIG_UNIX_DIAG is not set
# CONFIG_NET_KEY is not set
# CONFIG_INET is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NET_PTP_CLASSIFY is not set
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
# CONFIG_NETFILTER is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set
# CONFIG_DNS_RESOLVER is not set
# CONFIG_BATMAN_ADV is not set
# CONFIG_VSOCKETS is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
# CONFIG_HSR is not set
# CONFIG_SOCK_CGROUP_DATA is not set
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y

#
# Network testing
#
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_STREAM_PARSER is not set
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
# CONFIG_LIB80211 is not set

#
# CFG80211 needs to be enabled for MAC80211
#
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_NFC is not set
# CONFIG_LWTUNNEL is not set
# CONFIG_DST_CACHE is not set
# CONFIG_NET_DEVLINK is not set
CONFIG_MAY_USE_DEVLINK=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_STANDALONE=y
# CONFIG_PREVENT_FIRMWARE_BUILD is not set
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
# CONFIG_ALLOW_DEV_COREDUMP is not set
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
CONFIG_TEST_ASYNC_DRIVER_PROBE=m
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPI=y
CONFIG_REGMAP_SPMI=y
CONFIG_REGMAP_MMIO=y
CONFIG_REGMAP_IRQ=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_DMA_FENCE_TRACE=y
CONFIG_DMA_CMA=y

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_PERCENTAGE=0
# CONFIG_CMA_SIZE_SEL_MBYTES is not set
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
CONFIG_CMA_SIZE_SEL_MIN=y
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8

#
# Bus devices
#
# CONFIG_CONNECTOR is not set
CONFIG_MTD=y
CONFIG_MTD_TESTS=m
# CONFIG_MTD_REDBOOT_PARTS is not set
CONFIG_MTD_CMDLINE_PARTS=m
# CONFIG_MTD_OF_PARTS is not set
CONFIG_MTD_AR7_PARTS=y

#
# User Modules And Translation Layers
#
CONFIG_MTD_OOPS=m
# CONFIG_MTD_PARTITIONED_MASTER is not set

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=y
CONFIG_MTD_JEDECPROBE=y
CONFIG_MTD_GEN_PROBE=y
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_CFI_UTIL=y
CONFIG_MTD_RAM=y
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=y

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP_COMPAT=y
CONFIG_MTD_PHYSMAP_START=0x8000000
CONFIG_MTD_PHYSMAP_LEN=0
CONFIG_MTD_PHYSMAP_BANKWIDTH=2
CONFIG_MTD_PHYSMAP_OF=y
# CONFIG_MTD_PHYSMAP_OF_VERSATILE is not set
# CONFIG_MTD_AMD76XROM is not set
CONFIG_MTD_ICHXROM=y
# CONFIG_MTD_ESB2ROM is not set
# CONFIG_MTD_CK804XROM is not set
# CONFIG_MTD_SCB2_FLASH is not set
CONFIG_MTD_NETtel=m
CONFIG_MTD_L440GX=m
# CONFIG_MTD_INTEL_VR_NOR is not set
CONFIG_MTD_PLATRAM=m

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_DATAFLASH is not set
# CONFIG_MTD_SST25L is not set
CONFIG_MTD_SLRAM=y
CONFIG_MTD_PHRAM=m
CONFIG_MTD_MTDRAM=y
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128

#
# Disk-On-Chip Device Drivers
#
CONFIG_MTD_DOCG3=m
CONFIG_BCH_CONST_M=14
CONFIG_BCH_CONST_T=4
# CONFIG_MTD_NAND is not set
# CONFIG_MTD_ONENAND is not set

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
# CONFIG_MTD_SPI_NOR is not set
# CONFIG_MTD_UBI is not set
CONFIG_OF=y
# CONFIG_OF_UNITTEST is not set
CONFIG_OF_ADDRESS=y
CONFIG_OF_ADDRESS_PCI=y
CONFIG_OF_IRQ=y
CONFIG_OF_PCI=y
CONFIG_OF_PCI_IRQ=y
# CONFIG_OF_OVERLAY is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
# CONFIG_PARPORT_SERIAL is not set
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PARPORT_PC_SUPERIO=y
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_ISAPNP=y
# CONFIG_PNPBIOS is not set
CONFIG_PNPACPI=y

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_AD525X_DPOT is not set
CONFIG_DUMMY_IRQ=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
CONFIG_ICS932S401=m
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
CONFIG_APDS9802ALS=y
CONFIG_ISL29003=m
# CONFIG_ISL29020 is not set
CONFIG_SENSORS_TSL2550=m
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=y
CONFIG_HMC6352=m
CONFIG_DS1682=y
# CONFIG_TI_DAC7512 is not set
# CONFIG_PCH_PHUB is not set
# CONFIG_USB_SWITCH_FSA9480 is not set
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
CONFIG_PANEL=m
CONFIG_PANEL_PARPORT=0
CONFIG_PANEL_PROFILE=5
CONFIG_PANEL_CHANGE_MESSAGE=y
CONFIG_PANEL_BOOT_MESSAGE=""
# CONFIG_C2PORT is not set

#
# EEPROM support
#
CONFIG_EEPROM_AT24=m
CONFIG_EEPROM_AT25=y
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_MAX6875=m
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_EEPROM_93XX46 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# CONFIG_SENSORS_LIS3_I2C is not set

#
# Altera FPGA firmware download module
#
CONFIG_ALTERA_STAPL=m
# CONFIG_INTEL_MEI is not set
# CONFIG_INTEL_MEI_ME is not set
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_VMWARE_VMCI is not set

#
# Intel MIC Bus Driver
#

#
# SCIF Bus Driver
#

#
# VOP Bus Driver
#

#
# Intel MIC Host Driver
#

#
# Intel MIC Card Driver
#

#
# SCIF Driver
#

#
# Intel MIC Coprocessor State Management (COSM) Drivers
#

#
# VOP Driver
#
CONFIG_ECHO=y
# CONFIG_CXL_BASE is not set
# CONFIG_CXL_AFU_DRIVER_OPS is not set
CONFIG_HAVE_IDE=y

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_SCSI_DMA is not set
# CONFIG_SCSI_NETLINK is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
# CONFIG_FIREWIRE_OHCI is not set
# CONFIG_FIREWIRE_NOSY is not set
# CONFIG_MACINTOSH_DRIVERS is not set
# CONFIG_NETDEVICES is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y
CONFIG_INPUT_SPARSEKMAP=m
CONFIG_INPUT_MATRIXKMAP=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
# CONFIG_INPUT_EVDEV is not set
CONFIG_INPUT_EVBUG=y

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ADC=y
# CONFIG_KEYBOARD_ADP5588 is not set
CONFIG_KEYBOARD_ADP5589=m
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
CONFIG_KEYBOARD_LKKBD=m
CONFIG_KEYBOARD_GPIO=y
CONFIG_KEYBOARD_GPIO_POLLED=m
CONFIG_KEYBOARD_TCA6416=y
CONFIG_KEYBOARD_TCA8418=m
# CONFIG_KEYBOARD_MATRIX is not set
CONFIG_KEYBOARD_LM8323=m
CONFIG_KEYBOARD_LM8333=y
# CONFIG_KEYBOARD_MAX7359 is not set
CONFIG_KEYBOARD_MCS=y
# CONFIG_KEYBOARD_MPR121 is not set
CONFIG_KEYBOARD_NEWTON=m
CONFIG_KEYBOARD_OPENCORES=y
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_KEYBOARD_SUNKBD=m
# CONFIG_KEYBOARD_OMAP4 is not set
CONFIG_KEYBOARD_TC3589X=m
CONFIG_KEYBOARD_TWL4030=y
CONFIG_KEYBOARD_XTKBD=y
CONFIG_KEYBOARD_CROS_EC=m
CONFIG_KEYBOARD_CAP11XX=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_INPUT_JOYSTICK=y
# CONFIG_JOYSTICK_ANALOG is not set
# CONFIG_JOYSTICK_A3D is not set
CONFIG_JOYSTICK_ADI=m
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
CONFIG_JOYSTICK_GRIP=m
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
CONFIG_JOYSTICK_SIDEWINDER=y
# CONFIG_JOYSTICK_TMDC is not set
CONFIG_JOYSTICK_IFORCE=m
# CONFIG_JOYSTICK_IFORCE_232 is not set
CONFIG_JOYSTICK_WARRIOR=m
CONFIG_JOYSTICK_MAGELLAN=m
CONFIG_JOYSTICK_SPACEORB=m
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
CONFIG_JOYSTICK_TWIDJOY=y
# CONFIG_JOYSTICK_ZHENHUA is not set
CONFIG_JOYSTICK_DB9=m
CONFIG_JOYSTICK_GAMECON=m
# CONFIG_JOYSTICK_TURBOGRAFX is not set
CONFIG_JOYSTICK_AS5011=y
# CONFIG_JOYSTICK_JOYDUMP is not set
# CONFIG_JOYSTICK_XPAD is not set
# CONFIG_JOYSTICK_WALKERA0701 is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
CONFIG_TOUCHSCREEN_PROPERTIES=y
CONFIG_TOUCHSCREEN_ADS7846=m
# CONFIG_TOUCHSCREEN_AD7877 is not set
CONFIG_TOUCHSCREEN_AD7879=m
CONFIG_TOUCHSCREEN_AD7879_I2C=m
# CONFIG_TOUCHSCREEN_AD7879_SPI is not set
CONFIG_TOUCHSCREEN_AR1021_I2C=m
# CONFIG_TOUCHSCREEN_ATMEL_MXT is not set
# CONFIG_TOUCHSCREEN_AUO_PIXCIR is not set
CONFIG_TOUCHSCREEN_BU21013=y
# CONFIG_TOUCHSCREEN_CHIPONE_ICN8318 is not set
CONFIG_TOUCHSCREEN_CY8CTMG110=m
CONFIG_TOUCHSCREEN_CYTTSP_CORE=m
CONFIG_TOUCHSCREEN_CYTTSP_I2C=m
CONFIG_TOUCHSCREEN_CYTTSP_SPI=m
CONFIG_TOUCHSCREEN_CYTTSP4_CORE=y
CONFIG_TOUCHSCREEN_CYTTSP4_I2C=y
# CONFIG_TOUCHSCREEN_CYTTSP4_SPI is not set
CONFIG_TOUCHSCREEN_DA9052=m
CONFIG_TOUCHSCREEN_DYNAPRO=m
CONFIG_TOUCHSCREEN_HAMPSHIRE=y
# CONFIG_TOUCHSCREEN_EETI is not set
CONFIG_TOUCHSCREEN_EGALAX=m
CONFIG_TOUCHSCREEN_EGALAX_SERIAL=m
CONFIG_TOUCHSCREEN_FUJITSU=m
# CONFIG_TOUCHSCREEN_GOODIX is not set
# CONFIG_TOUCHSCREEN_ILI210X is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_EKTF2127=y
# CONFIG_TOUCHSCREEN_ELAN is not set
# CONFIG_TOUCHSCREEN_ELO is not set
CONFIG_TOUCHSCREEN_WACOM_W8001=y
CONFIG_TOUCHSCREEN_WACOM_I2C=m
# CONFIG_TOUCHSCREEN_MAX11801 is not set
# CONFIG_TOUCHSCREEN_MCS5000 is not set
# CONFIG_TOUCHSCREEN_MMS114 is not set
# CONFIG_TOUCHSCREEN_MELFAS_MIP4 is not set
# CONFIG_TOUCHSCREEN_MTOUCH is not set
CONFIG_TOUCHSCREEN_IMX6UL_TSC=m
CONFIG_TOUCHSCREEN_INEXIO=y
CONFIG_TOUCHSCREEN_MK712=m
# CONFIG_TOUCHSCREEN_HTCPEN is not set
CONFIG_TOUCHSCREEN_PENMOUNT=m
# CONFIG_TOUCHSCREEN_EDT_FT5X06 is not set
CONFIG_TOUCHSCREEN_TOUCHRIGHT=y
CONFIG_TOUCHSCREEN_TOUCHWIN=y
# CONFIG_TOUCHSCREEN_PIXCIR is not set
CONFIG_TOUCHSCREEN_WDT87XX_I2C=y
# CONFIG_TOUCHSCREEN_WM831X is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
# CONFIG_TOUCHSCREEN_MC13783 is not set
CONFIG_TOUCHSCREEN_TOUCHIT213=m
CONFIG_TOUCHSCREEN_TSC_SERIO=y
CONFIG_TOUCHSCREEN_TSC200X_CORE=y
CONFIG_TOUCHSCREEN_TSC2004=y
CONFIG_TOUCHSCREEN_TSC2005=y
CONFIG_TOUCHSCREEN_TSC2007=y
CONFIG_TOUCHSCREEN_RM_TS=y
CONFIG_TOUCHSCREEN_SILEAD=m
# CONFIG_TOUCHSCREEN_SIS_I2C is not set
CONFIG_TOUCHSCREEN_ST1232=y
CONFIG_TOUCHSCREEN_SURFACE3_SPI=y
CONFIG_TOUCHSCREEN_SX8654=m
# CONFIG_TOUCHSCREEN_TPS6507X is not set
# CONFIG_TOUCHSCREEN_ZFORCE is not set
CONFIG_TOUCHSCREEN_COLIBRI_VF50=m
# CONFIG_TOUCHSCREEN_ROHM_BU21023 is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_88PM80X_ONKEY=y
CONFIG_INPUT_AD714X=y
CONFIG_INPUT_AD714X_I2C=m
CONFIG_INPUT_AD714X_SPI=y
CONFIG_INPUT_ATMEL_CAPTOUCH=y
# CONFIG_INPUT_BMA150 is not set
CONFIG_INPUT_E3X0_BUTTON=y
# CONFIG_INPUT_MAX77693_HAPTIC is not set
# CONFIG_INPUT_MAX8997_HAPTIC is not set
CONFIG_INPUT_MC13783_PWRBUTTON=y
# CONFIG_INPUT_MMA8450 is not set
CONFIG_INPUT_MPU3050=m
# CONFIG_INPUT_APANEL is not set
CONFIG_INPUT_GP2A=m
CONFIG_INPUT_GPIO_BEEPER=y
CONFIG_INPUT_GPIO_TILT_POLLED=m
# CONFIG_INPUT_GPIO_DECODER is not set
# CONFIG_INPUT_WISTRON_BTNS is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
CONFIG_INPUT_KXTJ9=y
# CONFIG_INPUT_KXTJ9_POLLED_MODE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_TPS65218_PWRBUTTON=m
CONFIG_INPUT_TWL4030_PWRBUTTON=m
CONFIG_INPUT_TWL4030_VIBRA=y
CONFIG_INPUT_UINPUT=m
CONFIG_INPUT_PCF50633_PMU=y
# CONFIG_INPUT_PCF8574 is not set
CONFIG_INPUT_PWM_BEEPER=y
CONFIG_INPUT_GPIO_ROTARY_ENCODER=m
# CONFIG_INPUT_DA9052_ONKEY is not set
CONFIG_INPUT_DA9055_ONKEY=m
CONFIG_INPUT_DA9063_ONKEY=m
# CONFIG_INPUT_WM831X_ON is not set
# CONFIG_INPUT_ADXL34X is not set
CONFIG_INPUT_CMA3000=m
CONFIG_INPUT_CMA3000_I2C=m
# CONFIG_INPUT_IDEAPAD_SLIDEBAR is not set
CONFIG_INPUT_SOC_BUTTON_ARRAY=m
# CONFIG_INPUT_DRV260X_HAPTICS is not set
# CONFIG_INPUT_DRV2665_HAPTICS is not set
CONFIG_INPUT_DRV2667_HAPTICS=y
CONFIG_RMI4_CORE=y
CONFIG_RMI4_I2C=y
# CONFIG_RMI4_SPI is not set
# CONFIG_RMI4_SMB is not set
# CONFIG_RMI4_F03 is not set
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
# CONFIG_RMI4_F34 is not set
# CONFIG_RMI4_F55 is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
CONFIG_SERIO_PARKBD=m
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=y
# CONFIG_SERIO_ALTERA_PS2 is not set
CONFIG_SERIO_PS2MULT=y
CONFIG_SERIO_ARC_PS2=m
CONFIG_SERIO_APBPS2=y
CONFIG_USERIO=m
CONFIG_GAMEPORT=y
# CONFIG_GAMEPORT_NS558 is not set
CONFIG_GAMEPORT_L4=y
# CONFIG_GAMEPORT_EMU10K1 is not set
# CONFIG_GAMEPORT_FM801 is not set

#
# Character devices
#
CONFIG_TTY=y
# CONFIG_VT is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_ROUTER is not set
CONFIG_TRACE_SINK=m
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
# CONFIG_SERIAL_8250_PNP is not set
CONFIG_SERIAL_8250_FINTEK=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_FSL is not set
CONFIG_SERIAL_8250_DW=m
CONFIG_SERIAL_8250_RT288X=y
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y
# CONFIG_SERIAL_8250_MOXA is not set
CONFIG_SERIAL_OF_PLATFORM=m

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
# CONFIG_SERIAL_MAX310X is not set
CONFIG_SERIAL_UARTLITE=m
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
CONFIG_SERIAL_SC16IS7XX=m
# CONFIG_SERIAL_SC16IS7XX_I2C is not set
# CONFIG_SERIAL_SC16IS7XX_SPI is not set
CONFIG_SERIAL_TIMBERDALE=y
CONFIG_SERIAL_ALTERA_JTAGUART=m
CONFIG_SERIAL_ALTERA_UART=m
CONFIG_SERIAL_ALTERA_UART_MAXPORTS=4
CONFIG_SERIAL_ALTERA_UART_BAUDRATE=115200
CONFIG_SERIAL_IFX6X60=y
# CONFIG_SERIAL_PCH_UART is not set
# CONFIG_SERIAL_XILINX_PS_UART is not set
CONFIG_SERIAL_ARC=m
CONFIG_SERIAL_ARC_NR_PORTS=1
# CONFIG_SERIAL_RP2 is not set
CONFIG_SERIAL_FSL_LPUART=m
CONFIG_SERIAL_CONEXANT_DIGICOLOR=m
CONFIG_TTY_PRINTK=m
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
# CONFIG_PPDEV is not set
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=m
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_HW_RANDOM_GEODE=m
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_TPM=m
CONFIG_NVRAM=y
CONFIG_DTLK=m
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
# CONFIG_CARDMAN_4040 is not set
CONFIG_SCR24X=m
CONFIG_MWAVE=y
CONFIG_PC8736x_GPIO=m
CONFIG_NSC_GPIO=y
# CONFIG_HPET is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=m
CONFIG_TCG_TIS_CORE=m
CONFIG_TCG_TIS=m
CONFIG_TCG_TIS_SPI=m
# CONFIG_TCG_TIS_I2C_ATMEL is not set
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
# CONFIG_TCG_ATMEL is not set
CONFIG_TCG_INFINEON=m
# CONFIG_TCG_CRB is not set
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
CONFIG_TCG_TIS_ST33ZP24_SPI=m
CONFIG_TELCLOCK=y
CONFIG_DEVPORT=y
# CONFIG_XILLYBUS is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MUX=m

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_ARB_GPIO_CHALLENGE=m
CONFIG_I2C_MUX_GPIO=m
CONFIG_I2C_MUX_PCA9541=m
# CONFIG_I2C_MUX_PCA954x is not set
CONFIG_I2C_MUX_REG=m
# CONFIG_I2C_MUX_MLXCPLD is not set
# CONFIG_I2C_HELPER_AUTO is not set
CONFIG_I2C_SMBUS=m

#
# I2C Algorithms
#
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCF=y
# CONFIG_I2C_ALGOPCA is not set

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_ISMT is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_CBUS_GPIO=y
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EG20T is not set
CONFIG_I2C_GPIO=y
CONFIG_I2C_KEMPLD=y
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA is not set
# CONFIG_I2C_PXA_PCI is not set
CONFIG_I2C_SIMTEC=y
CONFIG_I2C_XILINX=m

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_ELEKTOR=m
# CONFIG_I2C_PCA_ISA is not set
CONFIG_I2C_CROS_EC_TUNNEL=m
# CONFIG_SCx200_ACB is not set
CONFIG_I2C_STUB=m
CONFIG_I2C_SLAVE=y
CONFIG_I2C_SLAVE_EEPROM=m
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
CONFIG_SPI=y
CONFIG_SPI_DEBUG=y
CONFIG_SPI_MASTER=y

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
CONFIG_SPI_AXI_SPI_ENGINE=m
CONFIG_SPI_BITBANG=m
CONFIG_SPI_BUTTERFLY=m
# CONFIG_SPI_CADENCE is not set
CONFIG_SPI_DESIGNWARE=m
# CONFIG_SPI_DW_PCI is not set
CONFIG_SPI_DW_MMIO=m
# CONFIG_SPI_GPIO is not set
CONFIG_SPI_LM70_LLP=m
# CONFIG_SPI_FSL_SPI is not set
CONFIG_SPI_OC_TINY=m
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_PXA2XX_PCI is not set
CONFIG_SPI_ROCKCHIP=m
CONFIG_SPI_SC18IS602=y
# CONFIG_SPI_TOPCLIFF_PCH is not set
CONFIG_SPI_XCOMM=y
# CONFIG_SPI_XILINX is not set
CONFIG_SPI_ZYNQMP_GQSPI=m

#
# SPI Protocol Masters
#
# CONFIG_SPI_SPIDEV is not set
# CONFIG_SPI_LOOPBACK_TEST is not set
CONFIG_SPI_TLE62X0=y
CONFIG_SPMI=y
CONFIG_HSI=y
CONFIG_HSI_BOARDINFO=y

#
# HSI controllers
#

#
# HSI clients
#
CONFIG_HSI_CHAR=y

#
# PPS support
#
# CONFIG_PPS is not set

#
# PPS generators support
#

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# CONFIG_PTP_1588_CLOCK_PCH is not set
CONFIG_GPIOLIB=y
CONFIG_OF_GPIO=y
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
CONFIG_DEBUG_GPIO=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_GENERIC=y
CONFIG_GPIO_MAX730X=m

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_74XX_MMIO=y
# CONFIG_GPIO_ALTERA is not set
# CONFIG_GPIO_AMDPT is not set
# CONFIG_GPIO_DWAPB is not set
CONFIG_GPIO_GENERIC_PLATFORM=y
# CONFIG_GPIO_GRGPIO is not set
# CONFIG_GPIO_ICH is not set
# CONFIG_GPIO_LYNXPOINT is not set
CONFIG_GPIO_MOCKUP=y
CONFIG_GPIO_SYSCON=y
# CONFIG_GPIO_VX855 is not set
CONFIG_GPIO_XILINX=y

#
# Port-mapped I/O GPIO drivers
#
CONFIG_GPIO_104_DIO_48E=y
CONFIG_GPIO_104_IDIO_16=y
CONFIG_GPIO_104_IDI_48=m
CONFIG_GPIO_F7188X=m
CONFIG_GPIO_GPIO_MM=y
CONFIG_GPIO_IT87=y
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_SCH311X is not set
# CONFIG_GPIO_WS16C48 is not set

#
# I2C GPIO expanders
#
CONFIG_GPIO_ADP5588=y
# CONFIG_GPIO_ADP5588_IRQ is not set
CONFIG_GPIO_ADNP=m
# CONFIG_GPIO_MAX7300 is not set
CONFIG_GPIO_MAX732X=m
# CONFIG_GPIO_PCA953X is not set
CONFIG_GPIO_PCF857X=m
CONFIG_GPIO_TPIC2810=y

#
# MFD GPIO expanders
#
# CONFIG_GPIO_ARIZONA is not set
# CONFIG_GPIO_CRYSTAL_COVE is not set
CONFIG_GPIO_DA9052=y
CONFIG_GPIO_DA9055=y
# CONFIG_GPIO_KEMPLD is not set
# CONFIG_GPIO_LP873X is not set
CONFIG_GPIO_MAX77620=y
CONFIG_GPIO_RC5T583=y
# CONFIG_GPIO_TC3589X is not set
# CONFIG_GPIO_TPS65218 is not set
CONFIG_GPIO_TPS6586X=y
# CONFIG_GPIO_TPS65912 is not set
# CONFIG_GPIO_TWL4030 is not set
CONFIG_GPIO_WHISKEY_COVE=y
CONFIG_GPIO_WM831X=y
CONFIG_GPIO_WM8994=y

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCH is not set
# CONFIG_GPIO_RDC321X is not set
# CONFIG_GPIO_SODAVILLE is not set

#
# SPI GPIO expanders
#
CONFIG_GPIO_74X164=y
CONFIG_GPIO_MAX7301=m
# CONFIG_GPIO_MC33880 is not set
# CONFIG_GPIO_PISOSR is not set

#
# SPI or I2C GPIO expanders
#
# CONFIG_GPIO_MCP23S08 is not set
CONFIG_W1=y

#
# 1-wire Bus Masters
#
# CONFIG_W1_MASTER_MATROX is not set
CONFIG_W1_MASTER_DS2482=m
# CONFIG_W1_MASTER_DS1WM is not set
CONFIG_W1_MASTER_GPIO=m

#
# 1-wire Slaves
#
CONFIG_W1_SLAVE_THERM=m
CONFIG_W1_SLAVE_SMEM=m
CONFIG_W1_SLAVE_DS2408=m
# CONFIG_W1_SLAVE_DS2408_READBACK is not set
CONFIG_W1_SLAVE_DS2413=m
CONFIG_W1_SLAVE_DS2406=m
CONFIG_W1_SLAVE_DS2423=y
CONFIG_W1_SLAVE_DS2431=m
CONFIG_W1_SLAVE_DS2433=y
# CONFIG_W1_SLAVE_DS2433_CRC is not set
CONFIG_W1_SLAVE_DS2760=y
CONFIG_W1_SLAVE_DS2780=y
CONFIG_W1_SLAVE_DS2781=m
CONFIG_W1_SLAVE_DS28E04=m
CONFIG_W1_SLAVE_BQ27000=m
CONFIG_POWER_AVS=y
# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
CONFIG_POWER_SUPPLY_DEBUG=y
CONFIG_PDA_POWER=y
CONFIG_GENERIC_ADC_BATTERY=m
CONFIG_WM831X_BACKUP=y
CONFIG_WM831X_POWER=y
CONFIG_TEST_POWER=y
CONFIG_BATTERY_DS2760=m
CONFIG_BATTERY_DS2780=y
CONFIG_BATTERY_DS2781=m
CONFIG_BATTERY_DS2782=y
CONFIG_BATTERY_SBS=m
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_DA9052 is not set
CONFIG_BATTERY_MAX17040=y
CONFIG_BATTERY_MAX17042=m
# CONFIG_BATTERY_TWL4030_MADC is not set
CONFIG_CHARGER_PCF50633=m
# CONFIG_BATTERY_RX51 is not set
CONFIG_CHARGER_MAX8903=y
CONFIG_CHARGER_TWL4030=m
CONFIG_CHARGER_LP8727=y
CONFIG_CHARGER_GPIO=y
CONFIG_CHARGER_MAX14577=m
CONFIG_CHARGER_MAX77693=y
CONFIG_CHARGER_BQ2415X=y
CONFIG_CHARGER_BQ24190=y
CONFIG_CHARGER_BQ24257=m
CONFIG_CHARGER_BQ24735=y
CONFIG_CHARGER_BQ25890=m
# CONFIG_CHARGER_SMB347 is not set
CONFIG_CHARGER_TPS65090=y
CONFIG_BATTERY_GAUGE_LTC2941=y
CONFIG_BATTERY_RT5033=y
CONFIG_CHARGER_RT9455=y
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_AD7314=m
# CONFIG_SENSORS_AD7414 is not set
CONFIG_SENSORS_AD7418=y
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
# CONFIG_SENSORS_ADM1026 is not set
CONFIG_SENSORS_ADM1029=y
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=y
# CONFIG_SENSORS_ADT7310 is not set
CONFIG_SENSORS_ADT7410=y
CONFIG_SENSORS_ADT7411=m
# CONFIG_SENSORS_ADT7462 is not set
CONFIG_SENSORS_ADT7470=y
CONFIG_SENSORS_ADT7475=m
CONFIG_SENSORS_ASC7621=y
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_K10TEMP is not set
# CONFIG_SENSORS_APPLESMC is not set
CONFIG_SENSORS_ASB100=y
CONFIG_SENSORS_ATXP1=y
# CONFIG_SENSORS_DS620 is not set
CONFIG_SENSORS_DS1621=y
CONFIG_SENSORS_DELL_SMM=m
# CONFIG_SENSORS_DA9052_ADC is not set
# CONFIG_SENSORS_DA9055 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
CONFIG_SENSORS_F71882FG=m
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_MC13783_ADC=y
CONFIG_SENSORS_FSCHMD=m
CONFIG_SENSORS_FTSTEUTATES=m
CONFIG_SENSORS_GL518SM=y
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_G760A=m
CONFIG_SENSORS_G762=m
CONFIG_SENSORS_GPIO_FAN=y
CONFIG_SENSORS_HIH6130=m
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
# CONFIG_SENSORS_IIO_HWMON is not set
# CONFIG_SENSORS_I5500 is not set
# CONFIG_SENSORS_CORETEMP is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=y
CONFIG_SENSORS_POWR1220=y
CONFIG_SENSORS_LINEAGE=y
CONFIG_SENSORS_LTC2945=m
CONFIG_SENSORS_LTC2990=m
CONFIG_SENSORS_LTC4151=m
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_MAX1111 is not set
CONFIG_SENSORS_MAX16065=m
# CONFIG_SENSORS_MAX1619 is not set
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
CONFIG_SENSORS_MAX31722=y
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6642 is not set
CONFIG_SENSORS_MAX6650=y
# CONFIG_SENSORS_MAX6697 is not set
CONFIG_SENSORS_MAX31790=m
# CONFIG_SENSORS_MCP3021 is not set
CONFIG_SENSORS_TC654=m
CONFIG_SENSORS_ADCXX=y
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM70=y
CONFIG_SENSORS_LM73=y
CONFIG_SENSORS_LM75=y
CONFIG_SENSORS_LM77=y
CONFIG_SENSORS_LM78=y
CONFIG_SENSORS_LM80=y
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=y
# CONFIG_SENSORS_LM90 is not set
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
CONFIG_SENSORS_LM95241=y
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=y
# CONFIG_SENSORS_PC87427 is not set
CONFIG_SENSORS_NTC_THERMISTOR=y
CONFIG_SENSORS_NCT6683=y
CONFIG_SENSORS_NCT6775=m
CONFIG_SENSORS_NCT7802=m
CONFIG_SENSORS_NCT7904=m
CONFIG_SENSORS_PCF8591=y
CONFIG_PMBUS=m
CONFIG_SENSORS_PMBUS=m
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_LM25066 is not set
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC3815 is not set
CONFIG_SENSORS_MAX16064=m
CONFIG_SENSORS_MAX20751=m
CONFIG_SENSORS_MAX34440=m
# CONFIG_SENSORS_MAX8688 is not set
# CONFIG_SENSORS_TPS40422 is not set
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_ZL6100 is not set
CONFIG_SENSORS_PWM_FAN=m
CONFIG_SENSORS_SHT15=y
CONFIG_SENSORS_SHT21=y
CONFIG_SENSORS_SHT3x=y
CONFIG_SENSORS_SHTC1=m
# CONFIG_SENSORS_SIS5595 is not set
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
CONFIG_SENSORS_EMC2103=y
# CONFIG_SENSORS_EMC6W201 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=y
# CONFIG_SENSORS_SCH56XX_COMMON is not set
# CONFIG_SENSORS_SCH5627 is not set
# CONFIG_SENSORS_SCH5636 is not set
# CONFIG_SENSORS_SMM665 is not set
CONFIG_SENSORS_ADC128D818=m
# CONFIG_SENSORS_ADS1015 is not set
CONFIG_SENSORS_ADS7828=y
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=y
CONFIG_SENSORS_INA209=y
CONFIG_SENSORS_INA2XX=y
# CONFIG_SENSORS_INA3221 is not set
CONFIG_SENSORS_TC74=m
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
CONFIG_SENSORS_TWL4030_MADC=m
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
CONFIG_SENSORS_W83781D=y
CONFIG_SENSORS_W83791D=m
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
CONFIG_SENSORS_W83795=y
CONFIG_SENSORS_W83795_FANCTRL=y
CONFIG_SENSORS_W83L785TS=y
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
CONFIG_SENSORS_WM831X=m

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_OF=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
# CONFIG_THERMAL_DEFAULT_GOV_POWER_ALLOCATOR is not set
# CONFIG_THERMAL_GOV_FAIR_SHARE is not set
CONFIG_THERMAL_GOV_STEP_WISE=y
# CONFIG_THERMAL_GOV_BANG_BANG is not set
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_GOV_POWER_ALLOCATOR is not set
# CONFIG_THERMAL_EMULATION is not set
# CONFIG_MAX77620_THERMAL is not set
# CONFIG_QORIQ_THERMAL is not set
CONFIG_X86_PKG_TEMP_THERMAL=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
# CONFIG_INT340X_THERMAL is not set
# CONFIG_INTEL_BXT_PMIC_THERMAL is not set
# CONFIG_INTEL_PCH_THERMAL is not set
# CONFIG_QCOM_SPMI_TEMP_ALARM is not set
# CONFIG_GENERIC_ADC_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
# CONFIG_WATCHDOG_SYSFS is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=y
# CONFIG_DA9052_WATCHDOG is not set
CONFIG_DA9055_WATCHDOG=m
CONFIG_DA9062_WATCHDOG=m
CONFIG_GPIO_WATCHDOG=m
# CONFIG_WDAT_WDT is not set
# CONFIG_WM831X_WATCHDOG is not set
CONFIG_XILINX_WATCHDOG=m
CONFIG_ZIIRAVE_WATCHDOG=y
# CONFIG_CADENCE_WATCHDOG is not set
CONFIG_DW_WATCHDOG=y
# CONFIG_RN5T618_WATCHDOG is not set
CONFIG_TWL4030_WATCHDOG=y
# CONFIG_MAX63XX_WATCHDOG is not set
CONFIG_MAX77620_WATCHDOG=m
# CONFIG_ACQUIRE_WDT is not set
CONFIG_ADVANTECH_WDT=m
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
CONFIG_EBC_C384_WDT=y
# CONFIG_F71808E_WDT is not set
# CONFIG_SP5100_TCO is not set
CONFIG_SBC_FITPC2_WATCHDOG=y
# CONFIG_EUROTECH_WDT is not set
CONFIG_IB700_WDT=m
CONFIG_IBMASR=m
CONFIG_WAFER_WDT=m
# CONFIG_I6300ESB_WDT is not set
# CONFIG_IE6XX_WDT is not set
# CONFIG_ITCO_WDT is not set
CONFIG_IT8712F_WDT=m
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
CONFIG_KEMPLD_WDT=m
# CONFIG_SC1200_WDT is not set
CONFIG_PC87413_WDT=m
# CONFIG_NV_TCO is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_SBC7240_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_VIA_WDT is not set
CONFIG_W83627HF_WDT=y
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=y
CONFIG_MACHZ_WDT=m
CONFIG_SBC_EPX_C3_WATCHDOG=y
# CONFIG_NI903X_WDT is not set
CONFIG_MEN_A21_WDT=y

#
# ISA-based Watchdog Cards
#
CONFIG_PCWATCHDOG=y
CONFIG_MIXCOMWD=m
# CONFIG_WDT is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# Watchdog Pretimeout Governors
#
CONFIG_WATCHDOG_PRETIMEOUT_GOV=y
# CONFIG_WATCHDOG_PRETIMEOUT_DEFAULT_GOV_NOOP is not set
CONFIG_WATCHDOG_PRETIMEOUT_DEFAULT_GOV_PANIC=y
# CONFIG_WATCHDOG_PRETIMEOUT_GOV_NOOP is not set
CONFIG_WATCHDOG_PRETIMEOUT_GOV_PANIC=y
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_CS5535 is not set
# CONFIG_MFD_ACT8945A is not set
# CONFIG_MFD_AS3711 is not set
# CONFIG_MFD_AS3722 is not set
# CONFIG_PMIC_ADP5520 is not set
CONFIG_MFD_AAT2870_CORE=y
CONFIG_MFD_ATMEL_FLEXCOM=m
# CONFIG_MFD_ATMEL_HLCDC is not set
CONFIG_MFD_BCM590XX=y
# CONFIG_MFD_AXP20X_I2C is not set
CONFIG_MFD_CROS_EC=m
# CONFIG_MFD_CROS_EC_I2C is not set
# CONFIG_MFD_CROS_EC_SPI is not set
# CONFIG_PMIC_DA903X is not set
CONFIG_PMIC_DA9052=y
CONFIG_MFD_DA9052_SPI=y
CONFIG_MFD_DA9052_I2C=y
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9062=y
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
CONFIG_MFD_MC13XXX=y
# CONFIG_MFD_MC13XXX_SPI is not set
CONFIG_MFD_MC13XXX_I2C=y
# CONFIG_MFD_HI6421_PMIC is not set
CONFIG_HTC_PASIC3=m
# CONFIG_HTC_I2CPLD is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
CONFIG_INTEL_SOC_PMIC=y
# CONFIG_MFD_INTEL_LPSS_ACPI is not set
# CONFIG_MFD_INTEL_LPSS_PCI is not set
# CONFIG_MFD_JANZ_CMODIO is not set
CONFIG_MFD_KEMPLD=y
CONFIG_MFD_88PM800=y
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
CONFIG_MFD_MAX14577=m
CONFIG_MFD_MAX77620=y
CONFIG_MFD_MAX77686=m
CONFIG_MFD_MAX77693=y
CONFIG_MFD_MAX77843=y
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
CONFIG_MFD_MAX8997=y
CONFIG_MFD_MAX8998=y
CONFIG_MFD_MT6397=y
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_RETU is not set
CONFIG_MFD_PCF50633=y
CONFIG_PCF50633_ADC=m
CONFIG_PCF50633_GPIO=y
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RTSX_PCI is not set
CONFIG_MFD_RT5033=y
CONFIG_MFD_RC5T583=y
CONFIG_MFD_RK808=y
CONFIG_MFD_RN5T618=m
CONFIG_MFD_SEC_CORE=y
CONFIG_MFD_SI476X_CORE=y
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SKY81452 is not set
# CONFIG_MFD_SMSC is not set
CONFIG_ABX500_CORE=y
# CONFIG_AB3100_CORE is not set
# CONFIG_MFD_STMPE is not set
CONFIG_MFD_SYSCON=y
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
CONFIG_TPS6507X=m
# CONFIG_MFD_TPS65086 is not set
CONFIG_MFD_TPS65090=y
# CONFIG_MFD_TPS65217 is not set
CONFIG_MFD_TI_LP873X=y
CONFIG_MFD_TPS65218=y
CONFIG_MFD_TPS6586X=y
# CONFIG_MFD_TPS65910 is not set
CONFIG_MFD_TPS65912=m
# CONFIG_MFD_TPS65912_I2C is not set
CONFIG_MFD_TPS65912_SPI=m
CONFIG_MFD_TPS80031=y
CONFIG_TWL4030_CORE=y
CONFIG_MFD_TWL4030_AUDIO=y
# CONFIG_TWL6040_CORE is not set
CONFIG_MFD_WL1273_CORE=y
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TIMBERDALE is not set
CONFIG_MFD_TC3589X=y
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_VX855 is not set
CONFIG_MFD_ARIZONA=y
CONFIG_MFD_ARIZONA_I2C=y
CONFIG_MFD_ARIZONA_SPI=y
CONFIG_MFD_CS47L24=y
CONFIG_MFD_WM5102=y
CONFIG_MFD_WM5110=y
# CONFIG_MFD_WM8997 is not set
# CONFIG_MFD_WM8998 is not set
CONFIG_MFD_WM8400=y
CONFIG_MFD_WM831X=y
CONFIG_MFD_WM831X_I2C=y
CONFIG_MFD_WM831X_SPI=y
# CONFIG_MFD_WM8350_I2C is not set
CONFIG_MFD_WM8994=y
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set
# CONFIG_DRM is not set

#
# ACP (Audio CoProcessor) Configuration
#

#
# Frame buffer Devices
#
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
# CONFIG_FB_DDC is not set
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
CONFIG_FB_FOREIGN_ENDIAN=y
# CONFIG_FB_BOTH_ENDIAN is not set
# CONFIG_FB_BIG_ENDIAN is not set
CONFIG_FB_LITTLE_ENDIAN=y
CONFIG_FB_SYS_FOPS=y
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_HECUBA=y
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
CONFIG_FB_ARC=m
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_VESA is not set
CONFIG_FB_N411=y
# CONFIG_FB_HGA is not set
CONFIG_FB_OPENCORES=m
CONFIG_FB_S1D13XXX=m
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
CONFIG_FB_IBM_GXT4500=m
CONFIG_FB_VIRTUAL=y
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
CONFIG_FB_AUO_K190X=m
CONFIG_FB_AUO_K1900=m
CONFIG_FB_AUO_K1901=m
CONFIG_FB_SIMPLE=y
CONFIG_FB_SSD1307=y
# CONFIG_FB_SM712 is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_L4F00242T03=y
CONFIG_LCD_LMS283GF05=m
# CONFIG_LCD_LTV350QV is not set
CONFIG_LCD_ILI922X=m
CONFIG_LCD_ILI9320=m
CONFIG_LCD_TDO24M=y
CONFIG_LCD_VGG2432A4=m
CONFIG_LCD_PLATFORM=m
CONFIG_LCD_S6E63M0=y
CONFIG_LCD_LD9040=y
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_BACKLIGHT_PWM is not set
CONFIG_BACKLIGHT_DA9052=m
# CONFIG_BACKLIGHT_APPLE is not set
# CONFIG_BACKLIGHT_PM8941_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_WM831X is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
CONFIG_BACKLIGHT_ADP8870=m
# CONFIG_BACKLIGHT_PCF50633 is not set
# CONFIG_BACKLIGHT_AAT2870 is not set
# CONFIG_BACKLIGHT_LM3630A is not set
CONFIG_BACKLIGHT_LM3639=y
CONFIG_BACKLIGHT_LP855X=m
# CONFIG_BACKLIGHT_PANDORA is not set
# CONFIG_BACKLIGHT_GPIO is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_VGASTATE is not set
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
CONFIG_LOGO_LINUX_VGA16=y
# CONFIG_LOGO_LINUX_CLUT224 is not set
CONFIG_SOUND=m
CONFIG_SOUND_OSS_CORE=y
CONFIG_SOUND_OSS_CORE_PRECLAIM=y
# CONFIG_SND is not set
CONFIG_SOUND_PRIME=m
# CONFIG_SOUND_MSNDCLAS is not set
# CONFIG_SOUND_MSNDPIN is not set
# CONFIG_SOUND_OSS is not set

#
# HID support
#
# CONFIG_HID is not set

#
# I2C HID support
#
# CONFIG_I2C_HID is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
# CONFIG_USB is not set

#
# USB port drivers
#

#
# USB Physical Layer drivers
#
# CONFIG_USB_PHY is not set
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_GADGET is not set
# CONFIG_USB_ULPI_BUS is not set
CONFIG_UWB=y
# CONFIG_UWB_WHCI is not set
# CONFIG_MMC is not set
CONFIG_MEMSTICK=m
CONFIG_MEMSTICK_DEBUG=y

#
# MemoryStick drivers
#
# CONFIG_MEMSTICK_UNSAFE_RESUME is not set

#
# MemoryStick Host Controller Drivers
#
# CONFIG_MEMSTICK_TIFM_MS is not set
# CONFIG_MEMSTICK_JMICRON_38X is not set
# CONFIG_MEMSTICK_R592 is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set

#
# LED drivers
#
CONFIG_LEDS_BCM6328=y
CONFIG_LEDS_BCM6358=y
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3642 is not set
CONFIG_LEDS_PCA9532=m
CONFIG_LEDS_PCA9532_GPIO=y
CONFIG_LEDS_GPIO=y
# CONFIG_LEDS_LP3944 is not set
# CONFIG_LEDS_LP3952 is not set
CONFIG_LEDS_LP55XX_COMMON=y
CONFIG_LEDS_LP5521=m
CONFIG_LEDS_LP5523=y
CONFIG_LEDS_LP5562=m
# CONFIG_LEDS_LP8501 is not set
CONFIG_LEDS_LP8860=y
# CONFIG_LEDS_PCA955X is not set
CONFIG_LEDS_PCA963X=m
# CONFIG_LEDS_WM831X_STATUS is not set
# CONFIG_LEDS_DA9052 is not set
CONFIG_LEDS_DAC124S085=y
CONFIG_LEDS_PWM=y
CONFIG_LEDS_BD2802=m
# CONFIG_LEDS_LT3593 is not set
CONFIG_LEDS_MC13783=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
CONFIG_LEDS_MAX8997=y
CONFIG_LEDS_LM355x=y
CONFIG_LEDS_OT200=m
CONFIG_LEDS_IS31FL319X=y
CONFIG_LEDS_IS31FL32XX=m

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
# CONFIG_LEDS_SYSCON is not set
CONFIG_LEDS_USER=y
# CONFIG_LEDS_NIC78BX is not set

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGERS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_MM_EDAC=m
# CONFIG_EDAC_AMD76X is not set
# CONFIG_EDAC_E7XXX is not set
# CONFIG_EDAC_E752X is not set
# CONFIG_EDAC_I82875P is not set
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_I3200 is not set
# CONFIG_EDAC_IE31200 is not set
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5400 is not set
# CONFIG_EDAC_I7CORE is not set
# CONFIG_EDAC_I82860 is not set
# CONFIG_EDAC_R82600 is not set
# CONFIG_EDAC_I5000 is not set
# CONFIG_EDAC_I5100 is not set
# CONFIG_EDAC_I7300 is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
CONFIG_RTC_DEBUG=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
CONFIG_RTC_DRV_TEST=m

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_88PM80X=m
CONFIG_RTC_DRV_ABB5ZES3=y
# CONFIG_RTC_DRV_ABX80X is not set
CONFIG_RTC_DRV_DS1307=y
# CONFIG_RTC_DRV_DS1307_HWMON is not set
CONFIG_RTC_DRV_DS1307_CENTURY=y
CONFIG_RTC_DRV_DS1374=y
# CONFIG_RTC_DRV_DS1374_WDT is not set
CONFIG_RTC_DRV_DS1672=m
# CONFIG_RTC_DRV_HYM8563 is not set
CONFIG_RTC_DRV_MAX6900=y
# CONFIG_RTC_DRV_MAX8998 is not set
# CONFIG_RTC_DRV_MAX8997 is not set
# CONFIG_RTC_DRV_MAX77686 is not set
CONFIG_RTC_DRV_RK808=y
# CONFIG_RTC_DRV_RS5C372 is not set
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_ISL12022=y
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
CONFIG_RTC_DRV_PCF85063=y
# CONFIG_RTC_DRV_PCF8563 is not set
CONFIG_RTC_DRV_PCF8583=y
CONFIG_RTC_DRV_M41T80=y
# CONFIG_RTC_DRV_M41T80_WDT is not set
CONFIG_RTC_DRV_BQ32K=m
CONFIG_RTC_DRV_TWL4030=y
# CONFIG_RTC_DRV_TPS6586X is not set
# CONFIG_RTC_DRV_TPS80031 is not set
CONFIG_RTC_DRV_RC5T583=y
CONFIG_RTC_DRV_S35390A=m
CONFIG_RTC_DRV_FM3130=y
CONFIG_RTC_DRV_RX8010=y
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set
CONFIG_RTC_DRV_EM3027=y
CONFIG_RTC_DRV_RV8803=y
CONFIG_RTC_DRV_S5M=y

#
# SPI RTC drivers
#
CONFIG_RTC_DRV_M41T93=m
CONFIG_RTC_DRV_M41T94=y
CONFIG_RTC_DRV_DS1302=y
CONFIG_RTC_DRV_DS1305=y
# CONFIG_RTC_DRV_DS1343 is not set
# CONFIG_RTC_DRV_DS1347 is not set
CONFIG_RTC_DRV_DS1390=y
# CONFIG_RTC_DRV_MAX6916 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RX4581=m
# CONFIG_RTC_DRV_RX6110 is not set
# CONFIG_RTC_DRV_RS5C348 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
CONFIG_RTC_DRV_PCF2123=m
CONFIG_RTC_DRV_MCP795=m
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=m
CONFIG_RTC_DRV_PCF2127=y
CONFIG_RTC_DRV_RV3029C2=y
CONFIG_RTC_DRV_RV3029_HWMON=y

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=m
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=y
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
# CONFIG_RTC_DRV_DS1742 is not set
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_DA9052=m
CONFIG_RTC_DRV_DA9055=y
# CONFIG_RTC_DRV_DA9063 is not set
CONFIG_RTC_DRV_STK17TA8=y
CONFIG_RTC_DRV_M48T86=y
CONFIG_RTC_DRV_M48T35=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_MSM6242=m
CONFIG_RTC_DRV_BQ4802=y
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=y
# CONFIG_RTC_DRV_WM831X is not set
CONFIG_RTC_DRV_PCF50633=m
CONFIG_RTC_DRV_ZYNQMP=y

#
# on-CPU RTC drivers
#
CONFIG_RTC_DRV_MC13XXX=m
# CONFIG_RTC_DRV_SNVS is not set
CONFIG_RTC_DRV_MT6397=y
# CONFIG_RTC_DRV_R7301 is not set

#
# HID Sensor RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
CONFIG_DMA_OF=y
CONFIG_FSL_EDMA=m
# CONFIG_INTEL_IDMA64 is not set
# CONFIG_PCH_DMA is not set
CONFIG_QCOM_HIDMA_MGMT=m
CONFIG_QCOM_HIDMA=y
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=y
# CONFIG_DW_DMAC_PCI is not set
CONFIG_HSU_DMA=y

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=m

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
CONFIG_SW_SYNC=y
CONFIG_AUXDISPLAY=y
# CONFIG_KS0108 is not set
# CONFIG_IMG_ASCII_LCD is not set
# CONFIG_HT16K33 is not set
CONFIG_UIO=y
# CONFIG_UIO_CIF is not set
CONFIG_UIO_PDRV_GENIRQ=m
CONFIG_UIO_DMEM_GENIRQ=y
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_UIO_PCI_GENERIC is not set
# CONFIG_UIO_NETX is not set
CONFIG_UIO_PRUSS=y
# CONFIG_UIO_MF624 is not set
# CONFIG_VIRT_DRIVERS is not set

#
# Virtio drivers
#
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_MMIO is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_HYPERV is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ACERHDF is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_DELL_SMO8800 is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_FUJITSU_TABLET is not set
# CONFIG_HP_ACCEL is not set
# CONFIG_HP_WIRELESS is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_ASUS_WIRELESS is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_TOSHIBA_BT_RFKILL is not set
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_ACPI_CMPC is not set
# CONFIG_INTEL_HID_EVENT is not set
# CONFIG_INTEL_VBTN is not set
# CONFIG_INTEL_IPS is not set
# CONFIG_INTEL_PMC_CORE is not set
# CONFIG_IBM_RTL is not set
# CONFIG_SAMSUNG_LAPTOP is not set
# CONFIG_SAMSUNG_Q10 is not set
# CONFIG_APPLE_GMUX is not set
# CONFIG_INTEL_RST is not set
# CONFIG_INTEL_SMARTCONNECT is not set
# CONFIG_PVPANIC is not set
# CONFIG_INTEL_PMC_IPC is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
# CONFIG_SURFACE_3_BUTTON is not set
CONFIG_INTEL_PUNIT_IPC=y
CONFIG_MLX_CPLD_PLATFORM=y
CONFIG_CHROME_PLATFORMS=y
# CONFIG_CHROMEOS_PSTORE is not set
# CONFIG_CROS_EC_CHARDEV is not set
CONFIG_CROS_EC_LPC=m
CONFIG_CROS_EC_PROTO=y
# CONFIG_CROS_KBD_LED_BACKLIGHT is not set

#
# Hardware Spinlock drivers
#

#
# Clock Source drivers
#
CONFIG_CLKSRC_I8253=y
CONFIG_CLKEVT_I8253=y
CONFIG_CLKBLD_I8253=y
# CONFIG_ATMEL_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_MAILBOX is not set
# CONFIG_IOMMU_SUPPORT is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set

#
# Rpmsg drivers
#

#
# SOC (System On Chip) specific Drivers
#

#
# Broadcom SoC drivers
#
# CONFIG_SUNXI_SRAM is not set
# CONFIG_SOC_TI is not set
# CONFIG_PM_DEVFREQ is not set
CONFIG_EXTCON=m

#
# Extcon Device Drivers
#
CONFIG_EXTCON_ADC_JACK=m
CONFIG_EXTCON_GPIO=m
# CONFIG_EXTCON_MAX14577 is not set
CONFIG_EXTCON_MAX3355=m
# CONFIG_EXTCON_MAX77693 is not set
CONFIG_EXTCON_MAX77843=m
CONFIG_EXTCON_MAX8997=m
CONFIG_EXTCON_QCOM_SPMI_MISC=m
CONFIG_EXTCON_RT8973A=m
# CONFIG_EXTCON_SM5502 is not set
CONFIG_EXTCON_USB_GPIO=m
# CONFIG_MEMORY is not set
CONFIG_IIO=y
CONFIG_IIO_BUFFER=y
CONFIG_IIO_BUFFER_CB=m
CONFIG_IIO_KFIFO_BUF=y
CONFIG_IIO_TRIGGERED_BUFFER=y
CONFIG_IIO_CONFIGFS=y
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2
CONFIG_IIO_SW_DEVICE=y
# CONFIG_IIO_SW_TRIGGER is not set

#
# Accelerometers
#
CONFIG_BMA180=y
# CONFIG_BMA220 is not set
CONFIG_BMC150_ACCEL=y
CONFIG_BMC150_ACCEL_I2C=y
CONFIG_BMC150_ACCEL_SPI=y
CONFIG_DA280=y
CONFIG_DA311=m
CONFIG_DMARD06=y
CONFIG_DMARD09=y
CONFIG_DMARD10=m
CONFIG_IIO_ST_ACCEL_3AXIS=m
CONFIG_IIO_ST_ACCEL_I2C_3AXIS=m
CONFIG_IIO_ST_ACCEL_SPI_3AXIS=m
CONFIG_KXSD9=y
# CONFIG_KXSD9_SPI is not set
CONFIG_KXSD9_I2C=m
CONFIG_KXCJK1013=y
CONFIG_MC3230=y
CONFIG_MMA7455=y
CONFIG_MMA7455_I2C=y
CONFIG_MMA7455_SPI=y
CONFIG_MMA7660=m
# CONFIG_MMA8452 is not set
CONFIG_MMA9551_CORE=y
CONFIG_MMA9551=y
CONFIG_MMA9553=m
# CONFIG_MXC4005 is not set
# CONFIG_MXC6255 is not set
CONFIG_SCA3000=y
CONFIG_STK8312=y
# CONFIG_STK8BA50 is not set

#
# Analog to digital converters
#
CONFIG_AD_SIGMA_DELTA=y
CONFIG_AD7266=y
CONFIG_AD7291=m
CONFIG_AD7298=y
CONFIG_AD7476=m
# CONFIG_AD7766 is not set
CONFIG_AD7791=m
CONFIG_AD7793=y
# CONFIG_AD7887 is not set
# CONFIG_AD7923 is not set
# CONFIG_AD799X is not set
# CONFIG_ENVELOPE_DETECTOR is not set
# CONFIG_HI8435 is not set
CONFIG_LTC2485=m
CONFIG_MAX1027=m
CONFIG_MAX1363=m
# CONFIG_MCP320X is not set
# CONFIG_MCP3422 is not set
CONFIG_NAU7802=y
# CONFIG_QCOM_SPMI_IADC is not set
CONFIG_QCOM_SPMI_VADC=y
# CONFIG_STX104 is not set
# CONFIG_TI_ADC081C is not set
CONFIG_TI_ADC0832=y
# CONFIG_TI_ADC12138 is not set
# CONFIG_TI_ADC128S052 is not set
CONFIG_TI_ADC161S626=m
CONFIG_TI_ADS1015=m
CONFIG_TI_ADS8688=y
CONFIG_TWL4030_MADC=m
# CONFIG_TWL6030_GPADC is not set
CONFIG_VF610_ADC=y

#
# Amplifiers
#
CONFIG_AD8366=m

#
# Chemical Sensors
#
CONFIG_ATLAS_PH_SENSOR=m
# CONFIG_IAQCORE is not set
CONFIG_VZ89X=m
CONFIG_IIO_CROS_EC_SENSORS_CORE=m
CONFIG_IIO_CROS_EC_SENSORS=m

#
# Hid Sensor IIO Common
#
CONFIG_IIO_MS_SENSORS_I2C=y

#
# SSP Sensor Common
#
# CONFIG_IIO_SSP_SENSORHUB is not set
CONFIG_IIO_ST_SENSORS_I2C=y
CONFIG_IIO_ST_SENSORS_SPI=y
CONFIG_IIO_ST_SENSORS_CORE=y

#
# Counters
#
# CONFIG_104_QUAD_8 is not set

#
# Digital to analog converters
#
# CONFIG_AD5064 is not set
CONFIG_AD5360=y
# CONFIG_AD5380 is not set
CONFIG_AD5421=m
# CONFIG_AD5446 is not set
# CONFIG_AD5449 is not set
CONFIG_AD5592R_BASE=m
CONFIG_AD5592R=m
CONFIG_AD5593R=m
CONFIG_AD5504=y
CONFIG_AD5624R_SPI=m
CONFIG_AD5686=y
CONFIG_AD5755=m
CONFIG_AD5761=y
CONFIG_AD5764=m
CONFIG_AD5791=y
CONFIG_AD7303=y
CONFIG_CIO_DAC=m
CONFIG_AD8801=y
CONFIG_DPOT_DAC=m
# CONFIG_M62332 is not set
CONFIG_MAX517=m
CONFIG_MAX5821=m
CONFIG_MCP4725=m
CONFIG_MCP4922=m
CONFIG_VF610_DAC=m

#
# IIO dummy driver
#
CONFIG_IIO_SIMPLE_DUMMY=y
# CONFIG_IIO_SIMPLE_DUMMY_EVENTS is not set
# CONFIG_IIO_SIMPLE_DUMMY_BUFFER is not set

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
# CONFIG_AD9523 is not set

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
# CONFIG_ADF4350 is not set

#
# Digital gyroscope sensors
#
CONFIG_ADIS16080=m
# CONFIG_ADIS16130 is not set
# CONFIG_ADIS16136 is not set
CONFIG_ADIS16260=y
CONFIG_ADXRS450=y
CONFIG_BMG160=y
CONFIG_BMG160_I2C=y
CONFIG_BMG160_SPI=y
CONFIG_IIO_ST_GYRO_3AXIS=m
CONFIG_IIO_ST_GYRO_I2C_3AXIS=m
CONFIG_IIO_ST_GYRO_SPI_3AXIS=m
CONFIG_ITG3200=m

#
# Health Sensors
#

#
# Heart Rate Monitors
#
CONFIG_AFE4403=m
CONFIG_AFE4404=m
CONFIG_MAX30100=m

#
# Humidity sensors
#
# CONFIG_AM2315 is not set
CONFIG_DHT11=y
CONFIG_HDC100X=y
CONFIG_HTS221=y
CONFIG_HTS221_I2C=y
CONFIG_HTS221_SPI=y
CONFIG_HTU21=y
CONFIG_SI7005=m
CONFIG_SI7020=m

#
# Inertial measurement units
#
CONFIG_ADIS16400=y
# CONFIG_ADIS16480 is not set
# CONFIG_BMI160_I2C is not set
# CONFIG_BMI160_SPI is not set
CONFIG_KMX61=y
CONFIG_INV_MPU6050_IIO=m
CONFIG_INV_MPU6050_I2C=m
CONFIG_INV_MPU6050_SPI=m
CONFIG_IIO_ADIS_LIB=y
CONFIG_IIO_ADIS_LIB_BUFFER=y

#
# Light sensors
#
# CONFIG_ACPI_ALS is not set
CONFIG_ADJD_S311=m
CONFIG_AL3320A=m
# CONFIG_APDS9300 is not set
# CONFIG_APDS9960 is not set
CONFIG_BH1750=m
# CONFIG_BH1780 is not set
CONFIG_CM32181=m
# CONFIG_CM3232 is not set
CONFIG_CM3323=y
CONFIG_CM36651=m
CONFIG_GP2AP020A00F=y
CONFIG_SENSORS_ISL29018=y
CONFIG_ISL29125=y
CONFIG_JSA1212=m
# CONFIG_RPR0521 is not set
CONFIG_LTR501=m
CONFIG_MAX44000=m
CONFIG_OPT3001=y
CONFIG_PA12203001=y
CONFIG_SI1145=y
CONFIG_STK3310=m
# CONFIG_TCS3414 is not set
CONFIG_TCS3472=y
CONFIG_SENSORS_TSL2563=y
CONFIG_TSL2583=y
# CONFIG_TSL4531 is not set
CONFIG_US5182D=m
CONFIG_VCNL4000=y
# CONFIG_VEML6070 is not set

#
# Magnetometer sensors
#
CONFIG_AK8974=y
CONFIG_AK8975=m
CONFIG_AK09911=m
CONFIG_BMC150_MAGN=y
CONFIG_BMC150_MAGN_I2C=m
CONFIG_BMC150_MAGN_SPI=y
CONFIG_MAG3110=m
CONFIG_MMC35240=m
CONFIG_IIO_ST_MAGN_3AXIS=y
CONFIG_IIO_ST_MAGN_I2C_3AXIS=y
CONFIG_IIO_ST_MAGN_SPI_3AXIS=y
# CONFIG_SENSORS_HMC5843_I2C is not set
# CONFIG_SENSORS_HMC5843_SPI is not set

#
# Inclinometer sensors
#

#
# Triggers - standalone
#
# CONFIG_IIO_INTERRUPT_TRIGGER is not set
CONFIG_IIO_SYSFS_TRIGGER=m

#
# Digital potentiometers
#
CONFIG_DS1803=m
CONFIG_MAX5487=y
# CONFIG_MCP4131 is not set
# CONFIG_MCP4531 is not set
CONFIG_TPL0102=m

#
# Digital potentiostats
#
CONFIG_LMP91000=m

#
# Pressure sensors
#
# CONFIG_ABP060MG is not set
CONFIG_BMP280=y
CONFIG_BMP280_I2C=y
CONFIG_BMP280_SPI=y
# CONFIG_HP03 is not set
CONFIG_MPL115=m
CONFIG_MPL115_I2C=m
CONFIG_MPL115_SPI=m
CONFIG_MPL3115=m
# CONFIG_MS5611 is not set
# CONFIG_MS5637 is not set
CONFIG_IIO_ST_PRESS=y
CONFIG_IIO_ST_PRESS_I2C=y
CONFIG_IIO_ST_PRESS_SPI=y
# CONFIG_T5403 is not set
# CONFIG_HP206C is not set
CONFIG_ZPA2326=y
CONFIG_ZPA2326_I2C=y
CONFIG_ZPA2326_SPI=y

#
# Lightning sensors
#
# CONFIG_AS3935 is not set

#
# Proximity sensors
#
CONFIG_LIDAR_LITE_V2=y
# CONFIG_SX9500 is not set

#
# Temperature sensors
#
CONFIG_MAXIM_THERMOCOUPLE=y
CONFIG_MLX90614=m
# CONFIG_TMP006 is not set
CONFIG_TSYS01=m
CONFIG_TSYS02D=m
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_CRC is not set
CONFIG_PWM_CROS_EC=m
CONFIG_PWM_FSL_FTM=y
# CONFIG_PWM_LPSS_PCI is not set
# CONFIG_PWM_LPSS_PLATFORM is not set
CONFIG_PWM_PCA9685=y
# CONFIG_PWM_TWL is not set
CONFIG_PWM_TWL_LED=m
CONFIG_IRQCHIP=y
CONFIG_ARM_GIC_MAX_NR=1
CONFIG_IPACK_BUS=m
# CONFIG_BOARD_TPCI200 is not set
CONFIG_SERIAL_IPOCTAL=m
CONFIG_RESET_CONTROLLER=y
# CONFIG_RESET_ATH79 is not set
# CONFIG_RESET_BERLIN is not set
# CONFIG_RESET_LPC18XX is not set
# CONFIG_RESET_MESON is not set
# CONFIG_RESET_PISTACHIO is not set
# CONFIG_RESET_SOCFPGA is not set
# CONFIG_RESET_STM32 is not set
# CONFIG_RESET_SUNXI is not set
# CONFIG_TI_SYSCON_RESET is not set
# CONFIG_RESET_ZYNQ is not set
# CONFIG_RESET_TEGRA_BPMP is not set
# CONFIG_FMC is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
# CONFIG_PHY_PXA_28NM_HSIC is not set
CONFIG_PHY_PXA_28NM_USB2=y
CONFIG_BCM_KONA_USB2_PHY=y
CONFIG_POWERCAP=y
CONFIG_INTEL_RAPL=y
# CONFIG_MCB is not set

#
# Performance monitor support
#
CONFIG_RAS=y
# CONFIG_THUNDERBOLT is not set

#
# Android
#
# CONFIG_ANDROID is not set
CONFIG_DEV_DAX=m
CONFIG_NR_DEV_DAX=32768
CONFIG_NVMEM=y
# CONFIG_STM is not set
CONFIG_INTEL_TH=m
# CONFIG_INTEL_TH_PCI is not set
# CONFIG_INTEL_TH_GTH is not set
# CONFIG_INTEL_TH_MSU is not set
CONFIG_INTEL_TH_PTI=m
# CONFIG_INTEL_TH_DEBUG is not set

#
# FPGA Configuration Support
#
CONFIG_FPGA=y
CONFIG_FPGA_REGION=m
CONFIG_FPGA_BRIDGE=m

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_FIRMWARE_MEMMAP is not set
CONFIG_DELL_RBU=y
# CONFIG_DCDBAS is not set
# CONFIG_ISCSI_IBFT_FIND is not set
# CONFIG_FW_CFG_SYSFS is not set
# CONFIG_GOOGLE_FIRMWARE is not set
# CONFIG_EFI_DEV_PATH_PARSER is not set

#
# Tegra firmware driver
#

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
CONFIG_MANDATORY_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
# CONFIG_PRINT_QUOTA_WARNING is not set
CONFIG_QUOTA_DEBUG=y
CONFIG_QFMT_V1=m
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
CONFIG_OVERLAY_FS_REDIRECT_DIR=y

#
# Caches
#
CONFIG_FSCACHE=y
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
CONFIG_FSCACHE_DEBUG=y
# CONFIG_FSCACHE_OBJECT_LIST is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_ORANGEFS_FS=m
CONFIG_ECRYPT_FS=m
# CONFIG_ECRYPT_FS_MESSAGING is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=y
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
# CONFIG_NLS_CODEPAGE_861 is not set
CONFIG_NLS_CODEPAGE_862=m
# CONFIG_NLS_CODEPAGE_863 is not set
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=m
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=y
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=m
# CONFIG_NLS_ISO8859_8 is not set
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=y
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=y
CONFIG_NLS_ISO8859_5=y
# CONFIG_NLS_ISO8859_6 is not set
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=y
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=y
CONFIG_NLS_KOI8_U=y
# CONFIG_NLS_MAC_ROMAN is not set
CONFIG_NLS_MAC_CELTIC=y
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=m
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=y
CONFIG_NLS_MAC_ICELAND=m
CONFIG_NLS_MAC_INUIT=m
CONFIG_NLS_MAC_ROMANIAN=y
CONFIG_NLS_MAC_TURKISH=y
CONFIG_NLS_UTF8=m

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DEBUG_SYNCHRO_TEST is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_DYNAMIC_DEBUG is not set

#
# Compile-time checks and compiler options
#
# CONFIG_DEBUG_INFO is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_PAGE_OWNER=y
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
# CONFIG_PAGE_POISONING is not set
CONFIG_DEBUG_PAGE_REF=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
# CONFIG_DEBUG_VM_RB is not set
CONFIG_DEBUG_VM_PGFLAGS=y
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_MEMORY_INIT is not set
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Lockups and Hangs
#
# CONFIG_LOCKUP_DETECTOR is not set
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHED_INFO is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_DEBUG_TIMEKEEPING=y
# CONFIG_TIMER_STATS is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_PI_LIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_REPEATEDLY=y
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_TORTURE_TEST=y
CONFIG_RCU_PERF_TEST=y
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is not set
CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY=3
# CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
CONFIG_NOTIFIER_ERROR_INJECTION=y
CONFIG_PM_NOTIFIER_ERROR_INJECT=m
# CONFIG_NETDEV_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
CONFIG_FAIL_FUTEX=y
# CONFIG_FAULT_INJECTION_DEBUG_FS is not set
# CONFIG_LATENCYTOP is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
CONFIG_HWLAT_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_TRACE_BRANCH_PROFILING=y
# CONFIG_BRANCH_PROFILE_NONE is not set
CONFIG_PROFILE_ANNOTATED_BRANCHES=y
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_TRACING_BRANCHES=y
CONFIG_BRANCH_TRACER=y
CONFIG_STACK_TRACER=y
CONFIG_KPROBE_EVENT=y
CONFIG_UPROBE_EVENT=y
CONFIG_PROBE_EVENTS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
# CONFIG_FUNCTION_PROFILER is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_HIST_TRIGGERS is not set
CONFIG_TRACEPOINT_BENCHMARK=y
CONFIG_RING_BUFFER_BENCHMARK=y
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
CONFIG_TRACE_ENUM_MAP_FILE=y
CONFIG_TRACING_EVENTS_GPIO=y

#
# Runtime Testing
#
CONFIG_TEST_LIST_SORT=y
CONFIG_KPROBES_SANITY_TEST=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_RBTREE_TEST=m
# CONFIG_INTERVAL_TREE_TEST is not set
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_TEST_HEXDUMP=m
# CONFIG_TEST_STRING_HELPERS is not set
CONFIG_TEST_KSTRTOX=y
CONFIG_TEST_PRINTF=y
# CONFIG_TEST_BITMAP is not set
CONFIG_TEST_UUID=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=y
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DMA_API_DEBUG is not set
CONFIG_TEST_LKM=m
# CONFIG_TEST_USER_COPY is not set
# CONFIG_TEST_BPF is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_UDELAY is not set
CONFIG_MEMTEST=y
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_ARCH_WANTS_UBSAN_NO_NULL is not set
# CONFIG_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
# CONFIG_EARLY_PRINTK is not set
CONFIG_X86_PTDUMP_CORE=y
CONFIG_X86_PTDUMP=m
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_WX is not set
CONFIG_DEBUG_SET_MODULE_RONX=y
# CONFIG_DEBUG_NX_TEST is not set
CONFIG_DOUBLEFAULT=y
CONFIG_DEBUG_TLBFLUSH=y
CONFIG_IOMMU_STRESS=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
# CONFIG_X86_DECODER_SELFTEST is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
# CONFIG_IO_DELAY_0X80 is not set
# CONFIG_IO_DELAY_0XED is not set
CONFIG_IO_DELAY_UDELAY=y
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=2
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
CONFIG_PUNIT_ATOM_DEBUG=y

#
# Security options
#
CONFIG_KEYS=y
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
CONFIG_SECURITYFS=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HAVE_ARCH_HARDENED_USERCOPY=y
CONFIG_HARDENED_USERCOPY=y
# CONFIG_HARDENED_USERCOPY_PAGESPAN is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=m
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_RSA=m
CONFIG_CRYPTO_DH=m
# CONFIG_CRYPTO_ECDH is not set
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
# CONFIG_CRYPTO_AUTHENC is not set
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ABLK_HELPER=m
CONFIG_CRYPTO_SIMD=m
CONFIG_CRYPTO_GLUE_HELPER_X86=m

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=y
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_KEYWRAP=m

#
# Hash modes
#
# CONFIG_CRYPTO_CMAC is not set
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=y
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
CONFIG_CRYPTO_CRC32=m
# CONFIG_CRYPTO_CRC32_PCLMUL is not set
CONFIG_CRYPTO_CRCT10DIF=y
# CONFIG_CRYPTO_GHASH is not set
CONFIG_CRYPTO_POLY1305=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
CONFIG_CRYPTO_RMD128=m
# CONFIG_CRYPTO_RMD160 is not set
CONFIG_CRYPTO_RMD256=m
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=m
# CONFIG_CRYPTO_SHA3 is not set
CONFIG_CRYPTO_TGR192=y
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_586=y
CONFIG_CRYPTO_AES_NI_INTEL=m
CONFIG_CRYPTO_ANUBIS=y
# CONFIG_CRYPTO_ARC4 is not set
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=y
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=m
# CONFIG_CRYPTO_KHAZAD is not set
CONFIG_CRYPTO_SALSA20=y
CONFIG_CRYPTO_SALSA20_586=y
# CONFIG_CRYPTO_CHACHA20 is not set
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SERPENT_SSE2_586=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y
# CONFIG_CRYPTO_TWOFISH_586 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_LZO is not set
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
CONFIG_CRYPTO_LZ4HC=y

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_CRYPTO_DEV_GEODE is not set
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
# CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API_DESC is not set
# CONFIG_CRYPTO_DEV_CCP is not set
# CONFIG_CRYPTO_DEV_QAT_DH895xCC is not set
# CONFIG_CRYPTO_DEV_QAT_C3XXX is not set
# CONFIG_CRYPTO_DEV_QAT_C62X is not set
# CONFIG_CRYPTO_DEV_QAT_DH895xCCVF is not set
# CONFIG_CRYPTO_DEV_QAT_C3XXXVF is not set
# CONFIG_CRYPTO_DEV_QAT_C62XVF is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
# CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE is not set

#
# Certificates for signature checking
#
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_EXTRA_CERTIFICATE=y
CONFIG_SYSTEM_EXTRA_CERTIFICATE_SIZE=4096
CONFIG_SECONDARY_TRUSTED_KEYRING=y
# CONFIG_SYSTEM_BLACKLIST_KEYRING is not set
CONFIG_HAVE_KVM=y
# CONFIG_VIRTUALIZATION is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
# CONFIG_HAVE_ARCH_BITREVERSE is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
# CONFIG_CRC_T10DIF is not set
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
CONFIG_CRC32_SELFTEST=y
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
# CONFIG_CRC64_ECMA is not set
# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
CONFIG_RANDOM32_SELFTEST=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4HC_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
# CONFIG_XZ_DEC_POWERPC is not set
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_BCH=m
CONFIG_BCH_CONST_PARAMS=y
CONFIG_RADIX_TREE_MULTIORDER=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=y
CONFIG_DDR=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=m
# CONFIG_SG_SPLIT is not set
# CONFIG_SG_POOL is not set
CONFIG_ARCH_HAS_SG_CHAIN=y
CONFIG_ARCH_HAS_MMIO_FLUSH=y
CONFIG_STACKDEPOT=y

[-- Attachment #3: job-script --]
[-- Type: text/plain, Size: 3957 bytes --]

#!/bin/sh

export_top_env()
{
	export suite='boot'
	export testcase='boot'
	export timeout='10m'
	export job_origin='/lkp/lkp/src/jobs/boot.yaml'
	export queue='reconfirm'
	export testbox='vm-lkp-wsx03-quantal-i386-30'
	export tbox_group='vm-lkp-wsx03-quantal-i386'
	export branch='linux-review/Michal-Hocko/mm-memcg-fix-Re-OOM-Better-but-still-there-on/20161223-225057'
	export commit='d18e2b2aca0396849f588241e134787a829c707d'
	export kconfig='i386-randconfig-n0-201651'
	export submit_id='585d6e790b9a93563c5c6010'
	export job_file='/lkp/scheduled/vm-lkp-wsx03-quantal-i386-30/boot-1-quantal-core-i386.cgz-d18e2b2aca0396849f588241e134787a829c707d-20161224-87612-9pvix6-7.yaml'
	export id='62a8ab5c8e86e747b4768d8c65bb02a10f00dcbe'
	export model='qemu-system-i386 -enable-kvm'
	export nr_vm=32
	export nr_cpu=1
	export memory='360M'
	export rootfs='quantal-core-i386.cgz'
	export need_kconfig='CONFIG_KVM_GUEST=y'
	export compiler='gcc-6'
	export enqueue_time='2016-12-24 02:35:37 +0800'
	export _id='585d6e790b9a93563c5c6017'
	export user='lkp'
	export result_root='/result/boot/1/vm-lkp-wsx03-quantal-i386/quantal-core-i386.cgz/i386-randconfig-n0-201651/gcc-6/d18e2b2aca0396849f588241e134787a829c707d/9'
	export LKP_SERVER='inn'
	export max_uptime=600
	export initrd='/osimage/quantal/quantal-core-i386.cgz'
	export bootloader_append='root=/dev/ram0
user=lkp
job=/lkp/scheduled/vm-lkp-wsx03-quantal-i386-30/boot-1-quantal-core-i386.cgz-d18e2b2aca0396849f588241e134787a829c707d-20161224-87612-9pvix6-7.yaml
ARCH=i386
kconfig=i386-randconfig-n0-201651
branch=linux-review/Michal-Hocko/mm-memcg-fix-Re-OOM-Better-but-still-there-on/20161223-225057
commit=d18e2b2aca0396849f588241e134787a829c707d
BOOT_IMAGE=/pkg/linux/i386-randconfig-n0-201651/gcc-6/d18e2b2aca0396849f588241e134787a829c707d/vmlinuz-4.9.0-mm1-00095-gd18e2b2
max_uptime=600
RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-quantal-i386/quantal-core-i386.cgz/i386-randconfig-n0-201651/gcc-6/d18e2b2aca0396849f588241e134787a829c707d/9
LKP_SERVER=inn
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
systemd.log_level=err
ignore_loglevel
earlyprintk=ttyS0,115200
console=ttyS0,115200
console=tty0
vga=normal
rw'
	export lkp_initrd='/lkp/lkp/lkp-i386.cgz'
	export modules_initrd='/pkg/linux/i386-randconfig-n0-201651/gcc-6/d18e2b2aca0396849f588241e134787a829c707d/modules.cgz'
	export site='inn'
	export LKP_CGI_PORT=80
	export LKP_CIFS_PORT=139
	export kernel='/pkg/linux/i386-randconfig-n0-201651/gcc-6/d18e2b2aca0396849f588241e134787a829c707d/vmlinuz-4.9.0-mm1-00095-gd18e2b2'
	export dequeue_time='2016-12-24 02:40:40 +0800'
	export job_initrd='/lkp/scheduled/vm-lkp-wsx03-quantal-i386-30/boot-1-quantal-core-i386.cgz-d18e2b2aca0396849f588241e134787a829c707d-20161224-87612-9pvix6-7.cgz'

	[ -n "$LKP_SRC" ] ||
	export LKP_SRC=/lkp/${user:-lkp}/src
}

run_job()
{
	echo $$ > $TMP/run-job.pid

	. $LKP_SRC/lib/http.sh
	. $LKP_SRC/lib/job.sh
	. $LKP_SRC/lib/env.sh

	export_top_env

	run_monitor $LKP_SRC/monitors/one-shot/wrapper boot-slabinfo
	run_monitor $LKP_SRC/monitors/one-shot/wrapper boot-meminfo
	run_monitor $LKP_SRC/monitors/one-shot/wrapper memmap
	run_monitor $LKP_SRC/monitors/wrapper kmsg
	run_monitor $LKP_SRC/monitors/wrapper oom-killer
	run_monitor $LKP_SRC/monitors/plain/watchdog
	run_monitor $LKP_SRC/monitors/wrapper nfs-hang

	run_test $LKP_SRC/tests/wrapper sleep 1
}

extract_stats()
{
	$LKP_SRC/stats/wrapper boot-slabinfo
	$LKP_SRC/stats/wrapper boot-meminfo
	$LKP_SRC/stats/wrapper memmap
	$LKP_SRC/stats/wrapper boot-memory
	$LKP_SRC/stats/wrapper boot-time
	$LKP_SRC/stats/wrapper kernel-size
	$LKP_SRC/stats/wrapper kmsg

	$LKP_SRC/stats/wrapper time sleep.time
	$LKP_SRC/stats/wrapper time
	$LKP_SRC/stats/wrapper dmesg
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper stderr
	$LKP_SRC/stats/wrapper last_state
}

"$@"

[-- Attachment #4: dmesg.xz --]
[-- Type: application/octet-stream, Size: 14816 bytes --]

* Re: [lkp-developer] [mm, memcg]  d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size
  2016-12-25 22:25                                   ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot
@ 2016-12-26 12:26                                     ` Michal Hocko
  2016-12-26 12:50                                       ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-26 12:26 UTC (permalink / raw)
  To: kernel test robot
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs, lkp

On Mon 26-12-16 06:25:56, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> commit: d18e2b2aca0396849f588241e134787a829c707d ("mm, memcg: fix (Re: OOM: Better, but still there on)")
> url: https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-memcg-fix-Re-OOM-Better-but-still-there-on/20161223-225057
> base: git://git.cmpxchg.org/linux-mmotm.git master
> 
> in testcase: boot
> 
> on test machine: qemu-system-i386 -enable-kvm -m 360M
> 
> caused below changes:
> 
> 
> +--------------------------------------------------------+------------+------------+
> |                                                        | c7d85b880b | d18e2b2aca |
> +--------------------------------------------------------+------------+------------+
> | boot_successes                                         | 8          | 0          |
> | boot_failures                                          | 0          | 2          |
> | WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size | 0          | 2          |
> | kernel_BUG_at_mm/memcontrol.c                          | 0          | 2          |
> | invalid_opcode:#[##]DEBUG_PAGEALLOC                    | 0          | 2          |
> | Kernel_panic-not_syncing:Fatal_exception               | 0          | 2          |
> +--------------------------------------------------------+------------+------------+
> 
> 
> 
> [   95.226364] init: tty6 main process (990) killed by TERM signal
> [   95.314020] init: plymouth-upstart-bridge main process (1039) terminated with status 1
> [   97.588568] ------------[ cut here ]------------
> [   97.594364] WARNING: CPU: 0 PID: 1055 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0xdd/0x12b
> [   97.606654] mem_cgroup_update_lru_size(40297f00, 0, -1): lru_size 1 but empty
> [   97.615140] Modules linked in:
> [   97.618834] CPU: 0 PID: 1055 Comm: killall5 Not tainted 4.9.0-mm1-00095-gd18e2b2 #82
> [   97.628008] Call Trace:
> [   97.631025]  dump_stack+0x16/0x18
> [   97.635107]  __warn+0xaf/0xc6
> [   97.638729]  ? mem_cgroup_update_lru_size+0xdd/0x12b

Do you have the full backtrace?
-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-23 22:26                                   ` Nils Holland
@ 2016-12-26 12:48                                     ` Michal Hocko
  2016-12-26 18:57                                       ` Nils Holland
                                                         ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-26 12:48 UTC (permalink / raw)
  To: Nils Holland
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri 23-12-16 23:26:00, Nils Holland wrote:
> On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > 
> > Nils, even though this is still highly experimental, could you give it a
> > try please?
> 
> Yes, no problem! So I kept the very first patch you sent but had to
> revert the latest version of the debugging patch (the one in
> which you added the "mm_vmscan_inactive_list_is_low" event) because
> otherwise the patch you just sent wouldn't apply. Then I rebooted with
> memory cgroups enabled again, and the first thing that strikes the eye
> is that I get this during boot:
> 
> [    1.568174] ------------[ cut here ]------------
> [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty

Ohh, I can see what is wrong! a) there is a bug in the accounting in
my patch (I double account) and b) the detection of the empty list
cannot work after my change because the per-node list will not match
the per-zone statistics. The updated patch is below. So I hope my brain
already works after it's been mostly off for the last few days...
---
From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 23 Dec 2016 15:11:54 +0100
Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
 memcg is enabled

Nils Holland has reported unexpected OOM killer invocations with 32b
kernels, starting with 4.8:

	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
	kworker/u4:5 cpuset=/ mems_allowed=0
	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
	[...]
	Mem-Info:
	active_anon:58685 inactive_anon:90 isolated_anon:0
	 active_file:274324 inactive_file:281962 isolated_file:0
	 unevictable:0 dirty:649 writeback:0 unstable:0
	 slab_reclaimable:40662 slab_unreclaimable:17754
	 mapped:7382 shmem:202 pagetables:351 bounce:0
	 free:206736 free_pcp:332 free_cma:0
	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
	lowmem_reserve[]: 0 813 3474 3474
	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
	lowmem_reserve[]: 0 0 21292 21292
	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

The OOM killer is clearly premature because there is still a
lot of page cache in the zone Normal which should satisfy this lowmem
request. Further debugging has shown that the reclaim cannot make any
forward progress because the page cache is hidden in the active list
which doesn't get rotated because inactive_list_is_low is not memcg
aware.
It simply subtracts per-zone highmem counters from the respective
memcg's lru sizes, which doesn't make any sense. We can simply end up
always seeing the resulting active and inactive counts as 0 and return
false. This issue is not limited to 32b kernels but in practice the
effect on systems without CONFIG_HIGHMEM would be much harder to notice
because we do not invoke the OOM killer for allocation requests
targeting < ZONE_NORMAL.

Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
and subtract per-memcg highmem counts when memcg is enabled. Introduce
helper lruvec_zone_lru_size which redirects to either zone counters or
mem_cgroup_get_zone_lru_size when appropriate.
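
For illustration only (not part of the patch): a minimal userspace
sketch of the arithmetic described above. The struct lru_counts type,
the helper names and the fixed inactive ratio of 1 are assumptions made
for this model; the real inactive_list_is_low scales the ratio with the
size of the lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's zone and LRU indices. */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };
enum { LRU_INACTIVE, LRU_ACTIVE, NR_LRU_LISTS };

/* Hypothetical model: LRU page counts broken down by zone. */
struct lru_counts {
	unsigned long size[MAX_NR_ZONES][NR_LRU_LISTS];
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Sum one LRU list over all zones. */
static unsigned long lru_total(const struct lru_counts *c, int lru)
{
	unsigned long nr = 0;

	for (int zid = 0; zid < MAX_NR_ZONES; zid++)
		nr += c->size[zid][lru];
	return nr;
}

/*
 * Start from the memcg totals and subtract the counts of all zones above
 * reclaim_idx, taken from 'ineligible'. Passing node-wide counters models
 * the old behaviour; passing the memcg's own per-zone counts models the
 * fixed one. The inactive ratio is assumed to be 1 for simplicity.
 */
static bool inactive_is_low(const struct lru_counts *memcg,
			    const struct lru_counts *ineligible,
			    int reclaim_idx)
{
	unsigned long inactive = lru_total(memcg, LRU_INACTIVE);
	unsigned long active = lru_total(memcg, LRU_ACTIVE);

	assert(reclaim_idx >= 0 && reclaim_idx < MAX_NR_ZONES);

	for (int zid = reclaim_idx + 1; zid < MAX_NR_ZONES; zid++) {
		inactive -= min_ul(inactive, ineligible->size[zid][LRU_INACTIVE]);
		active -= min_ul(active, ineligible->size[zid][LRU_ACTIVE]);
	}
	return inactive < active;
}
```

Take a hypothetical memcg holding 10 inactive and 1000 active file
pages in Normal plus 800 inactive and 500 active in HighMem, and the
node-wide HighMem counts from the report above (281951 inactive, 139308
active, assuming 4kB pages). Passing the node-wide counters as the
second argument clamps both sums to 0 and the function returns false.
Passing the memcg's own per-zone counts leaves 10 inactive against 1000
active and it returns true.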

We are losing empty LRU but non-zero lru size detection introduced by
ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because
of the inherent zone vs. node discrepancy.
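
To illustrate the discrepancy (a simplified model, not kernel code):
the LRU list is shared by all zones of a node while the counters are
kept per zone by this patch, so a zero counter for one zone no longer
implies an empty list. The struct node_lru type and the helper names
below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

/* Hypothetical model: one node-wide LRU list, one counter per zone. */
struct node_lru {
	size_t list_len;		/* pages on the shared list */
	long zone_size[MAX_NR_ZONES];	/* per-zone page counts */
};

/* Account one page from zone 'zid' being added to the shared list. */
static void lru_add(struct node_lru *l, int zid)
{
	assert(zid >= 0 && zid < MAX_NR_ZONES);
	l->list_len++;
	l->zone_size[zid]++;
}

/* The dropped sanity check: the counter is zero iff the list is empty. */
static bool old_check_would_warn(const struct node_lru *l, int zid)
{
	bool empty = (l->list_len == 0);
	long size = l->zone_size[zid];

	return size < 0 || empty != !size;
}

/* The check that remains: only a negative counter is reported. */
static bool new_check_would_warn(const struct node_lru *l, int zid)
{
	return l->zone_size[zid] < 0;
}
```

After a single lru_add() for a HighMem page, the old check evaluated
against the Normal counter would warn (counter 0, list not empty) even
though the accounting is correct, which is why only the negative size
test can be kept.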

Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
Cc: stable # 4.8+
Reported-by: Nils Holland <nholland@tisys.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/memcontrol.h | 26 +++++++++++++++++++++++---
 include/linux/mm_inline.h  |  2 +-
 mm/memcontrol.c            | 18 ++++++++----------
 mm/vmscan.c                | 26 ++++++++++++++++----------
 4 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 61d20c17f3b7..002cb08b0f3e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -120,7 +120,7 @@ struct mem_cgroup_reclaim_iter {
  */
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
-	unsigned long		lru_size[NR_LRU_LISTS];
+	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter	iter[DEF_PRIORITY + 1];
 
@@ -432,7 +432,7 @@ static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
-		int nr_pages);
+		int zid, int nr_pages);
 
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 					   int nid, unsigned int lru_mask);
@@ -441,9 +441,23 @@ static inline
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
 	struct mem_cgroup_per_node *mz;
+	unsigned long nr_pages = 0;
+	int zid;
 
 	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	return mz->lru_size[lru];
+	for (zid = 0; zid < MAX_NR_ZONES; zid++)
+		nr_pages += mz->lru_zone_size[zid][lru];
+	return nr_pages;
+}
+
+static inline
+unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
+					   int zone_idx)
+{
+	struct mem_cgroup_per_node *mz;
+
+	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	return mz->lru_zone_size[zone_idx][lru];
 }
 
 void mem_cgroup_handle_over_high(void);
@@ -671,6 +685,12 @@ mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
 	return 0;
 }
+static inline
+unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
+					   int zone_idx)
+{
+	return 0;
+}
 
 static inline unsigned long
 mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 71613e8a720f..41d376e7116d 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -39,7 +39,7 @@ static __always_inline void update_lru_size(struct lruvec *lruvec,
 {
 	__update_lru_size(lruvec, lru, zid, nr_pages);
 #ifdef CONFIG_MEMCG
-	mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
+	mem_cgroup_update_lru_size(lruvec, lru, zid, nr_pages);
 #endif
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91dfc7c5ce8f..b59676026272 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -625,8 +625,8 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 					   int nid, unsigned int lru_mask)
 {
+	struct lruvec *lruvec = mem_cgroup_lruvec(NODE_DATA(nid), memcg);
 	unsigned long nr = 0;
-	struct mem_cgroup_per_node *mz;
 	enum lru_list lru;
 
 	VM_BUG_ON((unsigned)nid >= nr_node_ids);
@@ -634,8 +634,7 @@ unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
 	for_each_lru(lru) {
 		if (!(BIT(lru) & lru_mask))
 			continue;
-		mz = mem_cgroup_nodeinfo(memcg, nid);
-		nr += mz->lru_size[lru];
+		nr += mem_cgroup_get_lru_size(lruvec, lru);
 	}
 	return nr;
 }
@@ -1002,6 +1001,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
  * mem_cgroup_update_lru_size - account for adding or removing an lru page
  * @lruvec: mem_cgroup per zone lru vector
  * @lru: index of lru list the page is sitting on
+ * @zid: zone id of the accounted pages
  * @nr_pages: positive when adding or negative when removing
  *
  * This function must be called under lru_lock, just before a page is added
@@ -1009,27 +1009,25 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
  * so as to allow it to check that lru_size 0 is consistent with list_empty).
  */
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
-				int nr_pages)
+				int zid, int nr_pages)
 {
 	struct mem_cgroup_per_node *mz;
 	unsigned long *lru_size;
 	long size;
-	bool empty;
 
 	if (mem_cgroup_disabled())
 		return;
 
 	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	lru_size = mz->lru_size + lru;
-	empty = list_empty(lruvec->lists + lru);
+	lru_size = &mz->lru_zone_size[zid][lru];
 
 	if (nr_pages < 0)
 		*lru_size += nr_pages;
 
 	size = *lru_size;
-	if (WARN_ONCE(size < 0 || empty != !size,
-		"%s(%p, %d, %d): lru_size %ld but %sempty\n",
-		__func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) {
+	if (WARN_ONCE(size < 0,
+		"%s(%p, %d, %d): lru_size %ld\n",
+		__func__, lruvec, lru, nr_pages, size)) {
 		VM_BUG_ON(1);
 		*lru_size = 0;
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4abf08861d2..c98b1a585992 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -242,6 +242,15 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru)
 	return node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
 }
 
+unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
+{
+	if (!mem_cgroup_disabled())
+		return mem_cgroup_get_zone_lru_size(lruvec, lru, zone_idx);
+
+	return zone_page_state(&lruvec_pgdat(lruvec)->node_zones[zone_idx],
+			       NR_ZONE_LRU_BASE + lru);
+}
+
 /*
  * Add a shrinker callback to be called from the vm.
  */
@@ -1382,8 +1391,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
  * be complete before mem_cgroup_update_lru_size due to a santity check.
  */
 static __always_inline void update_lru_sizes(struct lruvec *lruvec,
-			enum lru_list lru, unsigned long *nr_zone_taken,
-			unsigned long nr_taken)
+			enum lru_list lru, unsigned long *nr_zone_taken)
 {
 	int zid;
 
@@ -1392,11 +1400,11 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
 			continue;
 
 		__update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);
-	}
-
 #ifdef CONFIG_MEMCG
-	mem_cgroup_update_lru_size(lruvec, lru, -nr_taken);
+		mem_cgroup_update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);
 #endif
+	}
+
 }
 
 /*
@@ -1501,7 +1509,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	*nr_scanned = scan;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, scan,
 				    nr_taken, mode, is_file_lru(lru));
-	update_lru_sizes(lruvec, lru, nr_zone_taken, nr_taken);
+	update_lru_sizes(lruvec, lru, nr_zone_taken);
 	return nr_taken;
 }
 
@@ -2047,10 +2055,8 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 		if (!managed_zone(zone))
 			continue;
 
-		inactive_zone = zone_page_state(zone,
-				NR_ZONE_LRU_BASE + (file * LRU_FILE));
-		active_zone = zone_page_state(zone,
-				NR_ZONE_LRU_BASE + (file * LRU_FILE) + LRU_ACTIVE);
+		inactive_zone = lruvec_zone_lru_size(lruvec, file * LRU_FILE, zid);
+		active_zone = lruvec_zone_lru_size(lruvec, (file * LRU_FILE) + LRU_ACTIVE, zid);
 
 		inactive -= min(inactive, inactive_zone);
 		active -= min(active, active_zone);
-- 
2.10.2


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 62+ messages in thread
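
[Archive note] The accounting change in the patch above can be modelled
in userspace. The following is a minimal sketch, not kernel code: the
names lru_model, model_update_lru_size, model_node_lru_size and
old_check_mismatch, and the array sizes, are assumptions made for
illustration. It keeps one counter per (zone, lru) pair as the patch
does with lru_zone_size[zid][lru], and it folds the kernel's separate
"remove before the check, add after the check" steps into one update.
It also shows why the old "empty != !size" test was dropped: the list
is still per node, so one zone's counter can be 0 while the list holds
pages from another zone.

```c
#include <stdbool.h>

enum { MODEL_NR_ZONES = 4, MODEL_NR_LRU = 5 };	/* illustrative sizes */

/* Hypothetical stand-in for the per-memcg, per-node counters. */
struct lru_model {
	long lru_zone_size[MODEL_NR_ZONES][MODEL_NR_LRU];
};

/*
 * Account nr_pages (negative when removing) against one zone's counter.
 * Returns false and resets the counter to 0 if it would go negative,
 * which is the only condition the patched kernel function still warns on.
 */
static bool model_update_lru_size(struct lru_model *m, int lru, int zid,
				  long nr_pages)
{
	long *lru_size = &m->lru_zone_size[zid][lru];

	*lru_size += nr_pages;
	if (*lru_size < 0) {
		*lru_size = 0;
		return false;
	}
	return true;
}

/* Node-wide size of one lru: the sum of all per-zone counters. */
static long model_node_lru_size(const struct lru_model *m, int lru)
{
	long nr = 0;
	int zid;

	for (zid = 0; zid < MODEL_NR_ZONES; zid++)
		nr += m->lru_zone_size[zid][lru];
	return nr;
}

/* The removed sanity check: does "size is zero" disagree with "list empty"? */
static bool old_check_mismatch(long size, bool list_empty)
{
	return list_empty != (size == 0);
}
```

With three pages added to zone 1, the node list is not empty, zone 0's
counter is 0, and old_check_mismatch() fires for zone 0 even though the
accounting is consistent. That appears to match the "lru_size 0 but not
empty" warning reported earlier in the thread.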

* Re: [lkp-developer] [mm, memcg]  d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size
  2016-12-26 12:26                                     ` Michal Hocko
@ 2016-12-26 12:50                                       ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-26 12:50 UTC (permalink / raw)
  To: kernel test robot
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs, lkp

On Mon 26-12-16 13:26:51, Michal Hocko wrote:
> On Mon 26-12-16 06:25:56, kernel test robot wrote:
[...]
> > [   95.226364] init: tty6 main process (990) killed by TERM signal
> > [   95.314020] init: plymouth-upstart-bridge main process (1039) terminated with status 1
> > [   97.588568] ------------[ cut here ]------------
> > [   97.594364] WARNING: CPU: 0 PID: 1055 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0xdd/0x12b
> > [   97.606654] mem_cgroup_update_lru_size(40297f00, 0, -1): lru_size 1 but empty
> > [   97.615140] Modules linked in:
> > [   97.618834] CPU: 0 PID: 1055 Comm: killall5 Not tainted 4.9.0-mm1-00095-gd18e2b2 #82
> > [   97.628008] Call Trace:
> > [   97.631025]  dump_stack+0x16/0x18
> > [   97.635107]  __warn+0xaf/0xc6
> > [   97.638729]  ? mem_cgroup_update_lru_size+0xdd/0x12b
> 
> Do you have the full backtrace?

It's not needed. I found the bug in my patch and it should be fixed by
the updated patch http://lkml.kernel.org/r/20161226124839.GB20715@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-26 12:48                                     ` Michal Hocko
@ 2016-12-26 18:57                                       ` Nils Holland
  2016-12-27  8:08                                         ` Michal Hocko
  2016-12-27 15:55                                       ` Michal Hocko
                                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-26 18:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > 
> > > Nils, even though this is still highly experimental, could you give it a
> > > try please?
> > 
> > Yes, no problem! So I kept the very first patch you sent but had to
> > revert the latest version of the debugging patch (the one in
> > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > memory cgroups enabled again, and the first thing that strikes the eye
> > is that I get this during boot:
> > 
> > [    1.568174] ------------[ cut here ]------------
> > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> 
> Ohh, I can see what is wrong! a) there is a bug in the accounting in
> my patch (I double account) and b) the detection for the empty list
> cannot work after my change because per node zone will not match per
> zone statistics. The updated patch is below. So I hope my brain already
> works after it's been mostly off last few days...

I tried the updated patch, and I can confirm that the warning during
boot is gone. Also, I've tried my ordinary procedure to reproduce my
testcase, and I can say that a kernel with this new patch also works
fine and doesn't produce OOMs or similar issues.

I had the previous version of the patch in use on a machine non-stop
for the last few days during normal day-to-day workloads and didn't
notice any issues. Now I'll keep a machine running during the next few
days with this patch, and in case I notice something that doesn't look
normal, I'll of course report back!

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-26 18:57                                       ` Nils Holland
@ 2016-12-27  8:08                                         ` Michal Hocko
  2016-12-27 11:23                                           ` Nils Holland
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-27  8:08 UTC (permalink / raw)
  To: Nils Holland
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Mon 26-12-16 19:57:03, Nils Holland wrote:
> On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > 
> > > > Nils, even though this is still highly experimental, could you give it a
> > > > try please?
> > > 
> > > Yes, no problem! So I kept the very first patch you sent but had to
> > > revert the latest version of the debugging patch (the one in
> > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > memory cgroups enabled again, and the first thing that strikes the eye
> > > is that I get this during boot:
> > > 
> > > [    1.568174] ------------[ cut here ]------------
> > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > 
> > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > my patch (I double account) and b) the detection for the empty list
> > cannot work after my change because per node zone will not match per
> > zone statistics. The updated patch is below. So I hope my brain already
> > works after it's been mostly off last few days...
> 
> I tried the updated patch, and I can confirm that the warning during
> boot is gone. Also, I've tried my ordinary procedure to reproduce my
> testcase, and I can say that a kernel with this new patch also works
> fine and doesn't produce OOMs or similar issues.
> 
> I had the previous version of the patch in use on a machine non-stop
> for the last few days during normal day-to-day workloads and didn't
> notice any issues. Now I'll keep a machine running during the next few
> days with this patch, and in case I notice something that doesn't look
> normal, I'll of course report back!

Thanks for your testing! Can I add your
Tested-by: Nils Holland <nholland@tisys.org>
?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-27  8:08                                         ` Michal Hocko
@ 2016-12-27 11:23                                           ` Nils Holland
  2016-12-27 11:27                                             ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-27 11:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Tue, Dec 27, 2016 at 09:08:38AM +0100, Michal Hocko wrote:
> On Mon 26-12-16 19:57:03, Nils Holland wrote:
> > On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > > 
> > > > > Nils, even though this is still highly experimental, could you give it a
> > > > > try please?
> > > > 
> > > > Yes, no problem! So I kept the very first patch you sent but had to
> > > > revert the latest version of the debugging patch (the one in
> > > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > > memory cgroups enabled again, and the first thing that strikes the eye
> > > > is that I get this during boot:
> > > > 
> > > > [    1.568174] ------------[ cut here ]------------
> > > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > > 
> > > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > > my patch (I double account) and b) the detection for the empty list
> > > cannot work after my change because per node zone will not match per
> > > zone statistics. The updated patch is below. So I hope my brain already
> > > works after it's been mostly off last few days...
> > 
> > I tried the updated patch, and I can confirm that the warning during
> > boot is gone. Also, I've tried my ordinary procedure to reproduce my
> > testcase, and I can say that a kernel with this new patch also works
> > fine and doesn't produce OOMs or similar issues.
> > 
> > I had the previous version of the patch in use on a machine non-stop
> > for the last few days during normal day-to-day workloads and didn't
> > notice any issues. Now I'll keep a machine running during the next few
> > days with this patch, and in case I notice something that doesn't look
> > normal, I'll of course report back!
> 
> Thanks for your testing! Can I add your
> Tested-by: Nils Holland <nholland@tisys.org>

Yes, I think so! The patch has now been running for 16 hours on my two
machines, and that's an uptime that was hard to achieve since 4.8 for
me. ;-) So my tests clearly suggest that the patch is good! :-)

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-27 11:23                                           ` Nils Holland
@ 2016-12-27 11:27                                             ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-27 11:27 UTC (permalink / raw)
  To: Nils Holland
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Tue 27-12-16 12:23:13, Nils Holland wrote:
> On Tue, Dec 27, 2016 at 09:08:38AM +0100, Michal Hocko wrote:
> > On Mon 26-12-16 19:57:03, Nils Holland wrote:
> > > On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > > > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > > > 
> > > > > > Nils, even though this is still highly experimental, could you give it a
> > > > > > try please?
> > > > > 
> > > > > Yes, no problem! So I kept the very first patch you sent but had to
> > > > > revert the latest version of the debugging patch (the one in
> > > > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > > > memory cgroups enabled again, and the first thing that strikes the eye
> > > > > is that I get this during boot:
> > > > > 
> > > > > [    1.568174] ------------[ cut here ]------------
> > > > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > > > 
> > > > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > > > my patch (I double account) and b) the detection for the empty list
> > > > cannot work after my change because per node zone will not match per
> > > > zone statistics. The updated patch is below. So I hope my brain already
> > > > works after it's been mostly off last few days...
> > > 
> > > I tried the updated patch, and I can confirm that the warning during
> > > boot is gone. Also, I've tried my ordinary procedure to reproduce my
> > > testcase, and I can say that a kernel with this new patch also works
> > > fine and doesn't produce OOMs or similar issues.
> > > 
> > > I had the previous version of the patch in use on a machine non-stop
> > > for the last few days during normal day-to-day workloads and didn't
> > > notice any issues. Now I'll keep a machine running during the next few
> > > days with this patch, and in case I notice something that doesn't look
> > > normal, I'll of course report back!
> > 
> > Thanks for your testing! Can I add your
> > Tested-by: Nils Holland <nholland@tisys.org>
> 
> Yes, I think so! The patch has now been running for 16 hours on my two
> machines, and that's an uptime that was hard to achieve since 4.8 for
> me. ;-) So my tests clearly suggest that the patch is good! :-)

OK, thanks a lot for your testing! I will wait a few more days before I
send it to Andrew.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-26 12:48                                     ` Michal Hocko
  2016-12-26 18:57                                       ` Nils Holland
@ 2016-12-27 15:55                                       ` Michal Hocko
  2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
                                                           ` (2 more replies)
  2016-12-29  0:31                                       ` Minchan Kim
  2016-12-30 10:19                                       ` Mel Gorman
  3 siblings, 3 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Nils Holland
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

Hi,
could you try to run with the following patch on top of the previous
one? I do not think it will make a large change in your workload, but
I think we need something like that, so some testing under a workload
which is known to create high lowmem pressure would be really
appreciated. If you have more time to play with it, then running with
and without the patch with mm_vmscan_direct_reclaim_{start,end}
tracepoints enabled could tell us whether it makes any difference at
all.

I would also appreciate it if Mel and Johannes had a look at it. I am
not yet sure whether we need the same thing for anon/file balancing in
get_scan_count. I suspect we do, but I need to think more about that.

Thanks a lot again!
---
>From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 27 Dec 2016 16:28:44 +0100
Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count

get_scan_count considers the whole node LRU size when
- doing SCAN_FILE due to many page cache inactive pages
- calculating the number of pages to scan

In both cases this might lead to unexpected behavior, especially on 32b
systems where we can expect lowmem memory pressure very often.

A large highmem zone can easily distort the SCAN_FILE heuristic because
there might be only a few file pages from the eligible zones on the node
lru and we would still enforce file lru scanning, which can lead to
thrashing while we could still scan anonymous pages.

The latter use of lruvec_lru_size can be problematic as well, especially
when there are not many pages from the eligible zones. We would have to
skip over many pages to find anything to reclaim, but shrink_node_memcg
would only reduce the remaining number to scan by SWAP_CLUSTER_MAX
at maximum. Therefore we can end up going over a large LRU many times
without actually having a chance to reclaim much, if anything at all.
The closer we are to running out of memory in the lowmem zone, the worse
the problem will be.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmscan.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c98b1a585992..785b4d7fb8a0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int
 }
 
 /*
+ * Return the number of pages on the given lru which are eligible for the
+ * given zone_idx
+ */
+static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
+		enum lru_list lru, int zone_idx)
+{
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+	unsigned long lru_size;
+	int zid;
+
+	lru_size = lruvec_lru_size(lruvec, lru);
+	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+		unsigned long size;
+
+		if (!managed_zone(zone))
+			continue;
+
+		size = lruvec_zone_lru_size(lruvec, lru, zid);
+		lru_size -= min(size, lru_size);
+	}
+
+	return lru_size;
+}
+
+/*
  * Add a shrinker callback to be called from the vm.
  */
 int register_shrinker(struct shrinker *shrinker)
@@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	 * system is under heavy pressure.
 	 */
 	if (!inactive_list_is_low(lruvec, true, sc) &&
-	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
+	    lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			unsigned long size;
 			unsigned long scan;
 
-			size = lruvec_lru_size(lruvec, lru);
+			size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx);
 			scan = size >> sc->priority;
 
 			if (!scan && pass && force_scan)
-- 
2.10.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 62+ messages in thread
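
[Archive note] The effect described in the commit message above can be
checked with a small userspace model. This is a sketch under stated
assumptions, not the kernel implementation: eligible_lru_size, min_ul,
MODEL_NR_ZONES and the page counts used with it are made up for
illustration. It mirrors the loop in lruvec_lru_size_zone_idx: start
from the node-wide LRU size and subtract the pages sitting in zones
above the allocation's zone index, clamping so the result cannot
underflow.

```c
enum { MODEL_NR_ZONES = 4 };	/* illustrative zone count */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * node_size: pages on the node-wide lru
 * zone_size: pages of that lru per zone (0 for an unpopulated zone)
 * zone_idx:  highest zone the request may use (sc->reclaim_idx in the patch)
 */
static unsigned long eligible_lru_size(unsigned long node_size,
		const unsigned long zone_size[MODEL_NR_ZONES], int zone_idx)
{
	unsigned long lru_size = node_size;
	int zid;

	/* Pages in zones above zone_idx cannot satisfy the request. */
	for (zid = zone_idx + 1; zid < MODEL_NR_ZONES; zid++)
		lru_size -= min_ul(zone_size[zid], lru_size);

	return lru_size;
}
```

Take a hypothetical 32b node with 1000 pages in zone 0, 50000 in zone 1
(lowmem) and 800000 in zone 2 (highmem), 851000 in total. For a lowmem
request (zone_idx 1) at priority 12, the node-wide size gives a scan
target of 851000 >> 12 = 207 pages, while the eligible size gives
51000 >> 12 = 12 pages.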

* Re: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
  2016-12-27 15:55                                       ` Michal Hocko
@ 2016-12-27 16:28                                         ` kbuild test robot
  2016-12-28  8:51                                           ` Michal Hocko
  2016-12-27 19:33                                         ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland
  2016-12-29  1:20                                         ` Minchan Kim
  2 siblings, 1 reply; 62+ messages in thread
From: kbuild test robot @ 2016-12-27 16:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: kbuild-all, Nils Holland, Mel Gorman, Johannes Weiner,
	Vladimir Davydov, Tetsuo Handa, linux-kernel, linux-mm,
	Chris Mason, David Sterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]

Hi Michal,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc1 next-20161224]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-vmscan-consider-eligible-zones-in-get_scan_count/20161228-000917
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/vmscan.c: In function 'lruvec_lru_size_zone_idx':
>> mm/vmscan.c:264:10: error: implicit declaration of function 'lruvec_zone_lru_size' [-Werror=implicit-function-declaration]
      size = lruvec_zone_lru_size(lruvec, lru, zid);
             ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/lruvec_zone_lru_size +264 mm/vmscan.c

   258			struct zone *zone = &pgdat->node_zones[zid];
   259			unsigned long size;
   260	
   261			if (!managed_zone(zone))
   262				continue;
   263	
 > 264			size = lruvec_zone_lru_size(lruvec, lru, zid);
   265			lru_size -= min(size, lru_size);
   266		}
   267	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6418 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-27 15:55                                       ` Michal Hocko
  2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
@ 2016-12-27 19:33                                         ` Nils Holland
  2016-12-28  8:57                                           ` Michal Hocko
  2016-12-29  1:20                                         ` Minchan Kim
  2 siblings, 1 reply; 62+ messages in thread
From: Nils Holland @ 2016-12-27 19:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> Hi,
> could you try to run with the following patch on top of the previous
> one? I do not think it will make a large change in your workload but
> I think we need something like that so some testing under which is known
> to make a high lowmem pressure would be really appreciated. If you have
> more time to play with it then running with and without the patch with
> mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> whether it make any difference at all.

Of course, no problem!

First, about the events to trace: mm_vmscan_direct_reclaim_start
doesn't seem to exist, but mm_vmscan_direct_reclaim_begin does. I'm
sure that's what you meant and so I took that one instead.

Then I have to admit that in both cases (once without the latest patch,
once with) very little trace data was actually produced. In the case
without the patch, the reclaim was started more often and reclaimed a
smaller number of pages each time; in the case with the patch it was
invoked less often, and the last time it was invoked it reclaimed
a rather big number of pages. I have no clue, however, if that
happened "by chance" or if it was actually caused by the patch and
thus an expected change.

In both cases, my test case was: Reboot, set up logging, do "emerge
firefox" (which unpacks and builds the firefox sources), then, when
the emerge had come so far that the unpacking was done and the
building had started, switch to another console and untar the latest
kernel, libreoffice and (once more) firefox sources there. After that
had completed, I aborted the emerge build process and stopped tracing.

Here's the trace data captured without the latest patch applied:

khugepaged-22    [000] ....   566.123383: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [000] .N..   566.165520: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
khugepaged-22    [001] ....   587.515424: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [000] ....   587.596035: mm_vmscan_direct_reclaim_end: nr_reclaimed=1029
khugepaged-22    [001] ....   599.879536: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [000] ....   601.000812: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
khugepaged-22    [001] ....   601.228137: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   601.309952: mm_vmscan_direct_reclaim_end: nr_reclaimed=1081
khugepaged-22    [001] ....   694.935267: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] .N..   695.081943: mm_vmscan_direct_reclaim_end: nr_reclaimed=1071
khugepaged-22    [001] ....   701.370707: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   701.372798: mm_vmscan_direct_reclaim_end: nr_reclaimed=1089
khugepaged-22    [001] ....   764.752036: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [000] ....   771.047905: mm_vmscan_direct_reclaim_end: nr_reclaimed=1039
khugepaged-22    [000] ....   781.760515: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   781.826543: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
khugepaged-22    [001] ....   782.595575: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [000] ....   782.638591: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
khugepaged-22    [001] ....   782.930455: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   782.993608: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
khugepaged-22    [001] ....   783.330378: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   783.369653: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040

And this is the same with the patch applied:

khugepaged-22    [001] ....   523.599997: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   523.683110: mm_vmscan_direct_reclaim_end: nr_reclaimed=1092
khugepaged-22    [001] ....   535.345477: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   535.401189: mm_vmscan_direct_reclaim_end: nr_reclaimed=1078
khugepaged-22    [000] ....   692.876716: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22    [001] ....   703.312399: mm_vmscan_direct_reclaim_end: nr_reclaimed=197759

If my test case and thus the results don't sound good, I could of
course try some other test cases ... like capturing for a longer
period of time or trying to produce more memory pressure by running
more processes at the same time, or something like that.

Besides that I can say that the patch hasn't produced any warnings or
other issues so far, so at first glance, it doesn't seem to hurt
anything.

Greetings
Nils

^ permalink raw reply	[flat|nested] 62+ messages in thread
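
[Archive note] Trace output like the above is easier to compare when it
is reduced to a few totals. The following is a minimal sketch, assuming
the line layout shown in the message above and a single traced task, so
that each mm_vmscan_direct_reclaim_end line belongs to the most recent
mm_vmscan_direct_reclaim_begin line. The struct and function names are
made up for illustration.

```c
#include <stdio.h>
#include <string.h>

struct reclaim_stats {
	unsigned long invocations;	/* completed begin/end pairs */
	unsigned long reclaimed;	/* sum of nr_reclaimed */
	double seconds;			/* total time spent between begin and end */
};

/*
 * Feed one trace line. *begin_ts holds the timestamp of the pending
 * begin event, or a negative value when none is pending. Lines that do
 * not match the expected layout are ignored.
 */
static void account_trace_line(struct reclaim_stats *st, double *begin_ts,
			       const char *line)
{
	const char *end_tag = "mm_vmscan_direct_reclaim_end: nr_reclaimed=";
	const char *p;
	unsigned long nr;
	double ts;

	/* "task-pid [cpu] flags timestamp:" */
	if (sscanf(line, "%*s [%*d] %*s %lf:", &ts) != 1)
		return;

	if (strstr(line, "mm_vmscan_direct_reclaim_begin:")) {
		*begin_ts = ts;
		return;
	}

	p = strstr(line, end_tag);
	if (p && *begin_ts >= 0.0 &&
	    sscanf(p + strlen(end_tag), "%lu", &nr) == 1) {
		st->invocations++;
		st->reclaimed += nr;
		st->seconds += ts - *begin_ts;
		*begin_ts = -1.0;
	}
}
```

Start with a zeroed struct reclaim_stats and *begin_ts set to -1.0, then
call account_trace_line() once per line. Running it over both traces
gives the number of direct reclaim invocations, the pages reclaimed and
the time spent, which is the comparison asked for earlier in the thread.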

* Re: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
  2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
@ 2016-12-28  8:51                                           ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-28  8:51 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Nils Holland, Mel Gorman, Johannes Weiner,
	Vladimir Davydov, Tetsuo Handa, linux-kernel, linux-mm,
	Chris Mason, David Sterba, linux-btrfs

On Wed 28-12-16 00:28:38, kbuild test robot wrote:
> Hi Michal,
> 
> [auto build test ERROR on mmotm/master]
> [also build test ERROR on v4.10-rc1 next-20161224]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-vmscan-consider-eligible-zones-in-get_scan_count/20161228-000917
> base:   git://git.cmpxchg.org/linux-mmotm.git master
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>    mm/vmscan.c: In function 'lruvec_lru_size_zone_idx':
> >> mm/vmscan.c:264:10: error: implicit declaration of function 'lruvec_zone_lru_size' [-Werror=implicit-function-declaration]
>       size = lruvec_zone_lru_size(lruvec, lru, zid);

this patch depends on the previous one
http://lkml.kernel.org/r/20161226124839.GB20715@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-27 19:33                                         ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland
@ 2016-12-28  8:57                                           ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-28  8:57 UTC (permalink / raw)
  To: Nils Holland
  Cc: Mel Gorman, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Tue 27-12-16 20:33:09, Nils Holland wrote:
> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > Hi,
> > could you try to run with the following patch on top of the previous
> > one? I do not think it will make a large change in your workload but
> > I think we need something like that so some testing under which is known
> > to make a high lowmem pressure would be really appreciated. If you have
> > more time to play with it then running with and without the patch with
> > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > whether it make any difference at all.
> 
> Of course, no problem!
> 
> First, about the events to trace: mm_vmscan_direct_reclaim_start
> doesn't seem to exist, but mm_vmscan_direct_reclaim_begin does. I'm
> sure that's what you meant and so I took that one instead.

yes, sorry about the confusion

> Then I have to admit in both cases (once without the latest patch,
> once with) very little trace data was actually produced. In the case
> without the patch, the reclaim was started more often and reclaimed a
> smaller number of pages each time, in the case with the patch it was
> invoked less often, and with the last time it was invoked it reclaimed
> a rather big number of pages. I have no clue, however, if that
> happened "by chance" or if it was actually causes by the patch and
> thus an expected change.

Yes, I would say that is just variation in the workload, because if
anything the patch should reduce the number of scanned pages.

> In both cases, my test case was: Reboot, setup logging, do "emerge
> firefox" (which unpacks and builds the firefox sources), then, when
> the emerge had come so far that the unpacking was done and the
> building had started, switch to another console and untar the latest
> kernel, libreoffice and (once more) firefox sources there. After that
> had completed, I aborted the emerge build process and stopped tracing.
> 
> Here's the trace data captured without the latest patch applied:
> 
> khugepaged-22    [000] ....   566.123383: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [000] .N..   566.165520: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
> khugepaged-22    [001] ....   587.515424: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [000] ....   587.596035: mm_vmscan_direct_reclaim_end: nr_reclaimed=1029
> khugepaged-22    [001] ....   599.879536: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [000] ....   601.000812: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
> khugepaged-22    [001] ....   601.228137: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   601.309952: mm_vmscan_direct_reclaim_end: nr_reclaimed=1081
> khugepaged-22    [001] ....   694.935267: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] .N..   695.081943: mm_vmscan_direct_reclaim_end: nr_reclaimed=1071
> khugepaged-22    [001] ....   701.370707: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   701.372798: mm_vmscan_direct_reclaim_end: nr_reclaimed=1089
> khugepaged-22    [001] ....   764.752036: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [000] ....   771.047905: mm_vmscan_direct_reclaim_end: nr_reclaimed=1039
> khugepaged-22    [000] ....   781.760515: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   781.826543: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22    [001] ....   782.595575: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [000] ....   782.638591: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22    [001] ....   782.930455: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   782.993608: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22    [001] ....   783.330378: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   783.369653: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> 
> And this is the same with the patch applied:
> 
> khugepaged-22    [001] ....   523.599997: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   523.683110: mm_vmscan_direct_reclaim_end: nr_reclaimed=1092
> khugepaged-22    [001] ....   535.345477: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   535.401189: mm_vmscan_direct_reclaim_end: nr_reclaimed=1078
> khugepaged-22    [000] ....   692.876716: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22    [001] ....   703.312399: mm_vmscan_direct_reclaim_end: nr_reclaimed=197759

In these cases there is no real difference because this is not lowmem
pressure: those requests can go to the highmem zone.
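
For comparing the two runs, the per-event numbers can be pulled out of
the trace mechanically. The following is only a sketch:
parse_nr_reclaimed is a made-up helper name, and it assumes the line
format shown in the trace output quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the nr_reclaimed value from one mm_vmscan_direct_reclaim_end
 * trace line. Returns -1 if the line is not such an event or carries
 * no number. (Hypothetical helper, not part of any kernel tool.)
 */
long parse_nr_reclaimed(const char *line)
{
	static const char key[] = "mm_vmscan_direct_reclaim_end: nr_reclaimed=";
	const char *p = strstr(line, key);
	long value;

	if (!p || sscanf(p + sizeof(key) - 1, "%ld", &value) != 1)
		return -1;
	return value;
}
```

Feeding every captured line through this and summing the non-negative
results gives one total per run, which is easier to compare than the
raw event list.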

> If my test case and thus the results don't sound good, I could of
> course try some other test cases ... like capturing for a longer
> period of time or trying to produce more memory pressure by running
> more processes at the same time, or something like that.

Yes, stronger memory pressure would be needed. I suspect that your
original issue was more about active list aging than really strong
memory pressure, so it is possible that your workload will not notice.
If you can collect those two tracepoints over a longer time it can
still tell us something, but I do not want you to burn a lot of time
on this. The main issue seems to be fixed and the follow-up fix can
wait for a thorough review after both Mel and Johannes are back from
holiday.

> Besides that I can say that the patch hasn't produced any warnings or
> other issues so far, so at first glance, it doesn't seem to hurt
> anything.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-26 12:48                                     ` Michal Hocko
  2016-12-26 18:57                                       ` Nils Holland
  2016-12-27 15:55                                       ` Michal Hocko
@ 2016-12-29  0:31                                       ` Minchan Kim
  2016-12-29  0:48                                         ` Minchan Kim
  2016-12-30 10:19                                       ` Mel Gorman
  3 siblings, 1 reply; 62+ messages in thread
From: Minchan Kim @ 2016-12-29  0:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > 
> > > Nils, even though this is still highly experimental, could you give it a
> > > try please?
> > 
> > Yes, no problem! So I kept the very first patch you sent but had to
> > revert the latest version of the debugging patch (the one in
> > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > memory cgroups enabled again, and the first thing that strikes the eye
> > is that I get this during boot:
> > 
> > [    1.568174] ------------[ cut here ]------------
> > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> 
> Ohh, I can see what is wrong! a) there is a bug in the accounting in
> my patch (I double account) and b) the detection for the empty list
> cannot work after my change because per node zone will not match per
> zone statistics. The updated patch is below. So I hope my brain already
> works after it's been mostly off last few days...
> ---
> From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 23 Dec 2016 15:11:54 +0100
> Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
>  memcg is enabled
> 
> Nils Holland has reported unexpected OOM killer invocations with 32b
> kernel starting with 4.8 kernels
> 
> 	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
> 	kworker/u4:5 cpuset=/ mems_allowed=0
> 	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
> 	[...]
> 	Mem-Info:
> 	active_anon:58685 inactive_anon:90 isolated_anon:0
> 	 active_file:274324 inactive_file:281962 isolated_file:0
> 	 unevictable:0 dirty:649 writeback:0 unstable:0
> 	 slab_reclaimable:40662 slab_unreclaimable:17754
> 	 mapped:7382 shmem:202 pagetables:351 bounce:0
> 	 free:206736 free_pcp:332 free_cma:0
> 	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
> 	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> 	lowmem_reserve[]: 0 813 3474 3474
> 	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
> 	lowmem_reserve[]: 0 0 21292 21292
> 	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
> 
> the oom killer is clearly pre-mature because there there is still a
> lot of page cache in the zone Normal which should satisfy this lowmem
> request. Further debugging has shown that the reclaim cannot make any
> forward progress because the page cache is hidden in the active list
> which doesn't get rotated because inactive_list_is_low is not memcg
> aware.
> It simply subtracts per-zone highmem counters from the respective
> memcg's lru sizes which doesn't make any sense. We can simply end up
> always seeing the resulting active and inactive counts 0 and return
> false. This issue is not limited to 32b kernels but in practice the
> effect on systems without CONFIG_HIGHMEM would be much harder to notice
> because we do not invoke the OOM killer for allocations requests
> targeting < ZONE_NORMAL.
> 
> Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
> and subtract per-memcg highmem counts when memcg is enabled. Introduce
> helper lruvec_zone_lru_size which redirects to either zone counters or
> mem_cgroup_get_zone_lru_size when appropriate.
> 
> We are loosing empty LRU but non-zero lru size detection introduced by
> ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because
> of the inherent zone vs. node discrepancy.
> 
> Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
> Cc: stable # 4.8+
> Reported-by: Nils Holland <nholland@tisys.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
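
To make the failure mode from the changelog concrete, here is a small
userspace model (not kernel code). The struct, the function names, the
ratio passed in by the caller and all page counts are made up for
illustration; only the subtract-and-clamp step and the final
comparison follow the description above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one LRU's page counts; not a kernel type. */
struct lru_counts {
	unsigned long inactive;
	unsigned long active;
};

/* Subtract b from a without wrapping below zero. */
unsigned long sub_clamped(unsigned long a, unsigned long b)
{
	return a - (b < a ? b : a);
}

/*
 * Simplified model of the check: drop the pages accounted to the
 * ineligible (here: highmem) zones, then ask whether the inactive list
 * is small compared to the active one. The ratio is supplied by the
 * caller instead of being derived from the list sizes.
 */
bool model_inactive_is_low(struct lru_counts lru,
			   struct lru_counts ineligible,
			   unsigned long ratio)
{
	unsigned long inactive = sub_clamped(lru.inactive, ineligible.inactive);
	unsigned long active = sub_clamped(lru.active, ineligible.active);

	return inactive * ratio < active;
}
```

Take a hypothetical cgroup with 100 inactive and 5000 active file
pages, all in lowmem. Subtracting node-wide highmem counts of 280000
and 139000 clamps both values to 0, so the model returns false and the
active list is never aged. Subtracting the cgroup's own highmem counts
(0 and 0) keeps 100 versus 5000, and the model returns true.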

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-29  0:31                                       ` Minchan Kim
@ 2016-12-29  0:48                                         ` Minchan Kim
  2016-12-29  8:52                                           ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Minchan Kim @ 2016-12-29  0:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu, Dec 29, 2016 at 09:31:54AM +0900, Minchan Kim wrote:
> On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > 
> > > > Nils, even though this is still highly experimental, could you give it a
> > > > try please?
> > > 
> > > Yes, no problem! So I kept the very first patch you sent but had to
> > > revert the latest version of the debugging patch (the one in
> > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > memory cgroups enabled again, and the first thing that strikes the eye
> > > is that I get this during boot:
> > > 
> > > [    1.568174] ------------[ cut here ]------------
> > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > 
> > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > my patch (I double account) and b) the detection for the empty list
> > cannot work after my change because per node zone will not match per
> > zone statistics. The updated patch is below. So I hope my brain already
> > works after it's been mostly off last few days...
> > ---
> > From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Fri, 23 Dec 2016 15:11:54 +0100
> > Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
> >  memcg is enabled
> > 
> > Nils Holland has reported unexpected OOM killer invocations with 32b
> > kernel starting with 4.8 kernels
> > 
> > 	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
> > 	kworker/u4:5 cpuset=/ mems_allowed=0
> > 	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
> > 	[...]
> > 	Mem-Info:
> > 	active_anon:58685 inactive_anon:90 isolated_anon:0
> > 	 active_file:274324 inactive_file:281962 isolated_file:0
> > 	 unevictable:0 dirty:649 writeback:0 unstable:0
> > 	 slab_reclaimable:40662 slab_unreclaimable:17754
> > 	 mapped:7382 shmem:202 pagetables:351 bounce:0
> > 	 free:206736 free_pcp:332 free_cma:0
> > 	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
> > 	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > 	lowmem_reserve[]: 0 813 3474 3474
> > 	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
> > 	lowmem_reserve[]: 0 0 21292 21292
> > 	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
> > 
> > the oom killer is clearly pre-mature because there there is still a
> > lot of page cache in the zone Normal which should satisfy this lowmem
> > request. Further debugging has shown that the reclaim cannot make any
> > forward progress because the page cache is hidden in the active list
> > which doesn't get rotated because inactive_list_is_low is not memcg
> > aware.
> > It simply subtracts per-zone highmem counters from the respective
> > memcg's lru sizes which doesn't make any sense. We can simply end up
> > always seeing the resulting active and inactive counts 0 and return
> > false. This issue is not limited to 32b kernels but in practice the
> > effect on systems without CONFIG_HIGHMEM would be much harder to notice
> > because we do not invoke the OOM killer for allocations requests
> > targeting < ZONE_NORMAL.
> > 
> > Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
> > and subtract per-memcg highmem counts when memcg is enabled. Introduce
> > helper lruvec_zone_lru_size which redirects to either zone counters or
> > mem_cgroup_get_zone_lru_size when appropriate.
> > 
> > We are loosing empty LRU but non-zero lru size detection introduced by
> > ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because
> > of the inherent zone vs. node discrepancy.
> > 
> > Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
> > Cc: stable # 4.8+
> > Reported-by: Nils Holland <nholland@tisys.org>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Minchan Kim <minchan@kernel.org>

Nit:

WARNING: line over 80 characters
#53: FILE: include/linux/memcontrol.h:689:
+unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,

WARNING: line over 80 characters
#147: FILE: mm/vmscan.c:248:
+unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)

WARNING: line over 80 characters
#177: FILE: mm/vmscan.c:1446:
+               mem_cgroup_update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);

WARNING: line over 80 characters
#201: FILE: mm/vmscan.c:2099:
+               inactive_zone = lruvec_zone_lru_size(lruvec, file * LRU_FILE, zid);

WARNING: line over 80 characters
#202: FILE: mm/vmscan.c:2100:
+               active_zone = lruvec_zone_lru_size(lruvec, (file * LRU_FILE) + LRU_ACTIVE, zid);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-27 15:55                                       ` Michal Hocko
  2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
  2016-12-27 19:33                                         ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland
@ 2016-12-29  1:20                                         ` Minchan Kim
  2016-12-29  9:04                                           ` Michal Hocko
  2 siblings, 1 reply; 62+ messages in thread
From: Minchan Kim @ 2016-12-29  1:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> Hi,
> could you try to run with the following patch on top of the previous
> one? I do not think it will make a large change in your workload but
> I think we need something like that so some testing under which is known
> to make a high lowmem pressure would be really appreciated. If you have
> more time to play with it then running with and without the patch with
> mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> whether it make any difference at all.
> 
> I would also appreciate if Mel and Johannes had a look at it. I am not
> yet sure whether we need the same thing for anon/file balancing in
> get_scan_count. I suspect we need but need to think more about that.
> 
> Thanks a lot again!
> ---
> From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 27 Dec 2016 16:28:44 +0100
> Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> 
> get_scan_count considers the whole node LRU size when
> - doing SCAN_FILE due to many page cache inactive pages
> - calculating the number of pages to scan
> 
> in both cases this might lead to unexpected behavior especially on 32b
> systems where we can expect lowmem memory pressure very often.
> 
> A large highmem zone can easily distort SCAN_FILE heuristic because
> there might be only few file pages from the eligible zones on the node
> lru and we would still enforce file lru scanning which can lead to
> trashing while we could still scan anonymous pages.

Nit:
It doesn't cause thrashing because isolate_lru_pages filters them out,
but I agree it burns CPU pointlessly to find eligible pages.

> 
> The later use of lruvec_lru_size can be problematic as well. Especially
> when there are not many pages from the eligible zones. We would have to
> skip over many pages to find anything to reclaim but shrink_node_memcg
> would only reduce the remaining number to scan by SWAP_CLUSTER_MAX
> at maximum. Therefore we can end up going over a large LRU many times
> without actually having chance to reclaim much if anything at all. The
> closer we are out of memory on lowmem zone the worse the problem will
> be.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmscan.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c98b1a585992..785b4d7fb8a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int
>  }
>  
>  /*
> + * Return the number of pages on the given lru which are eligibne for the
                                                            eligible
> + * given zone_idx
> + */
> +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> +		enum lru_list lru, int zone_idx)

Nit:

Although there is a comment, the function name is rather confusing when
compared with lruvec_zone_lru_size.

lruvec_eligible_zones_lru_size is better?


> +{
> +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> +	unsigned long lru_size;
> +	int zid;
> +
> +	lru_size = lruvec_lru_size(lruvec, lru);
> +	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> +		struct zone *zone = &pgdat->node_zones[zid];
> +		unsigned long size;
> +
> +		if (!managed_zone(zone))
> +			continue;
> +
> +		size = lruvec_zone_lru_size(lruvec, lru, zid);
> +		lru_size -= min(size, lru_size);
> +	}
> +
> +	return lru_size;
> +}
> +
> +/*
>   * Add a shrinker callback to be called from the vm.
>   */
>  int register_shrinker(struct shrinker *shrinker)
> @@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	 * system is under heavy pressure.
>  	 */
>  	if (!inactive_list_is_low(lruvec, true, sc) &&
> -	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
> +	    lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
>  		scan_balance = SCAN_FILE;
>  		goto out;
>  	}
> @@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  			unsigned long size;
>  			unsigned long scan;
>  
> -			size = lruvec_lru_size(lruvec, lru);
> +			size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx);
>  			scan = size >> sc->priority;
>  
>  			if (!scan && pass && force_scan)
> -- 
> 2.10.2

Nit:

With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than
its own custom calculation to filter out non-eligible pages.

Anyway, I think this patch does the right thing, so I support it.

Acked-by: Minchan Kim <minchan@kernel.org>
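
As a worked example of what the new helper changes, here is a userspace
model (not kernel code) of the subtract-and-clamp loop from the patch
and of the "size >> priority" step in get_scan_count. A plain array
stands in for lruvec_zone_lru_size(), and the model_* names and the
three-zone layout are assumptions made for this sketch.

```c
/* Hypothetical zone layout for a 32-bit box: DMA, Normal, HighMem. */
enum { MODEL_NR_ZONES = 3 };

/*
 * Pages of one LRU that live in zones 0..zone_idx, given per-zone page
 * counts. Starts from the whole-node size and subtracts every zone
 * above zone_idx, clamping at zero as the patch does.
 */
unsigned long model_eligible_lru_size(const unsigned long zone_pages[MODEL_NR_ZONES],
				      int zone_idx)
{
	unsigned long lru_size = 0;
	int zid;

	for (zid = 0; zid < MODEL_NR_ZONES; zid++)
		lru_size += zone_pages[zid];

	for (zid = zone_idx + 1; zid < MODEL_NR_ZONES; zid++) {
		unsigned long size = zone_pages[zid];

		lru_size -= size < lru_size ? size : lru_size;
	}

	return lru_size;
}

/* Scan target for a list of the given size at the given priority. */
unsigned long model_scan_target(unsigned long size, int priority)
{
	return size >> priority;
}
```

Plugging in the inactive_file values from the OOM report quoted earlier
in this thread (DMA 0kB, Normal 44kB, HighMem 1127804kB, assuming 4kB
pages) gives per-zone counts of 0, 11 and 281951 pages. For a request
limited to the Normal zone the eligible size is 11 pages, while the
whole-node size is 281962. At priority 12 the whole-node size still
yields a scan target of 68 pages, whereas the eligible size yields 0.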

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-29  0:48                                         ` Minchan Kim
@ 2016-12-29  8:52                                           ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-29  8:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu 29-12-16 09:48:24, Minchan Kim wrote:
> On Thu, Dec 29, 2016 at 09:31:54AM +0900, Minchan Kim wrote:
[...]
> > Acked-by: Minchan Kim <minchan@kernel.org>

Thanks!
 
> Nit:
> 
> WARNING: line over 80 characters
> #53: FILE: include/linux/memcontrol.h:689:
> +unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
> 
> WARNING: line over 80 characters
> #147: FILE: mm/vmscan.c:248:
> +unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
> 
> WARNING: line over 80 characters
> #177: FILE: mm/vmscan.c:1446:
> +               mem_cgroup_update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);

fixed

> WARNING: line over 80 characters
> #201: FILE: mm/vmscan.c:2099:
> +               inactive_zone = lruvec_zone_lru_size(lruvec, file * LRU_FILE, zid);
> 
> WARNING: line over 80 characters
> #202: FILE: mm/vmscan.c:2100:
> +               active_zone = lruvec_zone_lru_size(lruvec, (file * LRU_FILE) + LRU_ACTIVE, zid);

I would prefer to have those on the same line though. It will make them
easier to follow.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-29  1:20                                         ` Minchan Kim
@ 2016-12-29  9:04                                           ` Michal Hocko
  2016-12-30  2:05                                             ` Minchan Kim
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-29  9:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > Hi,
> > could you try to run with the following patch on top of the previous
> > one? I do not think it will make a large change in your workload but
> > I think we need something like that so some testing under which is known
> > to make a high lowmem pressure would be really appreciated. If you have
> > more time to play with it then running with and without the patch with
> > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > whether it make any difference at all.
> > 
> > I would also appreciate if Mel and Johannes had a look at it. I am not
> > yet sure whether we need the same thing for anon/file balancing in
> > get_scan_count. I suspect we need but need to think more about that.
> > 
> > Thanks a lot again!
> > ---
> > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 27 Dec 2016 16:28:44 +0100
> > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> > 
> > get_scan_count considers the whole node LRU size when
> > - doing SCAN_FILE due to many page cache inactive pages
> > - calculating the number of pages to scan
> > 
> > in both cases this might lead to unexpected behavior especially on 32b
> > systems where we can expect lowmem memory pressure very often.
> > 
> > A large highmem zone can easily distort SCAN_FILE heuristic because
> > there might be only few file pages from the eligible zones on the node
> > lru and we would still enforce file lru scanning which can lead to
> > trashing while we could still scan anonymous pages.
> 
> Nit:
> > It doesn't cause thrashing because isolate_lru_pages filters them out,
> > but I agree it burns CPU pointlessly to find eligible pages.

This is not about isolate_lru_pages. The thrashing could happen if we had
a lowmem pagecache user which would constantly reclaim recently
faulted-in pages while there is anonymous memory in lowmem which could
be reclaimed instead.
 
[...]
> >  /*
> > + * Return the number of pages on the given lru which are eligibne for the
>                                                             eligible

fixed

> > + * given zone_idx
> > + */
> > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > +		enum lru_list lru, int zone_idx)
> 
> Nit:
> 
> Although there is a comment, the function name is rather confusing when
> compared with lruvec_zone_lru_size.

I am all for a better name.

> lruvec_eligible_zones_lru_size is better?

this would be too easy to confuse with lruvec_eligible_zone_lru_size.
What about lruvec_lru_size_eligible_zones?
 
> Nit:
> 
> With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than
> its own custom calculation to filter out non-eligible pages.

Yes, that would be possible and I was considering that. But then I found
it useful to see both the total and the reduced numbers in the tracepoint
http://lkml.kernel.org/r/20161228153032.10821-8-mhocko@kernel.org
and didn't want to call lruvec_lru_size twice. But if you insist then
I can just do that.

> Anyway, I think this patch does the right thing, so I support it.
> 
> Acked-by: Minchan Kim <minchan@kernel.org>

Thanks for the review!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-29  9:04                                           ` Michal Hocko
@ 2016-12-30  2:05                                             ` Minchan Kim
  2016-12-30 10:40                                               ` Michal Hocko
  0 siblings, 1 reply; 62+ messages in thread
From: Minchan Kim @ 2016-12-30  2:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs

On Thu, Dec 29, 2016 at 10:04:32AM +0100, Michal Hocko wrote:
> On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> > On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > > Hi,
> > > could you try to run with the following patch on top of the previous
> > > one? I do not think it will make a large change in your workload but
> > > I think we need something like that so some testing under which is known
> > > to make a high lowmem pressure would be really appreciated. If you have
> > > more time to play with it then running with and without the patch with
> > > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > > whether it make any difference at all.
> > > 
> > > I would also appreciate if Mel and Johannes had a look at it. I am not
> > > yet sure whether we need the same thing for anon/file balancing in
> > > get_scan_count. I suspect we need but need to think more about that.
> > > 
> > > Thanks a lot again!
> > > ---
> > > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Tue, 27 Dec 2016 16:28:44 +0100
> > > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> > > 
> > > get_scan_count considers the whole node LRU size when
> > > - doing SCAN_FILE due to many page cache inactive pages
> > > - calculating the number of pages to scan
> > > 
> > > in both cases this might lead to unexpected behavior especially on 32b
> > > systems where we can expect lowmem memory pressure very often.
> > > 
> > > A large highmem zone can easily distort SCAN_FILE heuristic because
> > > there might be only few file pages from the eligible zones on the node
> > > lru and we would still enforce file lru scanning which can lead to
> > > thrashing while we could still scan anonymous pages.
> > 
> > Nit:
> > It doesn't cause thrashing because isolate_lru_pages filters them out
> > but I agree it burns CPU pointlessly to find eligible pages.
> 
> This is not about isolate_lru_pages. The thrashing could happen if we had
> a lowmem pagecache user which would constantly reclaim recently faulted
> in pages while there is anonymous memory in the lowmem which could be
> reclaimed instead.
>  
> [...]
> > >  /*
> > > + * Return the number of pages on the given lru which are eligibne for the
> >                                                             eligible
> 
> fixed
> 
> > > + * given zone_idx
> > > + */
> > > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > > +		enum lru_list lru, int zone_idx)
> > 
> > Nit:
> > 
> > Although there is a comment, function name is rather confusing when I compared
> > it with lruvec_zone_lru_size.
> 
> I am all for a better name.
> 
> > lruvec_eligible_zones_lru_size is better?
> 
> this would be too easy to confuse with lruvec_eligible_zone_lru_size.
> What about lruvec_lru_size_eligible_zones?

Don't mind.

>  
> > Nit:
> > 
> > With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than
> > own custom calculation to filter out non-eligible pages. 
> 
> Yes, that would be possible and I was considering that. But then I found
> useful to see total and reduced numbers in the tracepoint
> http://lkml.kernel.org/r/20161228153032.10821-8-mhocko@kernel.org
> and didn't want to call lruvec_lru_size 2 times. But if you insist then
> I can just do that.

I don't mind either but I think we need to describe the reason if you want to
go with your open-coded version. Otherwise, someone will try to fix it.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-26 12:48                                     ` Michal Hocko
                                                         ` (2 preceding siblings ...)
  2016-12-29  0:31                                       ` Minchan Kim
@ 2016-12-30 10:19                                       ` Mel Gorman
  2016-12-30 11:05                                         ` Michal Hocko
  3 siblings, 1 reply; 62+ messages in thread
From: Mel Gorman @ 2016-12-30 10:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > 
> > > Nils, even though this is still highly experimental, could you give it a
> > > try please?
> > 
> > Yes, no problem! So I kept the very first patch you sent but had to
> > revert the latest version of the debugging patch (the one in
> > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > memory cgroups enabled again, and the first thing that strikes the eye
> > is that I get this during boot:
> > 
> > [    1.568174] ------------[ cut here ]------------
> > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> 
> Ohh, I can see what is wrong! a) there is a bug in the accounting in
> my patch (I double account) and b) the detection for the empty list
> cannot work after my change because per node zone will not match per
> zone statistics. The updated patch is below. So I hope my brain already
> works after it's been mostly off last few days...
> ---
> From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 23 Dec 2016 15:11:54 +0100
> Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
>  memcg is enabled
> 
> Nils Holland has reported unexpected OOM killer invocations with 32b
> kernel starting with 4.8 kernels
> 

I think it's unfortunate that per-zone stats are reintroduced to the
memcg structure. I can't help but think that it would have also worked
to always rotate a small number of pages if !inactive_list_is_low and
reclaiming for memcg even if it distorted page aging. However, given
that such an approach would be less robust and this has been heavily
tested;

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-30  2:05                                             ` Minchan Kim
@ 2016-12-30 10:40                                               ` Michal Hocko
  0 siblings, 0 replies; 62+ messages in thread
From: Michal Hocko @ 2016-12-30 10:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Nils Holland, Mel Gorman, Johannes Weiner, Vladimir Davydov,
	Tetsuo Handa, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs, Steven Rostedt

On Fri 30-12-16 11:05:22, Minchan Kim wrote:
> On Thu, Dec 29, 2016 at 10:04:32AM +0100, Michal Hocko wrote:
> > On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> > > On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
[...]
> > > > + * given zone_idx
> > > > + */
> > > > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > > > +		enum lru_list lru, int zone_idx)
> > > 
> > > Nit:
> > > 
> > > Although there is a comment, function name is rather confusing when I compared
> > > it with lruvec_zone_lru_size.
> > 
> > I am all for a better name.
> > 
> > > lruvec_eligible_zones_lru_size is better?
> > 
> > this would be too easy to confuse with lruvec_eligible_zone_lru_size.
> > What about lruvec_lru_size_eligible_zones?
> 
> Don't mind.

I will go with lruvec_lru_size_eligible_zones then.

> > > Nit:
> > > 
> > > With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than
> > > own custom calculation to filter out non-eligible pages. 
> > 
> > Yes, that would be possible and I was considering that. But then I found
> > useful to see total and reduced numbers in the tracepoint
> > http://lkml.kernel.org/r/20161228153032.10821-8-mhocko@kernel.org
> > and didn't want to call lruvec_lru_size 2 times. But if you insist then
> > I can just do that.
> 
> I don't mind either but I think we need to describe the reason if you want to
> go with your open-coded version. Otherwise, someone will try to fix it.

OK, I will go with the follow up patch on top of the tracepoints series.
I was hoping that the macro-heavy tracing infrastructure would allow us
to evaluate arguments only when the tracepoint is enabled, but this
doesn't seem to be the case. Let's CC Steven. Would it be possible to
define a tracepoint in such a way that all given arguments are evaluated
only when the tracepoint is enabled?
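
To illustrate the question, here is a minimal userspace sketch (not kernel
code) of a macro that evaluates its arguments only when tracing is switched
on. All names in it (trace_enabled, expensive_total, emit_trace, TRACE_LAZY)
are hypothetical stand-ins, and it says nothing about how the real tracepoint
macros expand. As far as I know, the kernel tracepoint documentation also
describes a trace_<name>_enabled() helper for guarding expensive setup, but
whether that fits this call site is for the tracing maintainers to say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool trace_enabled;	/* hypothetical on/off switch */
static int expensive_calls;	/* counts how often the argument is evaluated */

/* Hypothetical expensive argument, e.g. a walk over all LRU zones. */
unsigned long expensive_total(void)
{
	expensive_calls++;
	return 4096;
}

/* Hypothetical trace sink. */
void emit_trace(unsigned long total, unsigned long eligible)
{
	printf("total=%lu eligible=%lu\n", total, eligible);
}

/*
 * The argument expressions are pasted inside the if-body, so they are
 * evaluated only when trace_enabled is true.
 */
#define TRACE_LAZY(...)					\
	do {						\
		if (trace_enabled)			\
			emit_trace(__VA_ARGS__);	\
	} while (0)

/* Returns how many times the expensive argument was evaluated. */
int lazy_trace_demo(void)
{
	expensive_calls = 0;

	trace_enabled = false;
	TRACE_LAZY(expensive_total(), 1024UL);	/* argument is not evaluated */
	assert(expensive_calls == 0);

	trace_enabled = true;
	TRACE_LAZY(expensive_total(), 1024UL);	/* evaluated exactly once */

	return expensive_calls;
}
```

A plain function call could not behave this way, because C evaluates function
arguments before the call regardless of what the callee does with them.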
---
From 9a561d652f91f3557db22161600f10ca2462c74f Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 30 Dec 2016 11:28:20 +0100
Subject: [PATCH] mm, vmscan: clean up inactive_list_is_low

inactive_list_is_low is effectively duplicating logic implemented by
lruvec_lru_size_eligibe_zones. Let's use the dedicated function to
get the number of eligible pages on the lru list and use
lruvec_lru_size to get the total LRU size only when the tracing is
really requested. We are still iterating over all LRUs two times in that
case but a) inactive_list_is_low is not a hot path and b) this can be
addressed at the tracing layer by evaluating arguments only when the
tracing is enabled in future if that ever matters.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmscan.c | 38 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 28 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 137bc85067d3..a9c881f06c0e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2054,11 +2054,10 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 						struct scan_control *sc, bool trace)
 {
 	unsigned long inactive_ratio;
-	unsigned long total_inactive, inactive;
-	unsigned long total_active, active;
+	unsigned long inactive, active;
+	enum lru_list inactive_lru = file * LRU_FILE;
+	enum lru_list active_lru = file * LRU_FILE + LRU_ACTIVE;
 	unsigned long gb;
-	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-	int zid;
 
 	/*
 	 * If we don't have swap space, anonymous page deactivation
@@ -2067,27 +2066,8 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	if (!file && !total_swap_pages)
 		return false;
 
-	total_inactive = inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
-	total_active = active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
-
-	/*
-	 * For zone-constrained allocations, it is necessary to check if
-	 * deactivations are required for lowmem to be reclaimed. This
-	 * calculates the inactive/active pages available in eligible zones.
-	 */
-	for (zid = sc->reclaim_idx + 1; zid < MAX_NR_ZONES; zid++) {
-		struct zone *zone = &pgdat->node_zones[zid];
-		unsigned long inactive_zone, active_zone;
-
-		if (!managed_zone(zone))
-			continue;
-
-		inactive_zone = lruvec_zone_lru_size(lruvec, file * LRU_FILE, zid);
-		active_zone = lruvec_zone_lru_size(lruvec, (file * LRU_FILE) + LRU_ACTIVE, zid);
-
-		inactive -= min(inactive, inactive_zone);
-		active -= min(active, active_zone);
-	}
+	inactive = lruvec_lru_size_eligibe_zones(lruvec, inactive_lru, sc->reclaim_idx);
+	active = lruvec_lru_size_eligibe_zones(lruvec, active_lru, sc->reclaim_idx);
 
 	gb = (inactive + active) >> (30 - PAGE_SHIFT);
 	if (gb)
@@ -2096,10 +2076,12 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 		inactive_ratio = 1;
 
 	if (trace)
-		trace_mm_vmscan_inactive_list_is_low(pgdat->node_id,
+		trace_mm_vmscan_inactive_list_is_low(lruvec_pgdat(lruvec)->node_id,
 				sc->reclaim_idx,
-				total_inactive, inactive,
-				total_active, active, inactive_ratio, file);
+				lruvec_lru_size(lruvec, inactive_lru), inactive,
+				lruvec_lru_size(lruvec, active_lru), active,
+				inactive_ratio, file);
+
 	return inactive * inactive_ratio < active;
 }
 
-- 
2.10.2

-- 
Michal Hocko
SUSE Labs
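
For readers following the arithmetic at the end of inactive_list_is_low, the
following is a minimal userspace sketch of the ratio check, not the kernel
code. It assumes 4 KiB pages (MODEL_PAGE_SHIFT of 12) and uses a naive integer
square root in place of the kernel's int_sqrt. All model_* names are
hypothetical, and the inputs are meant to be the eligible page counts that the
patch above obtains from lruvec_lru_size_eligibe_zones.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Naive integer square root, standing in for the kernel's int_sqrt(). */
unsigned long model_int_sqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Mirrors the tail of inactive_list_is_low: convert the eligible LRU size
 * to whole gigabytes, derive the target ratio, and compare.
 */
bool model_inactive_list_is_low(unsigned long inactive, unsigned long active)
{
	unsigned long gb, inactive_ratio;

	assert(MODEL_PAGE_SHIFT < 30);	/* the shift below relies on this */

	gb = (inactive + active) >> (30 - MODEL_PAGE_SHIFT);
	if (gb)
		inactive_ratio = model_int_sqrt(10 * gb);
	else
		inactive_ratio = 1;

	return inactive * inactive_ratio < active;
}
```

With 200000 inactive and 848576 active eligible pages (4 GiB in total at
4 KiB per page), gb is 4, the ratio is int_sqrt(40) = 6, and the list is not
considered low because 200000 * 6 is not below 848576. Below 1 GiB the ratio
is 1, so the check reduces to inactive < active.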

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-30 10:19                                       ` Mel Gorman
@ 2016-12-30 11:05                                         ` Michal Hocko
  2016-12-30 12:43                                           ` Mel Gorman
  0 siblings, 1 reply; 62+ messages in thread
From: Michal Hocko @ 2016-12-30 11:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nils Holland, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri 30-12-16 10:19:26, Mel Gorman wrote:
> On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > 
> > > > Nils, even though this is still highly experimental, could you give it a
> > > > try please?
> > > 
> > > Yes, no problem! So I kept the very first patch you sent but had to
> > > revert the latest version of the debugging patch (the one in
> > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > memory cgroups enabled again, and the first thing that strikes the eye
> > > is that I get this during boot:
> > > 
> > > [    1.568174] ------------[ cut here ]------------
> > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > 
> > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > my patch (I double account) and b) the detection for the empty list
> > cannot work after my change because per node zone will not match per
> > zone statistics. The updated patch is below. So I hope my brain already
> > works after it's been mostly off last few days...
> > ---
> > From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Fri, 23 Dec 2016 15:11:54 +0100
> > Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
> >  memcg is enabled
> > 
> > Nils Holland has reported unexpected OOM killer invocations with 32b
> > kernel starting with 4.8 kernels
> > 
> 
> I think it's unfortunate that per-zone stats are reintroduced to the
> memcg structure.

The original patch I had didn't add per zone stats but rather added a
nr_highmem counter to mem_cgroup_per_node (inside ifdef CONFIG_HIGHMEM).
This would help for this particular case but it wouldn't work for other
lowmem requests (e.g. GFP_DMA32) and with the kmem accounting this might
be a problem in future. So I've decided to go with a more generic
approach which requires per-zone tracking. I cannot say I am
overly happy about this, though.
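
To make the difference concrete, here is a small userspace model (not kernel
code) of the two accounting schemes. The zone layout, struct model_lru and the
function names are hypothetical and chosen only for illustration; real zone
layouts depend on the architecture and config.

```c
#include <assert.h>

/* Hypothetical zone layout, lowest zone first. */
enum model_zone { MZ_DMA, MZ_DMA32, MZ_NORMAL, MZ_HIGHMEM, MZ_NR };

struct model_lru {
	unsigned long zone_pages[MZ_NR];	/* per-zone LRU page counts */
};

/* Total LRU size across all zones. */
unsigned long model_total(const struct model_lru *lru)
{
	unsigned long sum = 0;

	for (int zid = 0; zid < MZ_NR; zid++)
		sum += lru->zone_pages[zid];
	return sum;
}

/* Single-counter scheme: the only thing it can exclude is highmem. */
unsigned long eligible_with_highmem_counter(const struct model_lru *lru)
{
	return model_total(lru) - lru->zone_pages[MZ_HIGHMEM];
}

/* Per-zone scheme: counts only zones up to and including reclaim_idx. */
unsigned long eligible_with_zone_stats(const struct model_lru *lru,
				       enum model_zone reclaim_idx)
{
	unsigned long sum = 0;

	assert(reclaim_idx < MZ_NR);

	for (int zid = 0; zid <= (int)reclaim_idx; zid++)
		sum += lru->zone_pages[zid];
	return sum;
}
```

For a request limited to the normal zone both schemes give the same answer.
For a request limited to a lower zone the single counter still counts the
normal zone pages as eligible, which is the limitation described above.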

> I can't help but think that it would have also worked
> to always rotate a small number of pages if !inactive_list_is_low and
> reclaiming for memcg even if it distorted page aging.

I am not really sure how that would work. Do you mean something like the
following?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa30010a5277..563ada3c02ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2044,6 +2044,9 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
 	active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
 
+	if (!mem_cgroup_disabled())
+		goto out;
+
 	/*
 	 * For zone-constrained allocations, it is necessary to check if
 	 * deactivations are required for lowmem to be reclaimed. This
@@ -2063,6 +2066,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 		active -= min(active, active_zone);
 	}
 
+out:
 	gb = (inactive + active) >> (30 - PAGE_SHIFT);
 	if (gb)
 		inactive_ratio = int_sqrt(10 * gb);

The problem I see with such an approach is that chances are that this
would reintroduce what f8d1a31163fc ("mm: consider whether to decivate
based on eligible zones inactive ratio") tried to fix. But maybe I have
missed your point.

> However, given that such an approach would be less robust and this has
> been heavily tested;
> 
> Acked-by: Mel Gorman <mgorman@suse.de>

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
  2016-12-30 11:05                                         ` Michal Hocko
@ 2016-12-30 12:43                                           ` Mel Gorman
  0 siblings, 0 replies; 62+ messages in thread
From: Mel Gorman @ 2016-12-30 12:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
	linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs

On Fri, Dec 30, 2016 at 12:05:45PM +0100, Michal Hocko wrote:
> On Fri 30-12-16 10:19:26, Mel Gorman wrote:
> > On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > > 
> > > > > Nils, even though this is still highly experimental, could you give it a
> > > > > try please?
> > > > 
> > > > Yes, no problem! So I kept the very first patch you sent but had to
> > > > revert the latest version of the debugging patch (the one in
> > > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > > memory cgroups enabled again, and the first thing that strikes the eye
> > > > is that I get this during boot:
> > > > 
> > > > [    1.568174] ------------[ cut here ]------------
> > > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > > 
> > > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > > my patch (I double account) and b) the detection for the empty list
> > > cannot work after my change because per node zone will not match per
> > > zone statistics. The updated patch is below. So I hope my brain already
> > > works after it's been mostly off last few days...
> > > ---
> > > From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Fri, 23 Dec 2016 15:11:54 +0100
> > > Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
> > >  memcg is enabled
> > > 
> > > Nils Holland has reported unexpected OOM killer invocations with 32b
> > > kernel starting with 4.8 kernels
> > > 
> > 
> > I think it's unfortunate that per-zone stats are reintroduced to the
> > memcg structure.
> 
> The original patch I had didn't add per zone stats but rather added a
> nr_highmem counter to mem_cgroup_per_node (inside ifdef CONFIG_HIGHMEM).
> This would help for this particular case but it wouldn't work for other
> lowmem requests (e.g. GFP_DMA32) and with the kmem accounting this might
> be a problem in future.

That did occur to me.

> So I've decided to go with a more generic
> approach which requires per-zone tracking. I cannot say I would be
> overly happy about this at all.
> 
> > I can't help but think that it would have also worked
> > to always rotate a small number of pages if !inactive_list_is_low and
> > reclaiming for memcg even if it distorted page aging.
> 
> I am not really sure how that would work. Do you mean something like the
> following?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fa30010a5277..563ada3c02ac 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2044,6 +2044,9 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
>  	inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
>  	active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
>  
> +	if (!mem_cgroup_disabled())
> +		goto out;
> +
>  	/*
>  	 * For zone-constrained allocations, it is necessary to check if
>  	 * deactivations are required for lowmem to be reclaimed. This
> @@ -2063,6 +2066,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
>  		active -= min(active, active_zone);
>  	}
>  
> +out:
>  	gb = (inactive + active) >> (30 - PAGE_SHIFT);
>  	if (gb)
>  		inactive_ratio = int_sqrt(10 * gb);
> 
> The problem I see with such an approach is that chances are that this
> would reintroduce what f8d1a31163fc ("mm: consider whether to decivate
> based on eligible zones inactive ratio") tried to fix. But maybe I have
> missed your point.
> 

No, you didn't miss the point. It was something like that I had in mind
but as I thought about it, I could see some cases where it might not work
and still cause a premature OOM. The per-zone accounting is unfortunate
but it's robust hence the Ack.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2016-12-30 12:43 UTC | newest]

Thread overview: 62+ messages
2016-12-15 22:57 OOM: Better, but still there on 4.9 Nils Holland
2016-12-16  7:39 ` Michal Hocko
2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
2016-12-16 15:58     ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
2016-12-16 15:58     ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-12-16 17:31       ` Johannes Weiner
2016-12-16 22:12         ` Michal Hocko
2016-12-17 11:17           ` Tetsuo Handa
2016-12-18 16:37             ` Michal Hocko
2016-12-16 18:47     ` OOM: Better, but still there on Nils Holland
2016-12-17  0:02       ` Michal Hocko
2016-12-17 12:59         ` Nils Holland
2016-12-17 14:44           ` Tetsuo Handa
2016-12-17 17:11             ` Nils Holland
2016-12-17 21:06             ` Nils Holland
2016-12-18  5:14               ` Tetsuo Handa
2016-12-19 13:45               ` Michal Hocko
2016-12-20  2:08                 ` Nils Holland
2016-12-21  7:36                   ` Michal Hocko
2016-12-21 11:00                     ` Tetsuo Handa
2016-12-21 11:16                       ` Michal Hocko
2016-12-21 14:04                         ` Chris Mason
2016-12-22 10:10                     ` Nils Holland
2016-12-22 10:27                       ` Michal Hocko
2016-12-22 10:35                         ` Nils Holland
2016-12-22 10:46                           ` Tetsuo Handa
2016-12-22 19:17                       ` Michal Hocko
2016-12-22 21:46                         ` Nils Holland
2016-12-23 10:51                           ` Michal Hocko
2016-12-23 12:18                             ` Nils Holland
2016-12-23 12:57                               ` Michal Hocko
2016-12-23 14:47                                 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko
2016-12-23 22:26                                   ` Nils Holland
2016-12-26 12:48                                     ` Michal Hocko
2016-12-26 18:57                                       ` Nils Holland
2016-12-27  8:08                                         ` Michal Hocko
2016-12-27 11:23                                           ` Nils Holland
2016-12-27 11:27                                             ` Michal Hocko
2016-12-27 15:55                                       ` Michal Hocko
2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
2016-12-28  8:51                                           ` Michal Hocko
2016-12-27 19:33                                         ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland
2016-12-28  8:57                                           ` Michal Hocko
2016-12-29  1:20                                         ` Minchan Kim
2016-12-29  9:04                                           ` Michal Hocko
2016-12-30  2:05                                             ` Minchan Kim
2016-12-30 10:40                                               ` Michal Hocko
2016-12-29  0:31                                       ` Minchan Kim
2016-12-29  0:48                                         ` Minchan Kim
2016-12-29  8:52                                           ` Michal Hocko
2016-12-30 10:19                                       ` Mel Gorman
2016-12-30 11:05                                         ` Michal Hocko
2016-12-30 12:43                                           ` Mel Gorman
2016-12-25 22:25                                   ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot
2016-12-26 12:26                                     ` Michal Hocko
2016-12-26 12:50                                       ` Michal Hocko
2016-12-18  0:28             ` OOM: Better, but still there on Xin Zhou
2016-12-16 18:15   ` OOM: Better, but still there on 4.9 Chris Mason
2016-12-16 22:14     ` Michal Hocko
2016-12-16 22:47       ` Chris Mason
2016-12-16 23:31         ` Michal Hocko
2016-12-16 19:50   ` Chris Mason
