Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

commit f7a7b53a90f7a489c4e435d1300db121f6b42776
Author:     Kirill A. Shutemov
AuthorDate: Fri Jan 23 10:11:34 2015 +1100
Commit:     Stephen Rothwell
CommitDate: Fri Jan 23 10:11:34 2015 +1100

    mm: account pmd page tables to the process

    Dave noticed that an unprivileged process can allocate a significant
    amount of memory (>500 MiB on x86_64) and stay unnoticed by the oom-killer
    and the memory cgroup. The trick is to allocate a lot of PMD page tables:
    the Linux kernel accounts only PTE tables to the process, not PMD tables.

    The test case below uses a few tricks to allocate a lot of PMD page tables
    while keeping VmRSS and VmPTE low. oom_score for the process will be 0.

        #include <errno.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mman.h>
        #include <sys/prctl.h>

        #define PUD_SIZE (1UL << 30)  /* 1 GiB: range mapped by one PMD table */
        #define PMD_SIZE (1UL << 21)  /* 2 MiB: range mapped by one PTE table */

        #define NR_PUD 130000

        int main(void)
        {
            char *addr = NULL;
            unsigned long i;

            /* Disable THP so the fault below maps a 4 KiB page via a PTE table. */
            prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
            for (i = 0; i < NR_PUD ; i++) {
                /* Map a fresh 1 GiB region. */
                addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
                        MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
                if (addr == MAP_FAILED) {
                    perror("mmap");
                    break;
                }
                /* Fault in one page, populating page tables down to the PTE level. */
                *addr = 'x';
                /* Drop the first 2 MiB: the page and its PTE table are freed,
                 * but the PMD table stays because the rest of the VMA remains. */
                munmap(addr, PMD_SIZE);
                /* Map the hole again; it is never touched, so RSS does not grow. */
                addr = mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
                        MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
                if (addr == MAP_FAILED) {
                    perror("re-mmap");
                    exit(1);
                }
            }
            /* One 4 KiB PMD table is left behind per iteration. */
            printf("PID %d consumed %lu KiB in PMD page tables\n",
                    getpid(), i * 4096 >> 10);
            return pause();
        }

    The patch addresses the issue by accounting PMD tables to the process the
    same way we account PTE tables.

    The main places where PMD tables are accounted are __pmd_alloc() and
    free_pmd_range(). But there are a few corner cases:

     - HugeTLB can share PMD page tables. The patch handles this by accounting
       the table to all processes that share it.

     - x86 PAE pre-allocates a few PMD tables on fork.

     - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity
       check on exit(2).
    Accounting only happens on configurations where the PMD page table level
    is present (PMD is not folded). As with nr_ptes, we use a per-mm counter.
    The counter value is used to calculate the baseline for the oom-killer's
    badness score.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Dave Hansen
    Cc: Hugh Dickins
    Reviewed-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: David Rientjes
    Signed-off-by: Andrew Morton

+-----------------------------------+------------+------------+---------------+
|                                   | fe888c1f62 | f7a7b53a90 | next-20150123 |
+-----------------------------------+------------+------------+---------------+
| boot_successes                    | 1364       | 142        | 25            |
| boot_failures                     | 5          | 227        | 19            |
| BUG:kernel_test_crashed           | 5          |            |               |
| WARNING:at_mm/mmap.c:#exit_mmap() | 0          | 227        | 19            |
| backtrace:do_execve               | 0          | 227        | 19            |
| backtrace:SyS_execve              | 0          | 227        | 19            |
| backtrace:do_group_exit           | 0          | 227        | 19            |
| backtrace:SyS_exit_group          | 0          | 227        | 19            |
| backtrace:do_execveat_common      | 0          | 3          |               |
| backtrace:do_exit                 | 0          | 5          |               |
+-----------------------------------+------------+------------+---------------+

[ 17.687075] Freeing unused kernel memory: 1716K (c190d000 - c1aba000)
[ 17.808897] random: init urandom read with 5 bits of entropy available
[ 17.828360] ------------[ cut here ]------------
[ 17.828989] WARNING: CPU: 1 PID: 681 at mm/mmap.c:2858 exit_mmap+0x197/0x1ad()
[ 17.830086] Modules linked in:
[ 17.830549] CPU: 1 PID: 681 Comm: init Not tainted 3.19.0-rc5-gf7a7b53 #19
[ 17.831339]  00000001 00000000 00000001 d388bd4c c14341a1 00000000 00000001 c16ebf08
[ 17.832421]  d388bd68 c1056987 00000b2a c1150db8 00000001 00000001 00000000 d388bd78
[ 17.833488]  c1056a11 00000009 00000000 d388bdd0 c1150db8 d3858380 ffffffff ffffffff
[ 17.841323] Call Trace:
[ 17.844215]  [] dump_stack+0x78/0xa8
[ 17.844700]  [] warn_slowpath_common+0xb7/0xce
[ 17.847797]  [] ? exit_mmap+0x197/0x1ad
[ 17.850955]  [] warn_slowpath_null+0x14/0x18
[ 17.854131]  [] exit_mmap+0x197/0x1ad
[ 17.854629]  [] mmput+0x52/0xef
[ 17.857584]  [] flush_old_exec+0x923/0x99d
[ 17.860806]  [] load_elf_binary+0x430/0x11af
[ 17.861378]  [] ? local_clock+0x2f/0x39
[ 17.865327]  [] ? lock_release_holdtime+0x60/0x6d
[ 17.866002]  [] search_binary_handler+0x9c/0x20f
[ 17.866588]  [] load_script+0x339/0x355
[ 17.874149]  [] ? sched_clock_cpu+0x188/0x1a3
[ 17.874718]  [] ? local_clock+0x2f/0x39
[ 17.878580]  [] ? lock_release_holdtime+0x60/0x6d
[ 17.879355]  [] ? do_raw_read_unlock+0x28/0x53
[ 17.879997]  [] search_binary_handler+0x9c/0x20f
[ 17.887644]  [] do_execveat_common+0x6d6/0x954
[ 17.890904]  [] do_execve+0x19/0x1b
[ 17.891389]  [] SyS_execve+0x21/0x25
[ 17.895168]  [] syscall_call+0x7/0x7
[ 17.895653] ---[ end trace 6a7094e9a1d04ce0 ]---
[ 17.909585] ------------[ cut here ]------------

git bisect start de3d2c5b941c632685ab58613f981bf14a42676f ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc --
git bisect good 505c8f8b41aaae2239941fc1c25bc8d4aa9188a6 # 08:42 369+ 1 Merge remote-tracking branch 'kbuild/for-next'
git bisect good 5cdfab738b22d402bc764e9f5f93824ff5f3800f # 08:46 369+ 0 Merge remote-tracking branch 'audit/next'
git bisect good 551aa38a4d27c7e71791ded0ee4a746abe954f9b # 08:53 369+ 0 Merge remote-tracking branch 'usb-gadget/next'
git bisect good bf26a22140410ca8fee8de8d74d9b69eeac450d1 # 08:58 369+ 3 Merge remote-tracking branch 'pwm/for-next'
git bisect good 522698e0cdb31f34ef897d463ddbe4d289a83b16 # 09:05 369+ 1 Merge remote-tracking branch 'y2038/y2038'
git bisect good 879b01ab025b80f0350b3181f2eb86f1a3deadc2 # 09:10 369+ 0 Merge remote-tracking branch 'livepatching/for-next'
git bisect bad d347062b744695e0490a53c199fac1a184870d29 # 09:10 0- 156 Merge branch 'akpm-current/current'
git bisect bad f7a7b53a90f7a489c4e435d1300db121f6b42776 # 09:34 0- 5 mm: account pmd page tables to the process
git bisect good 905d130bf8d5622c4dfa1667414993bb214d3a1e # 10:50 369+ 1 x86: drop _PAGE_FILE and pte_file()-related helpers
git bisect good daba3b6a1f18fc36eb6fe15eca008c3e658a8f72 # 11:39 369+ 1 mm: numa: add paranoid check around pte_protnone_numa
git bisect good 077ccc6a5a442a0460aba99085a6b84578a01faf # 12:21 369+ 2 memcg: add BUILD_BUG_ON() for string tables
git bisect good 76c365c2fe9bc89844dee698b7d3382faa9afc75 # 12:31 369+ 1 oom, PM: make OOM detection in the freezer path raceless
git bisect good 10c7667f091d0ab62b13d31f33bef469dc6683b4 # 13:27 369+ 2 fs: shrinker: always scan at least one object of each type
git bisect good 8aac135aaf196fd1a0b8f9c08d3514b64cefc4b3 # 13:47 369+ 1 mm: make FIRST_USER_ADDRESS unsigned long on all archs
git bisect good fe888c1f6277ea1b0d18dda12fff1dac4617905a # 14:05 369+ 1 arm: define __PAGETABLE_PMD_FOLDED for !LPAE
# first bad commit: [f7a7b53a90f7a489c4e435d1300db121f6b42776] mm: account pmd page tables to the process
git bisect good fe888c1f6277ea1b0d18dda12fff1dac4617905a # 14:26 1000+ 5 arm: define __PAGETABLE_PMD_FOLDED for !LPAE
# extra tests with DEBUG_INFO
git bisect good f7a7b53a90f7a489c4e435d1300db121f6b42776 # 14:46 1000+ 0 mm: account pmd page tables to the process
# extra tests on HEAD of next/master
git bisect bad de3d2c5b941c632685ab58613f981bf14a42676f # 14:46 0- 19 Add linux-next specific files for 20150123
# extra tests on tree/branch next/master
git bisect bad de3d2c5b941c632685ab58613f981bf14a42676f # 14:46 0- 19 Add linux-next specific files for 20150123
# extra tests on tree/branch linus/master
git bisect good c4e00f1d31c4c83d15162782491689229bd92527 # 16:42 1000+ 3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
# extra tests on tree/branch next/master
git bisect bad de3d2c5b941c632685ab58613f981bf14a42676f # 16:43 0- 19 Add linux-next specific files for 20150123

This script may reproduce the error.
----------------------------------------------------------------------------
#!/bin/bash

kernel=$1
initrd=quantal-core-i386.cgz

wget --no-clobber "https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd"

kvm=(
	qemu-system-x86_64
	-cpu kvm64
	-enable-kvm
	-kernel "$kernel"
	-initrd "$initrd"
	-m 320
	-smp 2
	-net nic,vlan=1,model=e1000
	-net user,vlan=1
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------

Thanks,
Fengguang