Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 3f906ba23689a3f824424c50f3ae937c2c70f676
Author:     Thomas Gleixner
AuthorDate: Mon Jul 10 15:50:09 2017 -0700
Commit:     Linus Torvalds
CommitDate: Mon Jul 10 16:32:33 2017 -0700

    mm/memory-hotplug: switch locking to a percpu rwsem

    Andrey reported a potential deadlock with the memory hotplug lock and
    the cpu hotplug lock.

    The reason is that memory hotplug takes the memory hotplug lock and then
    calls stop_machine() which calls get_online_cpus().  That's the reverse
    lock order to get_online_cpus(); get_online_mems(); in mm/slub_common.c

    The problem has been there forever.  The reason why this was never
    reported is that the cpu hotplug locking had this homebrewn recursive
    reader writer semaphore construct which due to the recursion evaded the
    full lock dep coverage.  The memory hotplug code copied that construct
    verbatim and therefor has similar issues.

    Three steps to fix this:

    1) Convert the memory hotplug locking to a per cpu rwsem so the
       potential issues get reported proper by lockdep.

    2) Lock the online cpus in mem_hotplug_begin() before taking the memory
       hotplug rwsem and use stop_machine_cpuslocked() in the page_alloc
       code to avoid recursive locking.

    3) The cpu hotpluck locking in #2 causes a recursive locking of the cpu
       hotplug lock via __offline_pages() -> lru_add_drain_all(). Solve this
       by invoking lru_add_drain_all_cpuslocked() instead.

    Link: http://lkml.kernel.org/r/20170704093421.506836322@linutronix.de
    Reported-by: Andrey Ryabinin
    Signed-off-by: Thomas Gleixner
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Vladimir Davydov
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

a47fed5b5b  mm: swap: provide lru_add_drain_all_cpuslocked()
3f906ba236  mm/memory-hotplug: switch locking to a percpu rwsem
28619527b8  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
f2b6e66e98  Add linux-next specific files for 20180904

+------------------------------------------------------------------+------------+------------+------------+---------------+
|                                                                  | a47fed5b5b | 3f906ba236 | 28619527b8 | next-20180904 |
+------------------------------------------------------------------+------------+------------+------------+---------------+
| boot_successes                                                   | 132        | 17         | 13         | 15            |
| boot_failures                                                    | 14         | 29         | 39         | 35            |
| BUG_kmalloc-#(Not_tainted):Redzone_overwritten                   | 12         | 4          |            |               |
| INFO:#-#.First_byte#instead_of                                   | 12         | 4          |            |               |
| INFO:Freed_in_skb_free_head_age=#cpu=#pid=                       | 1          | 1          |            |               |
| INFO:Slab#objects=#used=#fp=#flags=                              | 12         | 4          |            |               |
| INFO:Object#@offset=#fp=0x(null)                                 | 12         | 4          |            |               |
| INFO:Freed_in_rcu_process_callbacks_age=#cpu=#pid=               | 2          |            |            |               |
| INFO:Freed_in_free_ctx_age=#cpu=#pid=                            | 3          | 1          |            |               |
| BUG:unable_to_handle_kernel                                      | 1          | 1          |            |               |
| Oops:#[##]                                                       | 1          | 1          |            |               |
| Kernel_panic-not_syncing:Fatal_exception                         | 1          | 1          |            |               |
| BUG_fasync_cache(Not_tainted):Freelist_Pointer_check_fails       | 1          |            |            |               |
| INFO:Slab#objects=#used=#fp=0x(null)flags=                       | 1          |            |            |               |
| INFO:Object#@offset=#fp=                                         | 1          |            |            |               |
| INFO:Freed_in_free_pipe_info_age=#cpu=#pid=                      | 2          |            |            |               |
| INFO:Freed_in_load_elf_binary_age=#cpu=#pid=                     | 1          |            |            |               |
| WARNING:possible_circular_locking_dependency_detected            | 0          | 26         | 35         | 31            |
| Mem-Info                                                         | 0          | 2          | 6          | 5             |
| INFO:Freed_in_kvfree_age=#cpu=#pid=                              | 0          | 1          |            |               |
| invoked_oom-killer:gfp_mask=0x                                   | 0          | 0          | 4          | 4             |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0          | 0          | 4          |               |
| Out_of_memory_and_no_killable_processes                          | 0          | 0          | 0          | 4             |
| Kernel_panic-not_syncing:System_is_deadlocked_on_memory          | 0          | 0          | 0          | 4             |
+------------------------------------------------------------------+------------+------------+------------+---------------+

vm86 returned ENOSYS, marking as inactive. 20044 iterations. [F:14867 S:5032 HI:3700]
[   57.651003] synth uevent: /module/pcmcia_core: unknown uevent action string
[   71.189062]
[   71.191953] ======================================================
[   71.192813] WARNING: possible circular locking dependency detected
[   71.193664] 4.12.0-10480-g3f906ba #1 Not tainted
[   71.194355] ------------------------------------------------------
[   71.195211] trinity-c0/1666 is trying to acquire lock:
[   71.195958]  (mem_hotplug_lock.rw_sem){.+.+.+}, at: show_slab_objects+0x14b/0x440
[   71.197284]
[   71.197284] but task is already holding lock:
[   71.198241]  (s_active#39){++++.+}, at: kernfs_seq_start+0x44/0xa0
[   71.199433]
[   71.199433] which lock already depends on the new lock.
[   71.199433]
[   71.200774]
[   71.200774] the existing dependency chain (in reverse order) is:
[   71.201900]
[   71.201900] -> #2 (s_active#39){++++.+}:
[   71.202814]        lock_acquire+0xcf/0x2a0
[   71.203447]        __kernfs_remove+0x337/0x410
[   71.204121]        kernfs_remove_by_name_ns+0x49/0xd0
[   71.204858]        sysfs_remove_link+0x19/0x30
[   71.205528]        sysfs_slab_add+0xa6/0x290
[   71.206182]        __kmem_cache_create+0x52a/0x5b0
[   71.206889]        kmem_cache_create+0x239/0x3a0
[   71.207578]        snic_init_module+0x111/0x203
[   71.208263]        do_one_initcall+0x41/0x19c
[   71.208928]        kernel_init_freeable+0x1f4/0x27d
[   71.209647]        kernel_init+0xe/0x100
[   71.210268]        ret_from_fork+0x2a/0x40
[   71.210904]
[   71.210904] -> #1 (slab_mutex){+.+.+.}:
[   71.211806]        lock_acquire+0xcf/0x2a0
[   71.212440]        __mutex_lock+0x6f/0xda0
[   71.213075]        mutex_lock_nested+0x1b/0x20
[   71.213745]        kmem_cache_create+0x3e/0x3a0
[   71.214427]        ptlock_cache_init+0x24/0x2d
[   71.215102]        start_kernel+0x240/0x4a5
[   71.215747]        x86_64_start_reservations+0x2a/0x2c
[   71.216488]        x86_64_start_kernel+0x127/0x136
[   71.217197]        verify_cpu+0x0/0xf1
[   71.217796]
[   71.217796] -> #0 (mem_hotplug_lock.rw_sem){.+.+.+}:
[   71.218816]        __lock_acquire+0x120b/0x1240
[   71.219496]        lock_acquire+0xcf/0x2a0
[   71.220135]        get_online_mems+0x47/0xb0
[   71.220789]        show_slab_objects+0x14b/0x440
[   71.221476]        slabs_show+0x13/0x20
[   71.222089]        slab_attr_show+0x1b/0x30
[   71.222735]        sysfs_kf_seq_show+0x109/0x190
[   71.223421]        kernfs_seq_show+0x27/0x30
[   71.224076]        traverse+0xa8/0x230
[   71.224674]        seq_read+0x1c6/0x530
[   71.225284]        kernfs_fop_read+0x18a/0x1e0
[   71.225956]        do_iter_read+0x164/0x1a0
[   71.226596]        vfs_readv+0x67/0x90
[   71.227197]        do_preadv+0x9e/0xb0
[   71.227795]        SyS_preadv+0x11/0x20
[   71.228400]        entry_SYSCALL_64_fastpath+0x23/0xc2
[   71.229145]
[   71.229145] other info that might help us debug this:
[   71.229145]
[   71.230462] Chain exists of:
[   71.230462]   mem_hotplug_lock.rw_sem --> slab_mutex --> s_active#39
[   71.230462]
[   71.232069]  Possible unsafe locking scenario:
[   71.232069]
[   71.233031]        CPU0                    CPU1
[   71.233711]        ----                    ----
[   71.234390]   lock(s_active#39);
[   71.234938]                                lock(slab_mutex);
[   71.235737]                                lock(s_active#39);
[   71.236543]   lock(mem_hotplug_lock.rw_sem);
[   71.237200]
[   71.237200]  *** DEADLOCK ***
[   71.237200]
[   71.238303] 3 locks held by trinity-c0/1666:
[   71.238956]  #0:  (&p->lock){+.+.+.}, at: seq_read+0x41/0x530
[   71.240100]  #1:  (&of->mutex){+.+.+.}, at: kernfs_seq_start+0x3c/0xa0
[   71.241327]  #2:  (s_active#39){++++.+}, at: kernfs_seq_start+0x44/0xa0
[   71.242558]
[   71.242558] stack backtrace:
[   71.243360] CPU: 0 PID: 1666 Comm: trinity-c0 Not tainted 4.12.0-10480-g3f906ba #1
[   71.244500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   71.245724] Call Trace:
[   71.246187]  dump_stack+0x8e/0xd7
[   71.246744]  print_circular_bug+0x1c4/0x220
[   71.247385]  __lock_acquire+0x120b/0x1240
[   71.248015]  lock_acquire+0xcf/0x2a0
[   71.248594]  show_slab_objects+0x14b/0x440
[   71.249250]  get_online_mems+0x47/0xb0
[   71.249853]  show_slab_objects+0x14b/0x440
[   71.250505]  show_slab_objects+0x14b/0x440
[   71.251140]  __lock_is_held+0x64/0xb0
[   71.251751]  slabs_show+0x13/0x20
[   71.252303]  slab_attr_show+0x1b/0x30
[   71.252896]  sysfs_kf_seq_show+0x109/0x190
[   71.253528]  kernfs_seq_show+0x27/0x30
[   71.254130]  traverse+0xa8/0x230
[   71.254673]  seq_read+0x1c6/0x530
[   71.255229]  kernfs_fop_read+0x18a/0x1e0
[   71.255850]  do_iter_read+0x164/0x1a0
[   71.256438]  vfs_readv+0x67/0x90
[   71.256984]  trace_hardirqs_on_caller+0xf7/0x190
[   71.257688]  trace_hardirqs_on+0xd/0x10
[   71.258316]  _raw_spin_unlock_irq+0x2c/0x40
[   71.258980]  __fget_light+0x62/0x90
[   71.259566]  do_preadv+0x9e/0xb0
[   71.260112]  SyS_preadv+0x11/0x20
[   71.260664]  entry_SYSCALL_64_fastpath+0x23/0xc2
[   71.261356] RIP: 0033:0x457389
[   71.261881] RSP: 002b:00007ffec728b8e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[   71.263024] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000457389
[   71.263979] RDX: 000000000000008c RSI: 0000000002b95250 RDI: 0000000000000094
[   71.264936] RBP: 00007ff6b84d5000 R08: 0000000000000006 R09: 000000000000fffa
[   71.265891] R10: 0000000000400000 R11: 0000000000000246 R12: 0000000000000000
[   71.266845] R13: 0000000000000127 R14: 00000000006fe4c0 R15: 00000000cccccccd
BusyBox v1.19.4 (2012-04-22 08:49:11 PDT) multi-call binary.

Usage: rmmod

# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start v4.13 v4.12 --
git bisect bad 63a86362130f4c17eaa57f3ef5171ec43111a54e  # 04:59  B  2  2  1  1  Merge tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 090a81d8766e21d33ab3e4d24e6c8e5eedf086dd  # 05:19  G  45  0  5  5  Merge branch 'for-spi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad ad51271afc21a72479974713abb40ca4b96d1f6b  # 05:38  B  1  2  1  1  Merge branch 'akpm' (patches from Andrew)
git bisect good 7cee9384cb3e25de33d75ecdbf08bb15b4ea9fa5  # 06:08  G  49  0  4  4  Fix up over-eager 'wait_queue_t' renaming
git bisect bad fb4e3beeffa47619985f190663c6ef424f063a22  # 06:48  B  8  10  6  7  Merge tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad a3ddacbae5abc0a5aabb1e75b655e8cd6dc83888  # 07:12  B  3  2  1  1  Merge tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform
git bisect good 322618684353315e14f586b33d8a016286ffa700  # 07:37  G  46  0  8  8  Merge tag 'acpi-extra-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 9967468c0a109644e4a1f5b39b39bf86fe7507a7  # 08:21  B  6  5  3  3  Merge branch 'akpm' (patches from Andrew)
git bisect good 548aa0e3c516d906dae5edb1fc9a1ad2e490120a  # 08:41  G  49  0  12  12  Merge tag 'devprop-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good a47fed5b5b014f5a13878b90ef2c3a7dc294189f  # 09:25  G  49  0  6  6  mm: swap: provide lru_add_drain_all_cpuslocked()
git bisect bad 2c6deb01525ac11cc03c44fe31e3f45ce2cadaf9  # 09:41  B  6  6  4  4  bitmap: use memcmp optimisation in more situations
git bisect bad b6d0f14abb7c1d9d522c064633c820aaeb9bbfbf  # 10:06  B  3  1  0  0  frv: cmpxchg: implement cmpxchg64()
git bisect bad 4d461333f144456b80d9eabd7cee7ac02fa5d0ee  # 10:22  B  2  3  1  1  x86/kasan: don't allocate extra shadow memory
git bisect bad bc1bb362334ebc4c65dd4301f10fb70902b3db7d  # 10:38  B  7  6  0  0  zram: constify attribute_group structures.
git bisect bad 9d1f4b3f5b29bea431525e528a3ff2dc806ad904  # 11:05  B  1  1  1  1  mm: disallow early_pfn_to_nid on configurations which do not implement it
git bisect bad 3f906ba23689a3f824424c50f3ae937c2c70f676  # 11:22  B  2  6  2  2  mm/memory-hotplug: switch locking to a percpu rwsem
# first bad commit: [3f906ba23689a3f824424c50f3ae937c2c70f676] mm/memory-hotplug: switch locking to a percpu rwsem
git bisect good a47fed5b5b014f5a13878b90ef2c3a7dc294189f  # 11:27  G  144  0  12  18  mm: swap: provide lru_add_drain_all_cpuslocked()
# extra tests with debug options
git bisect bad 3f906ba23689a3f824424c50f3ae937c2c70f676  # 11:44  B  1  6  1  2  mm/memory-hotplug: switch locking to a percpu rwsem
# extra tests on HEAD of linux-devel/devel-catchup-201809041401
git bisect bad 6859ff565807876b3ad6f5d7cccedc214bfc51d1  # 11:45  B  5  8  0  0  0day head guard for 'devel-catchup-201809041401'
# extra tests on tree/branch linus/master
git bisect bad 28619527b8a712590c93d0a9e24b4425b9376a8c  # 11:59  B  3  2  1  1  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
# extra tests on tree/branch linux-next/master
git bisect bad f2b6e66e9885a2eae12efd73cc6f859213f64c23  # 12:14  B  4  6  1  5  Add linux-next specific files for 20180904

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation