From: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
To: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-snps-arc@lists.infradead.org" <linux-snps-arc@lists.infradead.org>
Subject: arc: mm->mmap_sem gets locked in do_page_fault() in case of OOM killer invocation
Date: Fri, 16 Feb 2018 12:40:30 +0000
Message-ID: <1518784830.3544.33.camel@synopsys.com>

Hi Vineet,

While playing with the OOM killer I bumped into a pure software deadlock on ARC
which is even observed in simulation (i.e. it has nothing to do with HW peculiarities).

What's nice is that the kernel even sees that lock-up if "Lock Debugging" is enabled.
That's what I see:
-------------------------------------------->8-------------------------------------------
# /home/oom-test 450 & /home/oom-test 450
oom-test invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
CPU: 0 PID: 67 Comm: oom-test Not tainted 4.14.19 #2

Stack Trace:
  arc_unwind_core.constprop.1+0xd4/0xf8
  dump_header.isra.6+0x84/0x2f8
  oom_kill_process+0x258/0x7c8
  out_of_memory+0xb8/0x5e0
  __alloc_pages_nodemask+0x922/0xd28
  handle_mm_fault+0x284/0xd90
  do_page_fault+0xf6/0x2a0
  ret_from_exception+0x0/0x8
Mem-Info:
active_anon:62276 inactive_anon:341 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:26 slab_unreclaimable:196
 mapped:105 shmem:578 pagetables:263 bounce:0
 free:344 free_pcp:39 free_cma:0
Node 0 active_anon:498208kB inactive_anon:2728kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:840kB dirty:0kB writeback:0kB shmem:4624kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Normal free:2752kB min:2840kB low:3544kB high:4248kB active_anon:498208kB inactive_anon:2728kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:524288kB managed:508584kB mlocked:0kB kernel_stack:240kB pagetables:2104kB bounce:0kB free_pcp:312kB local_pcp:312kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB (M) 0*256kB 1*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB 0*8192kB = 2752kB
578 total pagecache pages
65536 pages RAM
0 pages HighMem/MovableOnly
1963 pages reserved
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[   41]     0    41      157      103       3       0        0             0 syslogd
[   43]     0    43      156      106       3       0        0             0 klogd
[   63]     0    63      157       99       3       0        0             0 getty
[   64]     0    64      159      118       3       0        0             0 sh
[   66]     0    66   115291    31094     124       0        0             0 oom-test
[   67]     0    67   115291    31004     124       0        0             0 oom-test
Out of memory: Kill process 66 (oom-test) score 476 or sacrifice child
Killed process 66 (oom-test) total-vm:922328kB, anon-rss:248328kB, file-rss:0kB, shmem-rss:424kB

============================================
WARNING: possible recursive locking detected
4.14.19 #2 Not tainted
--------------------------------------------
oom-test/66 is trying to acquire lock:
 (&mm->mmap_sem){++++}, at: [<80217d50>] do_exit+0x444/0x7f8

but task is already holding lock:
 (&mm->mmap_sem){++++}, at: [<8021028a>] do_page_fault+0x9e/0x2a0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&mm->mmap_sem);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by oom-test/66:
 #0:  (&mm->mmap_sem){++++}, at: [<8021028a>] do_page_fault+0x9e/0x2a0

stack backtrace:
CPU: 0 PID: 66 Comm: oom-test Not tainted 4.14.19 #2

Stack Trace:
  arc_unwind_core.constprop.1+0xd4/0xf8
  __lock_acquire+0x582/0x1494
  lock_acquire+0x3c/0x58
  down_read+0x1a/0x28
  do_exit+0x444/0x7f8
  do_group_exit+0x26/0x8c
  get_signal+0x1aa/0x7d4
  do_signal+0x30/0x220
  resume_user_mode_begin+0x90/0xd8
-------------------------------------------->8-------------------------------------------

Looking at our code in "arch/arc/mm/fault.c" I can see why "mm->mmap_sem" is not released:
 1. fatal_signal_pending(current) returns a non-zero value.
 2. ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY)) is false, thus
    up_read(&mm->mmap_sem) is not executed.
 3. It was a user-space process, thus we simply return [with "mm->mmap_sem" still held].

See the code snippet below:
-------------------------------------------->8-------------------------------------------
	/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
	if (unlikely(fatal_signal_pending(current))) {
		if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY))
			up_read(&mm->mmap_sem);
		if (user_mode(regs))
			return;
	}
-------------------------------------------->8-------------------------------------------

Then we leave the page fault handler, and before returning to user-space we process
the pending signal, which happens to be a death signal, and so we end up executing
the following code path (see the stack trace above):

  do_exit() -> exit_mm() -> down_read(&mm->mmap_sem) <-- And here we go locking ourselves for good.

What's interesting is that most, if not all, architectures return from the page fault
handler with "mm->mmap_sem" held in case of fatal_signal_pending(). So I would expect
the same failure as I see on ARC to happen on other arches too... though I was not
able to trigger that on ARM (WandBoard Quad).

I think that is because on ARM and many others the check is a bit different:
-------------------------------------------->8-------------------------------------------
	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
		if (!user_mode(regs))
			goto no_context;
		return 0;
	}
-------------------------------------------->8-------------------------------------------

So to get into the problematic code path (i.e. exit with "mm->mmap_sem" still held) we
need __do_page_fault() to return VM_FAULT_RETRY, which makes reproduction even more
complicated, but I think it's still doable :)

The simplest solution here seems to be an unconditional up_read(&mm->mmap_sem) before
the return, but it's strange that this has not been done already. Anyway, any
thoughts are very welcome!

-Alexey
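
The three numbered steps in the message can be followed in a small user-space model. This is only a sketch and not kernel code: the MODEL_VM_FAULT_* values, struct model_mm, and every model_* function are hypothetical names invented for illustration, and the semaphore is reduced to a counter of read-side holds. It assumes the semaphore is still held whenever the early-exit branch is reached; cases where the core mm code has already dropped it are not modelled. The always_release flag stands in for the unconditional up_read() suggested in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative bit values only; the kernel's real VM_FAULT_* constants differ. */
#define MODEL_VM_FAULT_ERROR 0x1u
#define MODEL_VM_FAULT_RETRY 0x2u

/* Toy stand-in for mm->mmap_sem: counts read-side holds taken by one task. */
struct model_mm {
	int read_holds;
};

static void model_down_read(struct model_mm *mm) { mm->read_holds++; }
static void model_up_read(struct model_mm *mm)   { mm->read_holds--; }

/*
 * Models a user-mode fault passing through the early-exit branch quoted in
 * the message. Assumption: the lock is still held when the branch is reached.
 * Returns the number of read holds left when the handler returns.
 */
static int model_arc_fault(struct model_mm *mm, unsigned int fault,
			   bool fatal_pending, bool always_release)
{
	model_down_read(mm);		/* taken early in the fault handler */

	if (fatal_pending) {
		if (always_release ||
		    ((fault & MODEL_VM_FAULT_ERROR) &&
		     !(fault & MODEL_VM_FAULT_RETRY)))
			model_up_read(mm);
		return mm->read_holds;	/* user_mode(regs): early return */
	}

	model_up_read(mm);		/* ordinary completion path */
	return mm->read_holds;
}

/*
 * Models the down_read() reached from do_exit(): true when the task would
 * take the lock again while already holding it, which is the situation the
 * lockdep report above describes as possible recursive locking.
 */
static bool model_exit_reacquires_held_lock(const struct model_mm *mm)
{
	return mm->read_holds > 0;
}

#ifdef MODEL_STANDALONE
int main(void)
{
	struct model_mm quoted = { 0 };
	struct model_mm proposed = { 0 };

	/* fault == 0: neither error nor retry, fatal signal pending. */
	model_arc_fault(&quoted, 0, true, false);
	model_arc_fault(&proposed, 0, true, true);

	printf("quoted check:   holds=%d reacquire=%d\n",
	       quoted.read_holds, model_exit_reacquires_held_lock(&quoted));
	printf("always release: holds=%d reacquire=%d\n",
	       proposed.read_holds, model_exit_reacquires_held_lock(&proposed));
	return 0;
}
#endif
```

Building with -DMODEL_STANDALONE gives a standalone program. With fault == 0 and a fatal signal pending, the quoted condition is false and the model returns with one hold outstanding, matching step 3; with always_release set, the count is back to zero before the modelled do_exit() path runs.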