* Re: [Bug 216489] New: Machine freezes due to memory lock
       [not found] <bug-216489-27@https.bugzilla.kernel.org/>
@ 2022-09-15 20:39 ` Andrew Morton
  2022-09-15 22:42   ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread

From: Andrew Morton @ 2022-09-15 20:39 UTC (permalink / raw)
To: dev, linux-mm, Uladzislau Rezki; +Cc: bugzilla-daemon

(Switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface.)

On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=216489
>
>             Bug ID: 216489
>            Summary: Machine freezes due to memory lock
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.19.8
>           Hardware: AMD
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: dev@der-flo.net
>         Regression: No
>
> Hi all,
> With kernel 5.19.x we noticed system freezes. This happens in virtual
> environments as well as on real hardware.
> On a real-hardware machine we were able to catch the moment of the freeze
> with continuous profiling.

Thanks. I forwarded this to Uladzislau and he offered to help. He said:

: I can help with debugging. What I need are reproduction steps. Could you
: please clarify whether it is easy to hit and what kind of profiling
: triggers it?

and

: I do not think that Matthew Wilcox's commits break it, but... I see that
: __vunmap() is invoked by free_work(), thus the caller can be in atomic
: context, including IRQ context.
> Specification of the machine where we captured the freeze:
> Thinkpad T14
> CPU: AMD Ryzen 7 PRO 4750U
> Kernel: 5.19.8-200.fc36.x86_64
>
> Stacktrace of kworker/12:3, which is using all resources and causing the freeze:
>
>  #  Source Location                            Function Name                     Function Line
>  0  arch/x86/include/asm/vdso/processor.h:13   rep_nop                           11
>  1  arch/x86/include/asm/vdso/processor.h:18   cpu_relax                         16
>  2  kernel/locking/qspinlock.c:514             native_queued_spin_lock_slowpath  316
>  3  kernel/locking/qspinlock.c:316             native_queued_spin_lock_slowpath  N/A
>  4  arch/x86/include/asm/paravirt.h:591        pv_queued_spin_lock_slowpath      588
>  5  arch/x86/include/asm/qspinlock.h:51        queued_spin_lock_slowpath         49
>  6  include/asm-generic/qspinlock.h:114        queued_spin_lock                  107
>  7  include/linux/spinlock.h:185               do_raw_spin_lock                  182
>  8  include/linux/spinlock_api_smp.h:134       __raw_spin_lock                   130
>  9  kernel/locking/spinlock.c:154              _raw_spin_lock                    152
> 10  include/linux/spinlock.h:349               spin_lock                         347
> 11  mm/vmalloc.c:1805                          find_vmap_area                    1801
> 12  mm/vmalloc.c:2525                          find_vm_area                      2521
> 13  mm/vmalloc.c:2639                          __vunmap                          2628
> 14  mm/vmalloc.c:97                            free_work                         91
> 15  kernel/workqueue.c:2289                    process_one_work                  2181
> 16  kernel/workqueue.c:2436                    worker_thread                     2378
> 17  kernel/kthread.c:376                       kthread                           330
> 18  N/A                                        ret_from_fork                     N/A
>
> The functions in the stacktrace shown above hardly ever change between
> samples. There is only one commit, 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e,
> which introduced changes to find_vmap_area() for 5.19.
>
> With this change in mind, we looked for stacktraces that also exercise this
> new commit.
> And in a different kernel thread we do notice the use of
> check_heap_object():
>
>  #  Source Location                         Function Name                    Function Line
>  0  arch/x86/include/asm/paravirt.h:704     arch_local_irq_enable            702
>  1  arch/x86/include/asm/irqflags.h:138     arch_local_irq_restore           135
>  2  kernel/sched/sched.h:1330               raw_spin_rq_unlock_irqrestore    1327
>  3  kernel/sched/sched.h:1327               raw_spin_rq_unlock_irqrestore    N/A
>  4  kernel/sched/sched.h:1611               rq_unlock_irqrestore             1607
>  5  kernel/sched/fair.c:8288                update_blocked_averages          8272
>  6  kernel/sched/fair.c:11133               run_rebalance_domains            11115
>  7  kernel/softirq.c:571                    __do_softirq                     528
>  8  kernel/softirq.c:445                    invoke_softirq                   433
>  9  kernel/softirq.c:650                    __irq_exit_rcu                   640
> 10  arch/x86/kernel/apic/apic.c:1106        sysvec_apic_timer_interrupt      N/A
> 11  N/A                                     asm_sysvec_apic_timer_interrupt  N/A
> 12  include/linux/mmzone.h:1403             __nr_to_section                  1395
> 13  include/linux/mmzone.h:1488             __pfn_to_section                 1486
> 14  include/linux/mmzone.h:1539             pfn_valid                        1524
> 15  arch/x86/mm/physaddr.c:65               __virt_addr_valid                47
> 16  mm/usercopy.c:188                       check_heap_object                161
> 17  mm/usercopy.c:250                       __check_object_size              212
> 18  mm/usercopy.c:212                       __check_object_size              N/A
> 19  include/linux/thread_info.h:199         check_object_size                195
> 20  lib/strncpy_from_user.c:137             strncpy_from_user                113
> 21  fs/namei.c:150                          getname_flags                    129
> 22  fs/namei.c:2896                         user_path_at_empty               2893
> 23  include/linux/namei.h:57                user_path_at                     54
> 24  fs/open.c:446                           do_faccessat                     420
> 25  arch/x86/entry/common.c:50              do_syscall_x64                   40
> 26  arch/x86/entry/common.c:80              do_syscall_64                    73
> 27  N/A                                     entry_SYSCALL_64_after_hwframe   N/A
>
> We are neither experts in the mm subsystem nor able to provide a fix, but
> wanted to let you know about our findings.
>
> Cheers,
> Florian
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-15 20:39 ` [Bug 216489] New: Machine freezes due to memory lock Andrew Morton
@ 2022-09-15 22:42   ` Matthew Wilcox
  2022-09-15 23:59     ` Yu Zhao
  0 siblings, 1 reply; 10+ messages in thread

From: Matthew Wilcox @ 2022-09-15 22:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: dev, linux-mm, Uladzislau Rezki, bugzilla-daemon, Kees Cook

On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> (Switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface.)
>
> On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> >
> > [bug-report fields trimmed]
> >
> > Hi all,
> > With kernel 5.19.x we noticed system freezes. This happens in virtual
> > environments as well as on real hardware.
> > On a real-hardware machine we were able to catch the moment of the freeze
> > with continuous profiling.
>
> Thanks. I forwarded this to Uladzislau and he offered to help. He said:
>
> : I can help with debugging. What I need are reproduction steps. Could you
> : please clarify whether it is easy to hit and what kind of profiling
> : triggers it?
>
> and
>
> : I do not think that Matthew Wilcox's commits break it, but... I see that
> : __vunmap() is invoked by free_work(), thus the caller can be in atomic
> : context, including IRQ context.

To keep all of this information together: Florian emailed me off-list
and I replied, cc'ing Kees.
I asked to try this patch, to decide whether the problem is the extra load
on the spinlock from examining the vmap tree more often:

diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..76d2d4fb6d22 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
-
-		if (!area)
-			usercopy_abort("vmalloc", "no area", to_user, 0, n);
-
-		if (n > area->va_end - addr) {
-			offset = addr - area->va_start;
-			usercopy_abort("vmalloc", NULL, to_user, offset, n);
-		}
 		return;
 	}

Kees wrote:

} If you can reproduce the hangs, perhaps enable:
}
}   CONFIG_DEBUG_LOCKDEP=y
}   CONFIG_DEBUG_ATOMIC_SLEEP=y
}
} I would expect any hung spinlock to complain very loudly under
} LOCKDEP...

I hope we can keep the remainder of the debugging in this email thread.

> > Specification of the machine where we captured the freeze:
> > Thinkpad T14
> > CPU: AMD Ryzen 7 PRO 4750U
> > Kernel: 5.19.8-200.fc36.x86_64
> >
> > [stack traces and remainder of the report trimmed -- quoted in full above]
> >
> > Cheers,
> > Florian

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-15 22:42   ` Matthew Wilcox
@ 2022-09-15 23:59     ` Yu Zhao
  2022-09-16  8:38       ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread

From: Yu Zhao @ 2022-09-15 23:59 UTC (permalink / raw)
To: Matthew Wilcox, Andrew Morton
Cc: dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon, Kees Cook

On Thu, Sep 15, 2022 at 4:42 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> > [original bug report and stack traces trimmed]
>
> To keep all of this information together: Florian emailed me off-list
> and I replied, cc'ing Kees.
>
> I asked to try this patch, to decide whether the problem is the extra load
> on the spinlock from examining the vmap tree more often:
>
> [patch and debugging suggestions trimmed]
>
> I hope we can keep the remainder of the debugging in this email thread.

I think this is a manifestation of the lockdep warning I reported a couple
of weeks ago:
https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-15 23:59     ` Yu Zhao
@ 2022-09-16  8:38       ` Matthew Wilcox
  2022-09-16  9:46         ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread

From: Matthew Wilcox @ 2022-09-16 8:38 UTC (permalink / raw)
To: Yu Zhao
Cc: Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon, Kees Cook

On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> I think this is a manifestation of the lockdep warning I reported a couple
> of weeks ago:
> https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/

That would certainly match the symptoms.

Turning vmap_lock into an NMI-safe lock would be bad. I don't even know
if we have primitives for that (it's not like you can disable an NMI ...).

I don't quite have time to write a patch right now. Perhaps something
like:

	struct vmap_area *find_vmap_area_nmi(unsigned long addr)
	{
		struct vmap_area *va;

		if (!spin_trylock(&vmap_area_lock))
			return NULL;
		va = __find_vmap_area(addr, &vmap_area_root);
		spin_unlock(&vmap_area_lock);

		return va;
	}

and then call find_vmap_area_nmi() in check_heap_object(). (Careful with
the polarity here: spin_trylock() returns nonzero on success, so the
early return must be taken on the negated value, as above.)

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16  8:38       ` Matthew Wilcox
@ 2022-09-16  9:46         ` Kees Cook
  2022-09-16 12:28           ` Uladzislau Rezki
  2022-09-16 14:15           ` Matthew Wilcox
  0 siblings, 2 replies; 10+ messages in thread

From: Kees Cook @ 2022-09-16 9:46 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > I think this is a manifestation of the lockdep warning I reported a couple
> > of weeks ago:
> > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
>
> That would certainly match the symptoms.
>
> Turning vmap_lock into an NMI-safe lock would be bad. I don't even know
> if we have primitives for that (it's not like you can disable an NMI ...).
>
> I don't quite have time to write a patch right now. Perhaps something
> like:
>
> 	struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> 	{
> 		struct vmap_area *va;
>
> 		if (!spin_trylock(&vmap_area_lock))
> 			return NULL;
> 		va = __find_vmap_area(addr, &vmap_area_root);
> 		spin_unlock(&vmap_area_lock);
>
> 		return va;
> 	}
>
> and then call find_vmap_area_nmi() in check_heap_object().

I think we'll need something slightly tweaked, since this would
return NULL under any contention (and a NULL return is fatal in
check_heap_object()). It seems like we need to explicitly check
for being in NMI context in check_heap_object() to deal with it?
Like this (only build tested):


diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..c8a00f181a11 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 struct vmap_area *find_vmap_area(unsigned long addr);
+struct vmap_area *find_vmap_area_try(unsigned long addr);
 
 static inline bool is_vm_area_hugepages(const void *addr)
 {
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..9f943c29e7ec 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
+		struct vmap_area *area;
+
+		if (!in_nmi()) {
+			area = find_vmap_area(addr);
+		} else {
+			area = find_vmap_area_try(addr);
+			/* Give up under NMI to avoid deadlocks. */
+			if (!area)
+				return;
+		}
 
 		if (!area)
 			usercopy_abort("vmalloc", "no area", to_user, 0, n);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb201195..f14f1902c2f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	return va;
 }
 
+struct vmap_area *find_vmap_area_try(unsigned long addr)
+{
+	struct vmap_area *va = NULL;
+
+	if (spin_trylock(&vmap_area_lock)) {
+		va = __find_vmap_area(addr, &vmap_area_root);
+		spin_unlock(&vmap_area_lock);
+	}
+	return va;
+}
+
 /*** Per cpu kva allocator ***/
 
 /*

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16  9:46         ` Kees Cook
@ 2022-09-16 12:28           ` Uladzislau Rezki
  2022-09-16 12:32             ` Uladzislau Rezki
  2022-09-16 14:15           ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread

From: Uladzislau Rezki @ 2022-09-16 12:28 UTC (permalink / raw)
To: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton
Cc: Matthew Wilcox, Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > [quoted context trimmed]
>
> I think we'll need something slightly tweaked, since this would
> return NULL under any contention (and a NULL return is fatal in
> check_heap_object()). It seems like we need to explicitly check
> for being in NMI context in check_heap_object() to deal with it?
> Like this (only build tested):
>
> [patch trimmed -- quoted in full in the previous message]

OK. The problem is using find_vmap_area() from IRQ context. Indeed, it
can deadlock; it is not supposed to be used there. But if you want to,
then we should have a helper.
Please note that it might also be a regular IRQ, so this is not limited
to NMI context: somebody could later decide to use it from a regular IRQ.

IMHO, it makes sense to use the in_interrupt() helper instead, so we
cover hard-IRQ context as well as NMI context. That would also be aligned
with the deferred vfreeing:

<snip>
static void __vfree(const void *addr)
{
	if (unlikely(in_interrupt()))
		__vfree_deferred(addr);
	else
		__vunmap(addr, 1);
}
<snip>

so we handle not only the NMI scenario. I think we should align.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16 12:28           ` Uladzislau Rezki
@ 2022-09-16 12:32             ` Uladzislau Rezki
  0 siblings, 0 replies; 10+ messages in thread

From: Uladzislau Rezki @ 2022-09-16 12:32 UTC (permalink / raw)
To: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton
Cc: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton, dev, Linux-MM, bugzilla-daemon

On Fri, Sep 16, 2022 at 02:28:08PM +0200, Uladzislau Rezki wrote:
> [quoted context trimmed]
>
> OK. The problem is using find_vmap_area() from IRQ context. Indeed, it
> can deadlock; it is not supposed to be used there. But if you want to,
> then we should have a helper.
>
> Please note that it might also be a regular IRQ, so this is not limited
> to NMI context: somebody could later decide to use it from a regular IRQ.
>
> IMHO, it makes sense to use the in_interrupt() helper instead, so we
> cover hard-IRQ context as well as NMI context. That would also be aligned
> with the deferred vfreeing:
>
> <snip>
> static void __vfree(const void *addr)
> {
> 	if (unlikely(in_interrupt()))
> 		__vfree_deferred(addr);
> 	else
> 		__vunmap(addr, 1);
> }
> <snip>
>
> so we handle not only the NMI scenario. I think we should align.

Another thing I should mention: on PREEMPT_RT, spinlocks become sleepable
locks, and taking a sleepable lock is not allowed in any atomic context.
So from the PREEMPT_RT point of view this is also broken.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16  9:46         ` Kees Cook
  2022-09-16 12:28           ` Uladzislau Rezki
@ 2022-09-16 14:15           ` Matthew Wilcox
  2022-09-16 14:42             ` Kees Cook
  2022-09-16 18:47             ` Uladzislau Rezki
  1 sibling, 2 replies; 10+ messages in thread

From: Matthew Wilcox @ 2022-09-16 14:15 UTC (permalink / raw)
To: Kees Cook
Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> [quoted context trimmed]
>
> I think we'll need something slightly tweaked, since this would
> return NULL under any contention (and a NULL return is fatal in
> check_heap_object()). It seems like we need to explicitly check
> for being in NMI context in check_heap_object() to deal with it?
> Like this (only build tested):

Right, and Ulad is right that it is callable from any context. I think
the long-term solution is to make the vmap_area_root tree walkable under
RCU protection.
For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?)
to indicate that we've hit contention. It generally won't matter if we
hit it in process context because hardening doesn't have to be 100%
reliable to be useful.

Erm ... so what prevents this race:

CPU 0                                   CPU 1
copy_to_user()
  check_heap_object()
    area = find_vmap_area(addr)
                                        __purge_vmap_area_lazy()
                                          merge_or_add_vmap_area_augment()
                                            __merge_or_add_vmap_area()
                                              kmem_cache_free(vmap_area_cachep, va);
    if (n > area->va_end - addr) {

Yes, it's a race in the code that allocated this memory; they're
simultaneously calling copy_to_user() and __vunmap(). We'll catch this
bad behaviour sooner rather than later, but sometimes in trying to catch
this bug, we'll get caught by the bug and go splat. I don't know that we
need to go through heroics to be sure we don't get caught by this bug.
It already has to run a workqueue to do the freeing. We could delay it
even further with RCU or something, but we're only trading off one kind
of badness for another.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16 14:15       ` Matthew Wilcox
@ 2022-09-16 14:42         ` Kees Cook
  2022-09-16 18:47         ` Uladzislau Rezki
  1 sibling, 0 replies; 10+ messages in thread
From: Kees Cook @ 2022-09-16 14:42 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki,
	bugzilla-daemon

On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> Right, and Ulad is right about it being callable from any context. I
> think the longterm solution is to make the vmap_area_root tree walkable
> under RCU protection.

Agreed, I've updated my proposed patch.

> For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?)
> to indicate that we've hit contention. It generally won't matter if we
> hit it in process context because hardening doesn't have to be 100%
> reliable to be useful.

Right, as I note in the series[1], hardening shouldn't be getting called
_at all_ in this path. :P

-Kees

[1] https://lore.kernel.org/linux-hardening/20220916135953.1320601-1-keescook@chromium.org/

--
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16 14:15       ` Matthew Wilcox
  2022-09-16 14:42         ` Kees Cook
@ 2022-09-16 18:47         ` Uladzislau Rezki
  1 sibling, 0 replies; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 18:47 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Kees Cook, Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki,
	bugzilla-daemon

On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> > On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > > I think this is a manifestation of the lockdep warning I reported
> > > > a couple of weeks ago:
> > > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> > >
> > > That would certainly match the symptoms.
> > >
> > > Turning vmap_lock into an NMI-safe lock would be bad. I don't even
> > > know if we have primitives for that (it's not like you can disable
> > > an NMI ...)
> > >
> > > I don't quite have time to write a patch right now. Perhaps
> > > something like:
> > >
> > > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > > {
> > > 	struct vmap_area *va;
> > >
> > > 	if (spin_trylock(&vmap_area_lock))
> > > 		return NULL;
> > > 	va = __find_vmap_area(addr, &vmap_area_root);
> > > 	spin_unlock(&vmap_area_lock);
> > >
> > > 	return va;
> > > }
> > >
> > > and then call find_vmap_area_nmi() in check_heap_object(). I may
> > > have the polarity of the return value of spin_trylock() incorrect.
> >
> > I think we'll need something slightly tweaked, since this would
> > return NULL under any contention (and a NULL return is fatal in
> > check_heap_object()). It seems like we need to explicitly check
> > for being in nmi context in check_heap_object() to deal with it?
> > Like this (only build tested):
>
> Right, and Ulad is right about it being callable from any context.
> I think the longterm solution is to make the vmap_area_root tree
> walkable under RCU protection.
>
> For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?)
> to indicate that we've hit contention. It generally won't matter if we
> hit it in process context because hardening doesn't have to be 100%
> reliable to be useful.
>
> Erm ... so what prevents this race:
>
> CPU 0                                   CPU 1
> copy_to_user()
>   check_heap_object()
>     area = find_vmap_area(addr)
>                                         __purge_vmap_area_lazy()
>                                           merge_or_add_vmap_area_augment()
>                                             __merge_or_add_vmap_area()
>                                               kmem_cache_free(vmap_area_cachep, va);
>     if (n > area->va_end - addr) {
>
Sounds like it can happen. I think it is a good argument to switch to
RCU here, for safe access to the va after the lock is released. I can
think about it and put it as a task on my todo list. Since it is not
urgent, so far it is OK to wait for a splat. But it might never happen :)

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-09-16 18:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-216489-27@https.bugzilla.kernel.org/>
2022-09-15 20:39 ` [Bug 216489] New: Machine freezes due to memory lock Andrew Morton
2022-09-15 22:42   ` Matthew Wilcox
2022-09-15 23:59     ` Yu Zhao
2022-09-16  8:38       ` Matthew Wilcox
2022-09-16  9:46         ` Kees Cook
2022-09-16 12:28           ` Uladzislau Rezki
2022-09-16 12:32             ` Uladzislau Rezki
2022-09-16 14:15           ` Matthew Wilcox
2022-09-16 14:42             ` Kees Cook
2022-09-16 18:47             ` Uladzislau Rezki