* Re: [Bug 216489] New: Machine freezes due to memory lock
[not found] <bug-216489-27@https.bugzilla.kernel.org/>
@ 2022-09-15 20:39 ` Andrew Morton
2022-09-15 22:42 ` Matthew Wilcox
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2022-09-15 20:39 UTC (permalink / raw)
To: dev, linux-mm, Uladzislau Rezki; +Cc: bugzilla-daemon
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216489
>
> Bug ID: 216489
> Summary: Machine freezes due to memory lock
> Product: Memory Management
> Version: 2.5
> Kernel Version: 5.19.8
> Hardware: AMD
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Other
> Assignee: akpm@linux-foundation.org
> Reporter: dev@der-flo.net
> Regression: No
>
> Hi all,
> With Kernel 5.19.x we noticed system freezes. This happens in virtual
> environments as well as on real hardware.
> On a real hardware machine we were able to catch the moment of freeze with
> continuous profiling.
Thanks. I forwarded this to Uladzislau and he offered to help. He said:
: I can help with debugging. What I need are reproduction steps. Could you
: please clarify whether it is easy to hit and what kind of profiling triggers it?
and
: I do not think that Matthew Wilcox's commits destroy it, but... I see that
: __vunmap() is invoked by free_work(), thus the caller is in atomic
: context, including IRQ context.
> Specification of the machine where we captured the freeze:
> Thinkpad T14
> CPU: AMD Ryzen 7 PRO 4750U
> Kernel: 5.19.8-200.fc36.x86_64
>
> Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
>
> #   Source Location                            Function Name                     Function Line
> 0   arch/x86/include/asm/vdso/processor.h:13   rep_nop                           11
> 1   arch/x86/include/asm/vdso/processor.h:18   cpu_relax                         16
> 2   kernel/locking/qspinlock.c:514             native_queued_spin_lock_slowpath  316
> 3   kernel/locking/qspinlock.c:316             native_queued_spin_lock_slowpath  N/A
> 4   arch/x86/include/asm/paravirt.h:591        pv_queued_spin_lock_slowpath      588
> 5   arch/x86/include/asm/qspinlock.h:51        queued_spin_lock_slowpath         49
> 6   include/asm-generic/qspinlock.h:114        queued_spin_lock                  107
> 7   include/linux/spinlock.h:185               do_raw_spin_lock                  182
> 8   include/linux/spinlock_api_smp.h:134       __raw_spin_lock                   130
> 9   kernel/locking/spinlock.c:154              _raw_spin_lock                    152
> 10  include/linux/spinlock.h:349               spin_lock                         347
> 11  mm/vmalloc.c:1805                          find_vmap_area                    1801
> 12  mm/vmalloc.c:2525                          find_vm_area                      2521
> 13  mm/vmalloc.c:2639                          __vunmap                          2628
> 14  mm/vmalloc.c:97                            free_work                         91
> 15  kernel/workqueue.c:2289                    process_one_work                  2181
> 16  kernel/workqueue.c:2436                    worker_thread                     2378
> 17  kernel/kthread.c:376                       kthread                           330
> 18  N/A                                        ret_from_fork                     N/A
>
> The functions in the stack trace shown above rarely change. There is only one
> commit, 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e, which introduces changes to
> find_vmap_area() for 5.19.
>
> With this change in mind, we looked for stack traces which also exercise this
> new code. In a different kernel thread we noticed the use of
> check_heap_object():
>
> #   Source Location                            Function Name                    Function Line
> 0   arch/x86/include/asm/paravirt.h:704        arch_local_irq_enable            702
> 1   arch/x86/include/asm/irqflags.h:138        arch_local_irq_restore           135
> 2   kernel/sched/sched.h:1330                  raw_spin_rq_unlock_irqrestore    1327
> 3   kernel/sched/sched.h:1327                  raw_spin_rq_unlock_irqrestore    N/A
> 4   kernel/sched/sched.h:1611                  rq_unlock_irqrestore             1607
> 5   kernel/sched/fair.c:8288                   update_blocked_averages          8272
> 6   kernel/sched/fair.c:11133                  run_rebalance_domains            11115
> 7   kernel/softirq.c:571                       __do_softirq                     528
> 8   kernel/softirq.c:445                       invoke_softirq                   433
> 9   kernel/softirq.c:650                       __irq_exit_rcu                   640
> 10  arch/x86/kernel/apic/apic.c:1106           sysvec_apic_timer_interrupt      N/A
> 11  N/A                                        asm_sysvec_apic_timer_interrupt  N/A
> 12  include/linux/mmzone.h:1403                __nr_to_section                  1395
> 13  include/linux/mmzone.h:1488                __pfn_to_section                 1486
> 14  include/linux/mmzone.h:1539                pfn_valid                        1524
> 15  arch/x86/mm/physaddr.c:65                  __virt_addr_valid                47
> 16  mm/usercopy.c:188                          check_heap_object                161
> 17  mm/usercopy.c:250                          __check_object_size              212
> 18  mm/usercopy.c:212                          __check_object_size              N/A
> 19  include/linux/thread_info.h:199            check_object_size                195
> 20  lib/strncpy_from_user.c:137                strncpy_from_user                113
> 21  fs/namei.c:150                             getname_flags                    129
> 22  fs/namei.c:2896                            user_path_at_empty               2893
> 23  include/linux/namei.h:57                   user_path_at                     54
> 24  fs/open.c:446                              do_faccessat                     420
> 25  arch/x86/entry/common.c:50                 do_syscall_x64                   40
> 26  arch/x86/entry/common.c:80                 do_syscall_64                    73
> 27  N/A                                        entry_SYSCALL_64_after_hwframe   N/A
>
> We are neither experts in the mm subsystem nor able to provide a fix, but we
> wanted to let you know about our findings.
>
> Cheers,
> Florian
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are the assignee for the bug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-15 20:39 ` [Bug 216489] New: Machine freezes due to memory lock Andrew Morton
@ 2022-09-15 22:42 ` Matthew Wilcox
2022-09-15 23:59 ` Yu Zhao
0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2022-09-15 22:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: dev, linux-mm, Uladzislau Rezki, bugzilla-daemon, Kees Cook
On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> >
> > Bug ID: 216489
> > Summary: Machine freezes due to memory lock
> > Product: Memory Management
> > Version: 2.5
> > Kernel Version: 5.19.8
> > Hardware: AMD
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: high
> > Priority: P1
> > Component: Other
> > Assignee: akpm@linux-foundation.org
> > Reporter: dev@der-flo.net
> > Regression: No
> >
> > Hi all,
> > With Kernel 5.19.x we noticed system freezes. This happens in virtual
> > environments as well as on real hardware.
> > On a real hardware machine we were able to catch the moment of freeze with
> > continuous profiling.
>
> Thanks. I forwarded this to Uladzislau and he offered to help. He said:
>
>
> : I can help with debugging. What I need are reproduction steps. Could you
> : please clarify whether it is easy to hit and what kind of profiling triggers it?
>
> and
>
> : I do not think that Matthew Wilcox's commits destroy it, but... I see that
> : __vunmap() is invoked by free_work(), thus the caller is in atomic
> : context, including IRQ context.
To keep all of this information together: Florian emailed me off-list
and I replied, cc'ing Kees.

I asked him to try this patch to determine whether the problem is the extra
load on the spinlock from examining the vmap tree more often:
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..76d2d4fb6d22 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
-
-		if (!area)
-			usercopy_abort("vmalloc", "no area", to_user, 0, n);
-
-		if (n > area->va_end - addr) {
-			offset = addr - area->va_start;
-			usercopy_abort("vmalloc", NULL, to_user, offset, n);
-		}
 		return;
 	}
Kees wrote:
} If you can reproduce the hangs, perhaps enable:
}
} CONFIG_DEBUG_LOCKDEP=y
} CONFIG_DEBUG_ATOMIC_SLEEP=y
}
} I would expect any hung spinlock to complain very loudly under
} LOCKDEP...
I hope we can keep the remainder of the debugging in this email thread.
> > [...]
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-15 22:42 ` Matthew Wilcox
@ 2022-09-15 23:59 ` Yu Zhao
2022-09-16 8:38 ` Matthew Wilcox
0 siblings, 1 reply; 10+ messages in thread
From: Yu Zhao @ 2022-09-15 23:59 UTC (permalink / raw)
To: Matthew Wilcox, Andrew Morton
Cc: dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon, Kees Cook
On Thu, Sep 15, 2022 at 4:42 PM Matthew Wilcox <willy@infradead.org> wrote:
> [...]
I think this is a manifestation of the lockdep warning I reported a couple
of weeks ago:
https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-15 23:59 ` Yu Zhao
@ 2022-09-16 8:38 ` Matthew Wilcox
2022-09-16 9:46 ` Kees Cook
0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2022-09-16 8:38 UTC (permalink / raw)
To: Yu Zhao
Cc: Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon,
Kees Cook
On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> I think this is a manifestation of the lockdep warning I reported a couple
> of weeks ago:
> https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
That would certainly match the symptoms.
Turning vmap_lock into an NMI-safe lock would be bad. I don't even know
if we have primitives for that (it's not like you can disable an NMI
...)
I don't quite have time to write a patch right now. Perhaps something
like:
struct vmap_area *find_vmap_area_nmi(unsigned long addr)
{
	struct vmap_area *va;

	if (spin_trylock(&vmap_area_lock))
		return NULL;
	va = __find_vmap_area(addr, &vmap_area_root);
	spin_unlock(&vmap_area_lock);

	return va;
}
and then call find_vmap_area_nmi() in check_heap_object(). I may have
the polarity of the return value of spin_trylock() incorrect.
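[For reference: the kernel's spin_trylock() returns nonzero when the lock *was* acquired, so the polarity in the sketch above is indeed inverted; as written it bails out exactly when it gets the lock. The corrected control flow can be modeled in plain userspace C — everything below is a stand-in (the `_model` names, the fake lock, and the fake area are not kernel code), only the branch logic mirrors the proposal:]

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Stand-in for vmap_area_lock; true means another CPU currently holds it. */
static bool lock_contended;

/* Kernel spin_trylock() semantics: nonzero (true) when the lock was taken. */
static bool spin_trylock_model(void) { return !lock_contended; }
static void spin_unlock_model(void) { }

/* Stand-in for the vmap_area that __find_vmap_area() would return. */
static int the_area = 42;

/* Corrected polarity: give up (return NULL) when the trylock FAILS,
 * and do the lookup only when it succeeds. */
static int *find_vmap_area_nmi_model(void)
{
	int *va;

	if (!spin_trylock_model())	/* note the '!': bail on failure */
		return NULL;
	va = &the_area;			/* stands in for __find_vmap_area() */
	spin_unlock_model();

	return va;
}
```

[With the `!` in place, NULL is returned only under contention, which is what a lock-averse NMI path wants.]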
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-16 8:38 ` Matthew Wilcox
@ 2022-09-16 9:46 ` Kees Cook
2022-09-16 12:28 ` Uladzislau Rezki
2022-09-16 14:15 ` Matthew Wilcox
0 siblings, 2 replies; 10+ messages in thread
From: Kees Cook @ 2022-09-16 9:46 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon
On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > I think this is a manifestation of the lockdep warning I reported a couple
> > of weeks ago:
> > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
>
> That would certainly match the symptoms.
>
> Turning vmap_lock into an NMI-safe lock would be bad. I don't even know
> if we have primitives for that (it's not like you can disable an NMI
> ...)
>
> I don't quite have time to write a patch right now. Perhaps something
> like:
>
> struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> {
> 	struct vmap_area *va;
> 
> 	if (spin_trylock(&vmap_area_lock))
> 		return NULL;
> 	va = __find_vmap_area(addr, &vmap_area_root);
> 	spin_unlock(&vmap_area_lock);
> 
> 	return va;
> }
>
> and then call find_vmap_area_nmi() in check_heap_object(). I may have
> the polarity of the return value of spin_trylock() incorrect.
I think we'll need something slightly tweaked, since this would
return NULL under any contention (and a NULL return is fatal in
check_heap_object()). It seems like we need to explicitly check
for being in nmi context in check_heap_object() to deal with it?
Like this (only build tested):
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..c8a00f181a11 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 struct vmap_area *find_vmap_area(unsigned long addr);
+struct vmap_area *find_vmap_area_try(unsigned long addr);
 
 static inline bool is_vm_area_hugepages(const void *addr)
 {
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..9f943c29e7ec 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
+		struct vmap_area *area;
+
+		if (!in_nmi()) {
+			area = find_vmap_area(addr);
+		} else {
+			area = find_vmap_area_try(addr);
+			/* Give up under NMI to avoid deadlocks. */
+			if (!area)
+				return;
+		}
 
 		if (!area)
 			usercopy_abort("vmalloc", "no area", to_user, 0, n);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb201195..f14f1902c2f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	return va;
 }
 
+struct vmap_area *find_vmap_area_try(unsigned long addr)
+{
+	struct vmap_area *va = NULL;
+
+	if (spin_trylock(&vmap_area_lock)) {
+		va = __find_vmap_area(addr, &vmap_area_root);
+		spin_unlock(&vmap_area_lock);
+	}
+	return va;
+}
+
 /*** Per cpu kva allocator ***/
 
 /*
--
Kees Cook
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-16 9:46 ` Kees Cook
@ 2022-09-16 12:28 ` Uladzislau Rezki
2022-09-16 12:32 ` Uladzislau Rezki
2022-09-16 14:15 ` Matthew Wilcox
1 sibling, 1 reply; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 12:28 UTC (permalink / raw)
To: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton
Cc: Matthew Wilcox, Yu Zhao, Andrew Morton, dev, Linux-MM,
Uladzislau Rezki, bugzilla-daemon
On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > [...]
>
> I think we'll need something slightly tweaked, since this would
> return NULL under any contention (and a NULL return is fatal in
> check_heap_object()). It seems like we need to explicitly check
> for being in nmi context in check_heap_object() to deal with it?
> Like this (only build tested):
>
>
> [...]
OK. The problem is calling find_vmap_area() from IRQ context. Indeed,
that can deadlock; it is not supposed to be used there. But if you want
to, then we should have a helper.

Please note that the caller might also be in a regular IRQ, so this is not
limited to NMI context only; somebody could decide later to use it from a
regular IRQ.

IMHO, it makes sense to use the in_interrupt() helper instead, so that we
cover hard-IRQ context including the NMI one. That would also be aligned
with deferred vfreeing:
<snip>
static void __vfree(const void *addr)
{
	if (unlikely(in_interrupt()))
		__vfree_deferred(addr);
	else
		__vunmap(addr, 1);
}
<snip>
so that we handle not only the NMI scenario. I think we should align.
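[The in_interrupt()-based dispatch suggested here can be sketched as a userspace model — `fake_in_interrupt`, the trylock stub, and `check_vmalloc_object()` are hypothetical stand-ins for in_interrupt(), spin_trylock(&vmap_area_lock), and check_heap_object()'s vmalloc branch; only the control flow mirrors the proposal:]

```c
#include <stdbool.h>
#include <assert.h>

/* Stubbed kernel state: whether we are in (hard-)IRQ/NMI context, and
 * whether vmap_area_lock is currently held by someone else. */
static bool fake_in_interrupt;
static bool lock_held_elsewhere;

/* Kernel-style trylock semantics: true when the lock was taken. */
static bool spin_trylock_model(void) { return !lock_held_elsewhere; }

enum check_result { CHECKED, SKIPPED, ABORTED };

/* Model of the vmalloc branch with in_interrupt() dispatch: in atomic
 * context use the trylock and give up on contention; in process context
 * take the lock unconditionally and do the full check. */
static enum check_result check_vmalloc_object(bool area_exists)
{
	if (fake_in_interrupt) {
		if (!spin_trylock_model())
			return SKIPPED;	/* contended: skip the check */
	}
	/* Lock acquired (or taken unconditionally in process context). */
	return area_exists ? CHECKED : ABORTED;
}
```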
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-16 12:28 ` Uladzislau Rezki
@ 2022-09-16 12:32 ` Uladzislau Rezki
0 siblings, 0 replies; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 12:32 UTC (permalink / raw)
To: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton
Cc: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton, dev, Linux-MM,
bugzilla-daemon
On Fri, Sep 16, 2022 at 02:28:08PM +0200, Uladzislau Rezki wrote:
> On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> > [...]
> OK. The problem is calling find_vmap_area() from IRQ context. Indeed,
> that can deadlock; it is not supposed to be used there. But if you want
> to, then we should have a helper.
>
> Please note that the caller might also be in a regular IRQ, so this is not
> limited to NMI context only; somebody could decide later to use it from a
> regular IRQ.
>
> IMHO, it makes sense to use the in_interrupt() helper instead, so that we
> cover hard-IRQ context including the NMI one. That would also be aligned
> with deferred vfreeing:
>
> <snip>
> static void __vfree(const void *addr)
> {
> 	if (unlikely(in_interrupt()))
> 		__vfree_deferred(addr);
> 	else
> 		__vunmap(addr, 1);
> }
> <snip>
>
> so that we handle not only the NMI scenario. I think we should align.
>
Another thing I should mention: on PREEMPT_RT, spinlocks become sleepable
locks, and sleepable locks are not allowed in any atomic context. So from
a PREEMPT_RT point of view this is broken.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-16 9:46 ` Kees Cook
2022-09-16 12:28 ` Uladzislau Rezki
@ 2022-09-16 14:15 ` Matthew Wilcox
2022-09-16 14:42 ` Kees Cook
2022-09-16 18:47 ` Uladzislau Rezki
1 sibling, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2022-09-16 14:15 UTC (permalink / raw)
To: Kees Cook
Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon
On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > I think this is a manifestation of the lockdep warning I reported a
> > > couple of weeks ago:
> > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> >
> > That would certainly match the symptoms.
> >
> > Turning vmap_lock into an NMI-safe lock would be bad. I don't even know
> > if we have primitives for that (it's not like you can disable an NMI
> > ...)
> >
> > I don't quite have time to write a patch right now. Perhaps something
> > like:
> >
> > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > {
> > struct vmap_area *va;
> >
> > if (!spin_trylock(&vmap_area_lock))
> > return NULL;
> > va = __find_vmap_area(addr, &vmap_area_root);
> > spin_unlock(&vmap_area_lock);
> >
> > return va;
> > }
> >
> > and then call find_vmap_area_nmi() in check_heap_object(). I may have
> > the polarity of the return value of spin_trylock() incorrect.
>
> I think we'll need something slightly tweaked, since this would
> return NULL under any contention (and a NULL return is fatal in
> check_heap_object()). It seems like we need to explicitly check
> for being in nmi context in check_heap_object() to deal with it?
> Like this (only build tested):
Right, and Ulad is right about it being callable from any context. I think
the long-term solution is to make the vmap_area_root tree walkable under
RCU protection.
For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?) to
indicate that we've hit contention. It generally won't matter if we
hit it in process context because hardening doesn't have to be 100%
reliable to be useful.
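
(Editorial sketch of the distinct-return-code idea: NULL keeps meaning
"no such area" — still fatal for the hardening check — while
ERR_PTR(-EAGAIN) means "could not take the lock, skip the check this
time". This is a minimal userspace model with illustrative names; it is
not the patch that was eventually merged.)

```c
#include <errno.h>
#include <stddef.h>

/* Userspace model of the kernel's ERR_PTR()/IS_ERR() convention:
 * small negative errnos encoded at the top of the pointer range. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline int IS_ERR(const void *p)
{
	return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}

struct area { unsigned long start, end; };
static struct area the_area = { 0x1000, 0x2000 };
static int lock_contended; /* stands in for a failed spin_trylock() */

/* NULL    -> address is genuinely not in any area (fatal)
 * -EAGAIN -> could not take the lock; caller should skip the check */
static struct area *find_area_try(unsigned long addr)
{
	if (lock_contended)
		return ERR_PTR(-EAGAIN);
	if (addr >= the_area.start && addr < the_area.end)
		return &the_area;
	return NULL;
}

/* Returns 0 if the copy may proceed, -1 if hardening would abort. */
int check_object(unsigned long addr)
{
	struct area *a = find_area_try(addr);

	if (IS_ERR(a))
		return 0;  /* contention: not treated as a bad object */
	if (!a)
		return -1; /* no area: usercopy_abort() in the kernel */
	return 0;
}
```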
Erm ... so what prevents this race:
CPU 0 CPU 1
copy_to_user()
check_heap_object()
area = find_vmap_area(addr)
__purge_vmap_area_lazy()
merge_or_add_vmap_area_augment()
__merge_or_add_vmap_area()
kmem_cache_free(vmap_area_cachep, va);
if (n > area->va_end - addr) {
Yes, it's a race in the code that allocated this memory; they're
simultaneously calling copy_to_user() and __vunmap(). We'll catch
this bad behaviour sooner rather than later, but sometimes in trying to
catch this bug, we'll get caught by the bug and go splat.
I don't know that we need to go through heroics to be sure we don't
get caught by this bug. It already has to run a workqueue to do the
freeing. We could delay it even further with RCU or something, but
we're only trading off one kind of badness for another.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-16 14:15 ` Matthew Wilcox
@ 2022-09-16 14:42 ` Kees Cook
2022-09-16 18:47 ` Uladzislau Rezki
1 sibling, 0 replies; 10+ messages in thread
From: Kees Cook @ 2022-09-16 14:42 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon
On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> Right, and Ulad is right about it being callable from any context. I think
> the long-term solution is to make the vmap_area_root tree walkable under
> RCU protection.
Agreed, I've updated my proposed patch.
> For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?) to
> indicate that we've hit contention. It generally won't matter if we
> hit it in process context because hardening doesn't have to be 100%
> reliable to be useful.
Right, as I note in the series[1], hardening shouldn't be getting called
_at all_ in this path. :P
-Kees
[1] https://lore.kernel.org/linux-hardening/20220916135953.1320601-1-keescook@chromium.org/
--
Kees Cook
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 216489] New: Machine freezes due to memory lock
2022-09-16 14:15 ` Matthew Wilcox
2022-09-16 14:42 ` Kees Cook
@ 2022-09-16 18:47 ` Uladzislau Rezki
1 sibling, 0 replies; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 18:47 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Kees Cook, Yu Zhao, Andrew Morton, dev, Linux-MM,
Uladzislau Rezki, bugzilla-daemon
On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> > On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > > I think this is a manifestation of the lockdep warning I reported a
> > > > couple of weeks ago:
> > > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> > >
> > > That would certainly match the symptoms.
> > >
> > > Turning vmap_lock into an NMI-safe lock would be bad. I don't even know
> > > if we have primitives for that (it's not like you can disable an NMI
> > > ...)
> > >
> > > I don't quite have time to write a patch right now. Perhaps something
> > > like:
> > >
> > > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > > {
> > > struct vmap_area *va;
> > >
> > > if (!spin_trylock(&vmap_area_lock))
> > > return NULL;
> > > va = __find_vmap_area(addr, &vmap_area_root);
> > > spin_unlock(&vmap_area_lock);
> > >
> > > return va;
> > > }
> > >
> > > and then call find_vmap_area_nmi() in check_heap_object(). I may have
> > > the polarity of the return value of spin_trylock() incorrect.
> >
> > I think we'll need something slightly tweaked, since this would
> > return NULL under any contention (and a NULL return is fatal in
> > check_heap_object()). It seems like we need to explicitly check
> > for being in nmi context in check_heap_object() to deal with it?
> > Like this (only build tested):
>
> Right, and Ulad is right about it being callable from any context. I think
> the long-term solution is to make the vmap_area_root tree walkable under
> RCU protection.
>
> For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?) to
> indicate that we've hit contention. It generally won't matter if we
> hit it in process context because hardening doesn't have to be 100%
> reliable to be useful.
>
> Erm ... so what prevents this race:
>
> CPU 0 CPU 1
> copy_to_user()
> check_heap_object()
> area = find_vmap_area(addr)
> __purge_vmap_area_lazy()
> merge_or_add_vmap_area_augment()
> __merge_or_add_vmap_area()
> kmem_cache_free(vmap_area_cachep, va);
>
Sounds like it can happen. I think that is a good argument for switching
to RCU here, for safe access to the va after the lock is released. So I
can think about it and put it as a task on my todo list. Since it is not
urgent, it is OK so far to wait for a splat. But it might never happen :)
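
(Editorial sketch: one conventional way to keep an object valid after the
lock is dropped — a reference count, with the freer deferring destruction
until the last reader lets go — modelled in userspace. The thread above
leans toward RCU instead, so this only illustrates the lookup-vs-free
problem, not the eventual fix; all names are hypothetical.)

```c
#include <stdlib.h>

struct obj {
	int refcount; /* readers holding the object */
	int dead;     /* freer has requested destruction */
	long payload;
};

/* A lookup would take the lock, bump the refcount, then drop the lock;
 * the object cannot vanish while the caller still inspects it. */
static struct obj *obj_get(struct obj *o)
{
	o->refcount++;
	return o;
}

/* Drop a reference; free only once both the last reader and the
 * freer have let go. */
static void obj_put(struct obj *o)
{
	if (--o->refcount == 0 && o->dead)
		free(o);
}

/* The freer marks the object dead instead of freeing immediately,
 * closing the lookup-vs-free race at the cost of deferred freeing. */
static void obj_destroy(struct obj *o)
{
	o->dead = 1;
	if (o->refcount == 0)
		free(o);
}
```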
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-09-16 18:47 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-216489-27@https.bugzilla.kernel.org/>
2022-09-15 20:39 ` [Bug 216489] New: Machine freezes due to memory lock Andrew Morton
2022-09-15 22:42 ` Matthew Wilcox
2022-09-15 23:59 ` Yu Zhao
2022-09-16 8:38 ` Matthew Wilcox
2022-09-16 9:46 ` Kees Cook
2022-09-16 12:28 ` Uladzislau Rezki
2022-09-16 12:32 ` Uladzislau Rezki
2022-09-16 14:15 ` Matthew Wilcox
2022-09-16 14:42 ` Kees Cook
2022-09-16 18:47 ` Uladzislau Rezki