* Re: [Bug 216489] New: Machine freezes due to memory lock
       [not found] <bug-216489-27@https.bugzilla.kernel.org/>
@ 2022-09-15 20:39 ` Andrew Morton
  2022-09-15 22:42   ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2022-09-15 20:39 UTC (permalink / raw)
  To: dev, linux-mm, Uladzislau Rezki; +Cc: bugzilla-daemon

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=216489
> 
>             Bug ID: 216489
>            Summary: Machine freezes due to memory lock
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.19.8
>           Hardware: AMD
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: dev@der-flo.net
>         Regression: No
> 
> Hi all,
> With Kernel 5.19.x we noticed system freezes. This happens in virtual
> environments as well as on real hardware.
> On a real hardware machine we were able to catch the moment of freeze with
> continuous profiling.

Thanks.  I forwarded this to Uladzislau and he offered to help.  He said:


: I can help with debugging. What I need are reproduction steps. Could you
: please clarify whether it is easy to hit and what kind of profiling triggers it?

and

: I do not think that Matthew Wilcox's commits destroy it, but... I see that
: __vunmap() is invoked by free_work(); thus the caller is in atomic
: context, including IRQ context.

> Specification of the machine where we captured the freeze:
> Thinkpad T14
> CPU: AMD Ryzen 7 PRO 4750U
> Kernel: 5.19.8-200.fc36.x86_64
> 
> Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
> 
> #   Source Location                 Function Name               Function Line
> 0   arch/x86/include/asm/vdso/processor.h:13    rep_nop                 11
> 1   arch/x86/include/asm/vdso/processor.h:18    cpu_relax               16
> 2   kernel/locking/qspinlock.c:514          native_queued_spin_lock_slowpath    316
> 3   kernel/locking/qspinlock.c:316          native_queued_spin_lock_slowpath    N/A
> 4   arch/x86/include/asm/paravirt.h:591     pv_queued_spin_lock_slowpath        588
> 5   arch/x86/include/asm/qspinlock.h:51     queued_spin_lock_slowpath       49
> 6   include/asm-generic/qspinlock.h:114     queued_spin_lock            107
> 7   include/linux/spinlock.h:185            do_raw_spin_lock            182
> 8   include/linux/spinlock_api_smp.h:134        __raw_spin_lock             130
> 9   kernel/locking/spinlock.c:154           _raw_spin_lock              152
> 10  include/linux/spinlock.h:349            spin_lock               347
> 11  mm/vmalloc.c:1805               find_vmap_area              1801
> 12  mm/vmalloc.c:2525               find_vm_area                2521
> 13  mm/vmalloc.c:2639               __vunmap                2628
> 14  mm/vmalloc.c:97                 free_work               91
> 15  kernel/workqueue.c:2289             process_one_work            2181
> 16  kernel/workqueue.c:2436             worker_thread               2378
> 17  kernel/kthread.c:376                kthread                 330
> 18  N/A                     ret_from_fork               N/A
> 
> The functions in the above shown stacktrace hardly change. There is only one
> commit 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e which introduces changes to
> find_vmap_area() for 5.19.
> 
> With this change in mind we looked for stacktraces which make also use of this
> new commit. And in a different kernel thread we do notice the use of
> check_heap_object():
> 
> #   Source Location             Function Name           Function Line
> 0   arch/x86/include/asm/paravirt.h:704 arch_local_irq_enable       702
> 1   arch/x86/include/asm/irqflags.h:138 arch_local_irq_restore      135
> 2   kernel/sched/sched.h:1330       raw_spin_rq_unlock_irqrestore   1327
> 3   kernel/sched/sched.h:1327       raw_spin_rq_unlock_irqrestore   N/A
> 4   kernel/sched/sched.h:1611       rq_unlock_irqrestore        1607
> 5   kernel/sched/fair.c:8288        update_blocked_averages     8272
> 6   kernel/sched/fair.c:11133       run_rebalance_domains       11115
> 7   kernel/softirq.c:571            __do_softirq            528
> 8   kernel/softirq.c:445            invoke_softirq          433
> 9   kernel/softirq.c:650            __irq_exit_rcu          640
> 10  arch/x86/kernel/apic/apic.c:1106    sysvec_apic_timer_interrupt N/A
> 11  N/A                 asm_sysvec_apic_timer_interrupt N/A
> 12  include/linux/mmzone.h:1403     __nr_to_section         1395
> 13  include/linux/mmzone.h:1488     __pfn_to_section        1486
> 14  include/linux/mmzone.h:1539     pfn_valid           1524
> 15  arch/x86/mm/physaddr.c:65       __virt_addr_valid       47
> 16  mm/usercopy.c:188           check_heap_object       161
> 17  mm/usercopy.c:250           __check_object_size     212
> 18  mm/usercopy.c:212           __check_object_size     N/A
> 19  include/linux/thread_info.h:199     check_object_size       195
> 20  lib/strncpy_from_user.c:137     strncpy_from_user       113
> 21  fs/namei.c:150              getname_flags           129
> 22  fs/namei.c:2896             user_path_at_empty      2893
> 23  include/linux/namei.h:57        user_path_at            54
> 24  fs/open.c:446               do_faccessat            420
> 25  arch/x86/entry/common.c:50      do_syscall_x64          40
> 26  arch/x86/entry/common.c:80      do_syscall_64           73
> 27  N/A                 entry_SYSCALL_64_after_hwframe  N/A
> 
> We are neither experts in the mm subsystem nor can provide a fix, but wanted to
> let you know about our findings.
> 
> Cheers,
>  Florian
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are the assignee for the bug.



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-15 20:39 ` [Bug 216489] New: Machine freezes due to memory lock Andrew Morton
@ 2022-09-15 22:42   ` Matthew Wilcox
  2022-09-15 23:59     ` Yu Zhao
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2022-09-15 22:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dev, linux-mm, Uladzislau Rezki, bugzilla-daemon, Kees Cook

On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> > 
> >             Bug ID: 216489
> >            Summary: Machine freezes due to memory lock
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 5.19.8
> >           Hardware: AMD
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: dev@der-flo.net
> >         Regression: No
> > 
> > Hi all,
> > With Kernel 5.19.x we noticed system freezes. This happens in virtual
> > environments as well as on real hardware.
> > On a real hardware machine we were able to catch the moment of freeze with
> > continuous profiling.
> 
> Thanks.  I forwarded this to Uladzislau and he offered to help.  He said:
> 
> 
> : I can help with debugging. What i need is reproduce steps. Could you
> : please clarify if it is easy to hit and what kind of profiling triggers it?
> 
> and
> 
> : I do not think that Matthew Wilcox commits destroys it but... I see that
> : __vunmap() is invoked by the free_work() thus a caller is in atomic
> : context including IRQ context.

To keep all of this information together: Florian emailed me off-list
and I replied, cc'ing Kees.

I asked him to try this patch to decide whether the problem is the extra
load on the spinlock from examining the vmap tree more often:

diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..76d2d4fb6d22 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
-
-		if (!area)
-			usercopy_abort("vmalloc", "no area", to_user, 0, n);
-
-		if (n > area->va_end - addr) {
-			offset = addr - area->va_start;
-			usercopy_abort("vmalloc", NULL, to_user, offset, n);
-		}
 		return;
 	}
 

Kees wrote:

} If you can reproduce the hangs, perhaps enable:
} 
} CONFIG_DEBUG_LOCKDEP=y
} CONFIG_DEBUG_ATOMIC_SLEEP=y
} 
} I would expect any hung spinlock to complain very loudly under
} LOCKDEP...

I hope we can keep the remainder of the debugging in this email thread.

> > Specification of the machine where we captured the freeze:
> > Thinkpad T14
> > CPU: AMD Ryzen 7 PRO 4750U
> > Kernel: 5.19.8-200.fc36.x86_64
> > 
> > Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
> > 
> > #   Source Location                 Function Name               Function Line
> > 0   arch/x86/include/asm/vdso/processor.h:13    rep_nop                 11
> > 1   arch/x86/include/asm/vdso/processor.h:18    cpu_relax               16
> > 2   kernel/locking/qspinlock.c:514          native_queued_spin_lock_slowpath    316
> > 3   kernel/locking/qspinlock.c:316          native_queued_spin_lock_slowpath    N/A
> > 4   arch/x86/include/asm/paravirt.h:591     pv_queued_spin_lock_slowpath        588
> > 5   arch/x86/include/asm/qspinlock.h:51     queued_spin_lock_slowpath       49
> > 6   include/asm-generic/qspinlock.h:114     queued_spin_lock            107
> > 7   include/linux/spinlock.h:185            do_raw_spin_lock            182
> > 8   include/linux/spinlock_api_smp.h:134        __raw_spin_lock             130
> > 9   kernel/locking/spinlock.c:154           _raw_spin_lock              152
> > 10  include/linux/spinlock.h:349            spin_lock               347
> > 11  mm/vmalloc.c:1805               find_vmap_area              1801
> > 12  mm/vmalloc.c:2525               find_vm_area                2521
> > 13  mm/vmalloc.c:2639               __vunmap                2628
> > 14  mm/vmalloc.c:97                 free_work               91
> > 15  kernel/workqueue.c:2289             process_one_work            2181
> > 16  kernel/workqueue.c:2436             worker_thread               2378
> > 17  kernel/kthread.c:376                kthread                 330
> > 18  N/A                     ret_from_fork               N/A
> > 
> > The functions in the above shown stacktrace hardly change. There is only one
> > commit 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e which introduces changes to
> > find_vmap_area() for 5.19.
> > 
> > With this change in mind we looked for stacktraces which make also use of this
> > new commit. And in a different kernel thread we do notice the use of
> > check_heap_object():
> > 
> > #   Source Location             Function Name           Function Line
> > 0   arch/x86/include/asm/paravirt.h:704 arch_local_irq_enable       702
> > 1   arch/x86/include/asm/irqflags.h:138 arch_local_irq_restore      135
> > 2   kernel/sched/sched.h:1330       raw_spin_rq_unlock_irqrestore   1327
> > 3   kernel/sched/sched.h:1327       raw_spin_rq_unlock_irqrestore   N/A
> > 4   kernel/sched/sched.h:1611       rq_unlock_irqrestore        1607
> > 5   kernel/sched/fair.c:8288        update_blocked_averages     8272
> > 6   kernel/sched/fair.c:11133       run_rebalance_domains       11115
> > 7   kernel/softirq.c:571            __do_softirq            528
> > 8   kernel/softirq.c:445            invoke_softirq          433
> > 9   kernel/softirq.c:650            __irq_exit_rcu          640
> > 10  arch/x86/kernel/apic/apic.c:1106    sysvec_apic_timer_interrupt N/A
> > 11  N/A                 asm_sysvec_apic_timer_interrupt N/A
> > 12  include/linux/mmzone.h:1403     __nr_to_section         1395
> > 13  include/linux/mmzone.h:1488     __pfn_to_section        1486
> > 14  include/linux/mmzone.h:1539     pfn_valid           1524
> > 15  arch/x86/mm/physaddr.c:65       __virt_addr_valid       47
> > 16  mm/usercopy.c:188           check_heap_object       161
> > 17  mm/usercopy.c:250           __check_object_size     212
> > 18  mm/usercopy.c:212           __check_object_size     N/A
> > 19  include/linux/thread_info.h:199     check_object_size       195
> > 20  lib/strncpy_from_user.c:137     strncpy_from_user       113
> > 21  fs/namei.c:150              getname_flags           129
> > 22  fs/namei.c:2896             user_path_at_empty      2893
> > 23  include/linux/namei.h:57        user_path_at            54
> > 24  fs/open.c:446               do_faccessat            420
> > 25  arch/x86/entry/common.c:50      do_syscall_x64          40
> > 26  arch/x86/entry/common.c:80      do_syscall_64           73
> > 27  N/A                 entry_SYSCALL_64_after_hwframe  N/A
> > 
> > We are neither experts in the mm subsystem nor can provide a fix, but wanted to
> > let you know about our findings.
> > 
> > Cheers,
> >  Florian
> > 
> > -- 
> > You may reply to this email to add a comment.
> > 
> > You are receiving this mail because:
> > You are the assignee for the bug.
> 



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-15 22:42   ` Matthew Wilcox
@ 2022-09-15 23:59     ` Yu Zhao
  2022-09-16  8:38       ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Zhao @ 2022-09-15 23:59 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton
  Cc: dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon, Kees Cook

On Thu, Sep 15, 2022 at 4:42 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> > >
> > >             Bug ID: 216489
> > >            Summary: Machine freezes due to memory lock
> > >            Product: Memory Management
> > >            Version: 2.5
> > >     Kernel Version: 5.19.8
> > >           Hardware: AMD
> > >                 OS: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: high
> > >           Priority: P1
> > >          Component: Other
> > >           Assignee: akpm@linux-foundation.org
> > >           Reporter: dev@der-flo.net
> > >         Regression: No
> > >
> > > Hi all,
> > > With Kernel 5.19.x we noticed system freezes. This happens in virtual
> > > environments as well as on real hardware.
> > > On a real hardware machine we were able to catch the moment of freeze with
> > > continuous profiling.
> >
> > Thanks.  I forwarded this to Uladzislau and he offered to help.  He said:
> >
> >
> > : I can help with debugging. What i need is reproduce steps. Could you
> > : please clarify if it is easy to hit and what kind of profiling triggers it?
> >
> > and
> >
> > : I do not think that Matthew Wilcox commits destroys it but... I see that
> > : __vunmap() is invoked by the free_work() thus a caller is in atomic
> > : context including IRQ context.
>
> To keep all of this information together; Florian emailed me off-list
> and I replied cc'ing Kees.
>
> I asked to try this patch to decide whether it's the extra load on the
> spinlock from examining the vmap tree more often:
>
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index c1ee15a98633..76d2d4fb6d22 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>         }
>
>         if (is_vmalloc_addr(ptr)) {
> -               struct vmap_area *area = find_vmap_area(addr);
> -
> -               if (!area)
> -                       usercopy_abort("vmalloc", "no area", to_user, 0, n);
> -
> -               if (n > area->va_end - addr) {
> -                       offset = addr - area->va_start;
> -                       usercopy_abort("vmalloc", NULL, to_user, offset, n);
> -               }
>                 return;
>         }
>
>
> Kees wrote:
>
> } If you can reproduce the hangs, perhaps enable:
> }
> } CONFIG_DEBUG_LOCKDEP=y
> } CONFIG_DEBUG_ATOMIC_SLEEP=y
> }
> } I would expect any hung spinlock to complain very loudly under
> } LOCKDEP...
>
> I hope we can keep the remainder of the debugging in this email thread.
>
> > > Specification of the machine where we captured the freeze:
> > > Thinkpad T14
> > > CPU: AMD Ryzen 7 PRO 4750U
> > > Kernel: 5.19.8-200.fc36.x86_64
> > >
> > > Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
> > >
> > > #   Source Location                 Function Name               Function Line
> > > 0   arch/x86/include/asm/vdso/processor.h:13    rep_nop                 11
> > > 1   arch/x86/include/asm/vdso/processor.h:18    cpu_relax               16
> > > 2   kernel/locking/qspinlock.c:514          native_queued_spin_lock_slowpath    316
> > > 3   kernel/locking/qspinlock.c:316          native_queued_spin_lock_slowpath    N/A
> > > 4   arch/x86/include/asm/paravirt.h:591     pv_queued_spin_lock_slowpath        588
> > > 5   arch/x86/include/asm/qspinlock.h:51     queued_spin_lock_slowpath       49
> > > 6   include/asm-generic/qspinlock.h:114     queued_spin_lock            107
> > > 7   include/linux/spinlock.h:185            do_raw_spin_lock            182
> > > 8   include/linux/spinlock_api_smp.h:134        __raw_spin_lock             130
> > > 9   kernel/locking/spinlock.c:154           _raw_spin_lock              152
> > > 10  include/linux/spinlock.h:349            spin_lock               347
> > > 11  mm/vmalloc.c:1805               find_vmap_area              1801
> > > 12  mm/vmalloc.c:2525               find_vm_area                2521
> > > 13  mm/vmalloc.c:2639               __vunmap                2628
> > > 14  mm/vmalloc.c:97                 free_work               91
> > > 15  kernel/workqueue.c:2289             process_one_work            2181
> > > 16  kernel/workqueue.c:2436             worker_thread               2378
> > > 17  kernel/kthread.c:376                kthread                 330
> > > 18  N/A                     ret_from_fork               N/A
> > >
> > > The functions in the above shown stacktrace hardly change. There is only one
> > > commit 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e which introduces changes to
> > > find_vmap_area() for 5.19.
> > >
> > > With this change in mind we looked for stacktraces which make also use of this
> > > new commit. And in a different kernel thread we do notice the use of
> > > check_heap_object():
> > >
> > > #   Source Location             Function Name           Function Line
> > > 0   arch/x86/include/asm/paravirt.h:704 arch_local_irq_enable       702
> > > 1   arch/x86/include/asm/irqflags.h:138 arch_local_irq_restore      135
> > > 2   kernel/sched/sched.h:1330       raw_spin_rq_unlock_irqrestore   1327
> > > 3   kernel/sched/sched.h:1327       raw_spin_rq_unlock_irqrestore   N/A
> > > 4   kernel/sched/sched.h:1611       rq_unlock_irqrestore        1607
> > > 5   kernel/sched/fair.c:8288        update_blocked_averages     8272
> > > 6   kernel/sched/fair.c:11133       run_rebalance_domains       11115
> > > 7   kernel/softirq.c:571            __do_softirq            528
> > > 8   kernel/softirq.c:445            invoke_softirq          433
> > > 9   kernel/softirq.c:650            __irq_exit_rcu          640
> > > 10  arch/x86/kernel/apic/apic.c:1106    sysvec_apic_timer_interrupt N/A
> > > 11  N/A                 asm_sysvec_apic_timer_interrupt N/A
> > > 12  include/linux/mmzone.h:1403     __nr_to_section         1395
> > > 13  include/linux/mmzone.h:1488     __pfn_to_section        1486
> > > 14  include/linux/mmzone.h:1539     pfn_valid           1524
> > > 15  arch/x86/mm/physaddr.c:65       __virt_addr_valid       47
> > > 16  mm/usercopy.c:188           check_heap_object       161
> > > 17  mm/usercopy.c:250           __check_object_size     212
> > > 18  mm/usercopy.c:212           __check_object_size     N/A
> > > 19  include/linux/thread_info.h:199     check_object_size       195
> > > 20  lib/strncpy_from_user.c:137     strncpy_from_user       113
> > > 21  fs/namei.c:150              getname_flags           129
> > > 22  fs/namei.c:2896             user_path_at_empty      2893
> > > 23  include/linux/namei.h:57        user_path_at            54
> > > 24  fs/open.c:446               do_faccessat            420
> > > 25  arch/x86/entry/common.c:50      do_syscall_x64          40
> > > 26  arch/x86/entry/common.c:80      do_syscall_64           73
> > > 27  N/A                 entry_SYSCALL_64_after_hwframe  N/A
> > >
> > > We are neither experts in the mm subsystem nor can provide a fix, but wanted to
> > > let you know about our findings.
> > >
> > > Cheers,
> > >  Florian
> > >
> > > --
> > > You may reply to this email to add a comment.
> > >
> > > You are receiving this mail because:
> > > You are the assignee for the bug.

I think this is a manifestation of the lockdep warning I reported a couple
of weeks ago:
https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-15 23:59     ` Yu Zhao
@ 2022-09-16  8:38       ` Matthew Wilcox
  2022-09-16  9:46         ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2022-09-16  8:38 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon,
	Kees Cook

On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> I think this is a manifest of the lockdep warning I reported a couple
> of weeks ago:
> https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/

That would certainly match the symptoms.

Turning vmap_lock into an NMI-safe lock would be bad.  I don't even know
if we have primitives for that (it's not like you can disable an NMI
...)

I don't quite have time to write a patch right now.  Perhaps something
like:

struct vmap_area *find_vmap_area_nmi(unsigned long addr)
{
        struct vmap_area *va;

        if (spin_trylock(&vmap_area_lock))
		return NULL;
        va = __find_vmap_area(addr, &vmap_area_root);
        spin_unlock(&vmap_area_lock);

        return va;
}

and then call find_vmap_area_nmi() in check_heap_object().  I may have
the polarity of the return value of spin_trylock() incorrect.
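
(If it is, the fix is just to invert the test, i.e. "if (!spin_trylock(&vmap_area_lock)) return NULL;", since spin_trylock() returns nonzero only when it acquires the lock.)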



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16  8:38       ` Matthew Wilcox
@ 2022-09-16  9:46         ` Kees Cook
  2022-09-16 12:28           ` Uladzislau Rezki
  2022-09-16 14:15           ` Matthew Wilcox
  0 siblings, 2 replies; 10+ messages in thread
From: Kees Cook @ 2022-09-16  9:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > I think this is a manifest of the lockdep warning I reported a couple
> > of weeks ago:
> > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> 
> That would certainly match the symptoms.
> 
> Turning vmap_lock into an NMI-safe lock would be bad.  I don't even know
> if we have primitives for that (it's not like you can disable an NMI
> ...)
> 
> I don't quite have time to write a patch right now.  Perhaps something
> like:
> 
> struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> {
>         struct vmap_area *va;
> 
>         if (spin_trylock(&vmap_area_lock))
> 		return NULL;
>         va = __find_vmap_area(addr, &vmap_area_root);
>         spin_unlock(&vmap_area_lock);
> 
>         return va;
> }
> 
> and then call find_vmap_area_nmi() in check_heap_object().  I may have
> the polarity of the return value of spin_trylock() incorrect.

I think we'll need something slightly tweaked, since this would
return NULL under any contention (and a NULL return is fatal in
check_heap_object()). It seems like we need to explicitly check
for being in NMI context in check_heap_object() to deal with it?
Like this (only build tested):


diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..c8a00f181a11 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 struct vmap_area *find_vmap_area(unsigned long addr);
+struct vmap_area *find_vmap_area_try(unsigned long addr);
 
 static inline bool is_vm_area_hugepages(const void *addr)
 {
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..9f943c29e7ec 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
+		struct vmap_area *area;
+
+		if (!in_nmi()) {
+			area = find_vmap_area(addr);
+		} else {
+			area = find_vmap_area_try(addr);
+			/* Give up under NMI to avoid deadlocks. */
+			if (!area)
+				return;
+		}
 
 		if (!area)
 			usercopy_abort("vmalloc", "no area", to_user, 0, n);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb201195..f14f1902c2f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	return va;
 }
 
+struct vmap_area *find_vmap_area_try(unsigned long addr)
+{
+	struct vmap_area *va = NULL;
+
+	if (spin_trylock(&vmap_area_lock)) {
+		va = __find_vmap_area(addr, &vmap_area_root);
+		spin_unlock(&vmap_area_lock);
+	}
+	return va;
+}
+
 /*** Per cpu kva allocator ***/
 
 /*

-- 
Kees Cook



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16  9:46         ` Kees Cook
@ 2022-09-16 12:28           ` Uladzislau Rezki
  2022-09-16 12:32             ` Uladzislau Rezki
  2022-09-16 14:15           ` Matthew Wilcox
  1 sibling, 1 reply; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 12:28 UTC (permalink / raw)
  To: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton
  Cc: Matthew Wilcox, Yu Zhao, Andrew Morton, dev, Linux-MM,
	Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > I think this is a manifest of the lockdep warning I reported a couple
> > > of weeks ago:
> > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> > 
> > That would certainly match the symptoms.
> > 
> > Turning vmap_lock into an NMI-safe lock would be bad.  I don't even know
> > if we have primitives for that (it's not like you can disable an NMI
> > ...)
> > 
> > I don't quite have time to write a patch right now.  Perhaps something
> > like:
> > 
> > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > {
> >         struct vmap_area *va;
> > 
> >         if (spin_trylock(&vmap_area_lock))
> > 		return NULL;
> >         va = __find_vmap_area(addr, &vmap_area_root);
> >         spin_unlock(&vmap_area_lock);
> > 
> >         return va;
> > }
> > 
> > and then call find_vmap_area_nmi() in check_heap_object().  I may have
> > the polarity of the return value of spin_trylock() incorrect.
> 
> I think we'll need something slightly tweaked, since this would
> return NULL under any contention (and a NULL return is fatal in
> check_heap_object()). It seems like we need to explicitly check
> for being in nmi context in check_heap_object() to deal with it?
> Like this (only build tested):
> 
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..c8a00f181a11 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
>  struct vmap_area *find_vmap_area(unsigned long addr);
> +struct vmap_area *find_vmap_area_try(unsigned long addr);
>  
>  static inline bool is_vm_area_hugepages(const void *addr)
>  {
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index c1ee15a98633..9f943c29e7ec 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>  	}
>  
>  	if (is_vmalloc_addr(ptr)) {
> -		struct vmap_area *area = find_vmap_area(addr);
> +		struct vmap_area *area;
> +
> +		if (!in_nmi()) {
> +			area = find_vmap_area(addr);
> +		} else {
> +			area = find_vmap_area_try(addr);
> +			/* Give up under NMI to avoid deadlocks. */
> +			if (!area)
> +				return;
> +		}
>  
>  		if (!area)
>  			usercopy_abort("vmalloc", "no area", to_user, 0, n);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd6cdb201195..f14f1902c2f6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
>  	return va;
>  }
>  
> +struct vmap_area *find_vmap_area_try(unsigned long addr)
> +{
> +	struct vmap_area *va = NULL;
> +
> +	if (spin_trylock(&vmap_area_lock)) {
> +		va = __find_vmap_area(addr, &vmap_area_root);
> +		spin_unlock(&vmap_area_lock);
> +	}
> +	return va;
> +}
> +
>  /*** Per cpu kva allocator ***/
>  
>  /*
> 
OK. The problem is using find_vmap_area() from IRQ context. Indeed,
it can deadlock. It is not supposed to be used there, but if you want
to use it there, then we should have a helper.

Please note that it might also be a regular IRQ, so this is not limited
to NMI context only: somebody could later decide to call it from a
regular IRQ.

IMHO, it makes sense to use the in_interrupt() helper instead, so we
cover hw-IRQ context including the NMI one. It would also be aligned
with the deferred vfreeing:

<snip>
static void __vfree(const void *addr)
{
	if (unlikely(in_interrupt()))
		__vfree_deferred(addr);
	else
		__vunmap(addr, 1);
}
<snip>

so we handle not only the NMI scenario here. I think we should align.
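
Roughly, as an untested sketch (reusing the find_vmap_area_try() helper
from your patch), the usercopy side would then be:

<snip>
	if (is_vmalloc_addr(ptr)) {
		struct vmap_area *area;

		if (!in_interrupt()) {
			area = find_vmap_area(addr);
		} else {
			/* Do not spin on vmap_area_lock from IRQ/NMI context. */
			area = find_vmap_area_try(addr);

			/* Give up under contention to avoid deadlocks. */
			if (!area)
				return;
		}
<snip>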

--
Uladzislau Rezki



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16 12:28           ` Uladzislau Rezki
@ 2022-09-16 12:32             ` Uladzislau Rezki
  0 siblings, 0 replies; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 12:32 UTC (permalink / raw)
  To: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton
  Cc: Kees Cook, Matthew Wilcox, Yu Zhao, Andrew Morton, dev, Linux-MM,
	bugzilla-daemon

On Fri, Sep 16, 2022 at 02:28:08PM +0200, Uladzislau Rezki wrote:
> On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> > On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > > I think this is a manifest of the lockdep warning I reported a couple
> > > > of weeks ago:
> > > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> > > 
> > > That would certainly match the symptoms.
> > > 
> > > Turning vmap_lock into an NMI-safe lock would be bad.  I don't even know
> > > if we have primitives for that (it's not like you can disable an NMI
> > > ...)
> > > 
> > > I don't quite have time to write a patch right now.  Perhaps something
> > > like:
> > > 
> > > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > > {
> > >         struct vmap_area *va;
> > > 
> > >         if (spin_trylock(&vmap_area_lock))
> > > 		return NULL;
> > >         va = __find_vmap_area(addr, &vmap_area_root);
> > >         spin_unlock(&vmap_area_lock);
> > > 
> > >         return va;
> > > }
> > > 
> > > and then call find_vmap_area_nmi() in check_heap_object().  I may have
> > > the polarity of the return value of spin_trylock() incorrect.
> > 
> > I think we'll need something slightly tweaked, since this would
> > return NULL under any contention (and a NULL return is fatal in
> > check_heap_object()). It seems like we need to explicitly check
> > for being in nmi context in check_heap_object() to deal with it?
> > Like this (only build tested):
> > 
> > 
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 096d48aa3437..c8a00f181a11 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
> >  extern struct vm_struct *remove_vm_area(const void *addr);
> >  extern struct vm_struct *find_vm_area(const void *addr);
> >  struct vmap_area *find_vmap_area(unsigned long addr);
> > +struct vmap_area *find_vmap_area_try(unsigned long addr);
> >  
> >  static inline bool is_vm_area_hugepages(const void *addr)
> >  {
> > diff --git a/mm/usercopy.c b/mm/usercopy.c
> > index c1ee15a98633..9f943c29e7ec 100644
> > --- a/mm/usercopy.c
> > +++ b/mm/usercopy.c
> > @@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
> >  	}
> >  
> >  	if (is_vmalloc_addr(ptr)) {
> > -		struct vmap_area *area = find_vmap_area(addr);
> > +		struct vmap_area *area;
> > +
> > +		if (!in_nmi()) {
> > +			area = find_vmap_area(addr);
> > +		} else {
> > +			area = find_vmap_area_try(addr);
> > +			/* Give up under NMI to avoid deadlocks. */
> > +			if (!area)
> > +				return;
> > +		}
> >  
> >  		if (!area)
> >  			usercopy_abort("vmalloc", "no area", to_user, 0, n);
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index dd6cdb201195..f14f1902c2f6 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
> >  	return va;
> >  }
> >  
> > +struct vmap_area *find_vmap_area_try(unsigned long addr)
> > +{
> > +	struct vmap_area *va = NULL;
> > +
> > +	if (spin_trylock(&vmap_area_lock)) {
> > +		va = __find_vmap_area(addr, &vmap_area_root);
> > +		spin_unlock(&vmap_area_lock);
> > +	}
> > +	return va;
> > +}
> > +
> >  /*** Per cpu kva allocator ***/
> >  
> >  /*
> > 
> OK. The problem is about using find_vmap_area() from the IRQ context. Indeed
> it can be dead-locked. It is not supposed to be used there. But if you want 
> then we should have a helper.
> 
> Please note that it might be a regular IRQ also so it is not limited to NMI
> context only, because somebody could decide later to use it from a regular
> IRQ.
> 
> IMHO, it makes sense to make use of in_interrupt() helper instead so we
> cover here a hw-IRQ context including NMI one. It also would be aligned
> with deferred vfreeing:
> 
> <snip>
> tatic void __vfree(const void *addr)
> {
> 	if (unlikely(in_interrupt()))
> 		__vfree_deferred(addr);
> 	else
> 		__vunmap(addr, 1);
> }
> <snip>
> 
> so we handle here not only NMI scenario. I think we should align.
> 
Another thing I should mention: using sleepable locks (which is what
spinlocks become under PREEMPT_RT) is not allowed in any atomic context.
So from the PREEMPT_RT point of view it is broken.

--
Uladzislau Rezki



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16  9:46         ` Kees Cook
  2022-09-16 12:28           ` Uladzislau Rezki
@ 2022-09-16 14:15           ` Matthew Wilcox
  2022-09-16 14:42             ` Kees Cook
  2022-09-16 18:47             ` Uladzislau Rezki
  1 sibling, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2022-09-16 14:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > I think this is a manifest of the lockdep warning I reported a couple
> > > of weeks ago:
> > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> > 
> > That would certainly match the symptoms.
> > 
> > Turning vmap_lock into an NMI-safe lock would be bad.  I don't even know
> > if we have primitives for that (it's not like you can disable an NMI
> > ...)
> > 
> > I don't quite have time to write a patch right now.  Perhaps something
> > like:
> > 
> > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > {
> >         struct vmap_area *va;
> > 
> >         if (spin_trylock(&vmap_area_lock))
> > 		return NULL;
> >         va = __find_vmap_area(addr, &vmap_area_root);
> >         spin_unlock(&vmap_area_lock);
> > 
> >         return va;
> > }
> > 
> > and then call find_vmap_area_nmi() in check_heap_object().  I may have
> > the polarity of the return value of spin_trylock() incorrect.
> 
> I think we'll need something slightly tweaked, since this would
> return NULL under any contention (and a NULL return is fatal in
> check_heap_object()). It seems like we need to explicitly check
> for being in nmi context in check_heap_object() to deal with it?
> Like this (only build tested):

Right, and Ulad is right about it being callable from any context.  I think
the long-term solution is to make the vmap_area_root tree walkable under
RCU protection.

For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?) to
indicate that we've hit contention.  It generally won't matter if we
hit it in process context because hardening doesn't have to be 100%
reliable to be useful.
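
Something like this (untested sketch; find_vmap_area_try() here is the
helper from your patch, changed to return ERR_PTR(-EAGAIN) instead of
NULL when the trylock fails):

	area = find_vmap_area_try(addr);
	if (area == ERR_PTR(-EAGAIN))
		return;	/* contended; skip the check rather than abort */
	if (!area)
		usercopy_abort("vmalloc", "no area", to_user, 0, n);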

Erm ... so what prevents this race:

CPU 0					CPU 1
copy_to_user()
check_heap_object()
area = find_vmap_area(addr)
					__purge_vmap_area_lazy()
					merge_or_add_vmap_area_augment()
					__merge_or_add_vmap_area()
					kmem_cache_free(vmap_area_cachep, va);
if (n > area->va_end - addr) {

Yes, it's a race in the code that allocated this memory; they're
simultaneously calling copy_to_user() and __vunmap().  We'll catch
this bad behaviour sooner rather than later, but sometimes in trying to
catch this bug, we'll get caught by the bug and go splat.

I don't know that we need to go through heroics to be sure we don't
get caught by this bug.  It already has to run a workqueue to do the
freeing.  We could delay it even further with RCU or something, but
we're only trading off one kind of badness for another.



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16 14:15           ` Matthew Wilcox
@ 2022-09-16 14:42             ` Kees Cook
  2022-09-16 18:47             ` Uladzislau Rezki
  1 sibling, 0 replies; 10+ messages in thread
From: Kees Cook @ 2022-09-16 14:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yu Zhao, Andrew Morton, dev, Linux-MM, Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> Right, and Ulad is right about it beig callable from any context.  I think
> the longterm solution is to make the vmap_area_root tree walkable under
> RCU protection.

Agreed, I've updated my proposed patch.

> For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?) to
> indicate that we've hit contention.  It generally won't matter if we
> hit it in process context because hardening doesn't have to be 100%
> reliable to be useful.

Right, as I note in the series[1], hardening shouldn't be getting called
_at all_ in this path. :P

-Kees

[1] https://lore.kernel.org/linux-hardening/20220916135953.1320601-1-keescook@chromium.org/

-- 
Kees Cook



* Re: [Bug 216489] New: Machine freezes due to memory lock
  2022-09-16 14:15           ` Matthew Wilcox
  2022-09-16 14:42             ` Kees Cook
@ 2022-09-16 18:47             ` Uladzislau Rezki
  1 sibling, 0 replies; 10+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 18:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Yu Zhao, Andrew Morton, dev, Linux-MM,
	Uladzislau Rezki, bugzilla-daemon

On Fri, Sep 16, 2022 at 03:15:05PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 16, 2022 at 02:46:39AM -0700, Kees Cook wrote:
> > On Fri, Sep 16, 2022 at 09:38:33AM +0100, Matthew Wilcox wrote:
> > > On Thu, Sep 15, 2022 at 05:59:56PM -0600, Yu Zhao wrote:
> > > > I think this is a manifest of the lockdep warning I reported a couple
> > > > of weeks ago:
> > > > https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
> > > 
> > > That would certainly match the symptoms.
> > > 
> > > Turning vmap_lock into an NMI-safe lock would be bad.  I don't even know
> > > if we have primitives for that (it's not like you can disable an NMI
> > > ...)
> > > 
> > > I don't quite have time to write a patch right now.  Perhaps something
> > > like:
> > > 
> > > struct vmap_area *find_vmap_area_nmi(unsigned long addr)
> > > {
> > >         struct vmap_area *va;
> > > 
> > >         if (spin_trylock(&vmap_area_lock))
> > > 		return NULL;
> > >         va = __find_vmap_area(addr, &vmap_area_root);
> > >         spin_unlock(&vmap_area_lock);
> > > 
> > >         return va;
> > > }
> > > 
> > > and then call find_vmap_area_nmi() in check_heap_object().  I may have
> > > the polarity of the return value of spin_trylock() incorrect.
> > 
> > I think we'll need something slightly tweaked, since this would
> > return NULL under any contention (and a NULL return is fatal in
> > check_heap_object()). It seems like we need to explicitly check
> > for being in nmi context in check_heap_object() to deal with it?
> > Like this (only build tested):
> 
> Right, and Ulad is right about it beig callable from any context.  I think
> the longterm solution is to make the vmap_area_root tree walkable under
> RCU protection.
> 
> For now, let's have a distinct return code (ERR_PTR(-EAGAIN), perhaps?) to
> indicate that we've hit contention.  It generally won't matter if we
> hit it in process context because hardening doesn't have to be 100%
> reliable to be useful.
> 
> Erm ... so what prevents this race:
> 
> CPU 0					CPU 1
> copy_to_user()
> check_heap_object()
> area = find_vmap_area(addr)
> 					__purge_vmap_area_lazy()
> 					merge_or_add_vmap_area_augment()
> 					__merge_or_add_vmap_area()
> 					kmem_cache_free(vmap_area_cachep, va);
>
Sounds like it can happen. I think it is a good argument to switch to
RCU here for safe access to the va after the lock is released.
So I can think about it and put it as a task on my todo list. Since it
is not urgent, so far it is OK to wait for a splat. But it might never
happen :)
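
A very rough, untested idea (it assumes we add an rcu_head to struct
vmap_area): instead of freeing the area directly, defer it by one grace
period, so a lookup that dropped vmap_area_lock can still dereference
the va from within an RCU read-side section:

<snip>
static void vmap_area_free_rcu(struct rcu_head *head)
{
	struct vmap_area *va = container_of(head, struct vmap_area, rcu_head);

	kmem_cache_free(vmap_area_cachep, va);
}

	/* instead of kmem_cache_free(vmap_area_cachep, va); */
	call_rcu(&va->rcu_head, vmap_area_free_rcu);
<snip>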

--
Uladzislau Rezki



