From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: dev@der-flo.net, linux-mm@kvack.org,
	Uladzislau Rezki <urezki@gmail.com>,
	bugzilla-daemon@kernel.org, Kees Cook <keescook@chromium.org>
Subject: Re: [Bug 216489] New: Machine freezes due to memory lock
Date: Thu, 15 Sep 2022 23:42:17 +0100
Message-ID: <YyOqSWAmAFxx8RCt@casper.infradead.org>
In-Reply-To: <20220915133931.ee0a6c8a86c59a144828eb60@linux-foundation.org>

On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> > 
> >             Bug ID: 216489
> >            Summary: Machine freezes due to memory lock
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 5.19.8
> >           Hardware: AMD
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: dev@der-flo.net
> >         Regression: No
> > 
> > Hi all,
> > With kernel 5.19.x we noticed system freezes. These happen in virtual
> > environments as well as on real hardware.
> > On a real hardware machine we were able to catch the moment of freeze with
> > continuous profiling.
> 
> Thanks.  I forwarded this to Uladzislau and he offered to help.  He said:
> 
> 
> : I can help with debugging. What I need are steps to reproduce. Could you
> : please clarify whether it is easy to hit and what kind of profiling triggers it?
> 
> and
> 
> : I do not think that Matthew Wilcox's commits cause this, but... I see that
> : __vunmap() is invoked by free_work(), so the caller is in atomic
> : context, including IRQ context.
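Uladzislau's point is that free_work() runs __vunmap() later, on behalf of callers that could not do the unmap themselves. A minimal userspace sketch of that deferral pattern, assuming it works like a lock-free list that atomic-context callers push onto and a worker later drains; `struct deferred_node`, `defer_free()` and `drain_deferred()` are hypothetical names, not the mm/vmalloc.c code:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node. In this model the object being freed is itself
 * reused as the node, so deferring a free needs no extra allocation. */
struct deferred_node {
    struct deferred_node *next;
};

/* Head of a lock-free singly linked list (LIFO, Treiber-stack style). */
static _Atomic(struct deferred_node *) deferred_head = NULL;

/* Producer: only a compare-and-swap loop, no locks and no sleeping, so it
 * is usable from contexts that must not block (e.g. interrupt handlers). */
static void defer_free(struct deferred_node *node)
{
    struct deferred_node *old =
        atomic_load_explicit(&deferred_head, memory_order_relaxed);
    do {
        node->next = old; /* on CAS failure, old is reloaded for us */
    } while (!atomic_compare_exchange_weak_explicit(
                 &deferred_head, &old, node,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer (the "worker"): detach the whole list in one atomic exchange,
 * then handle each entry in ordinary process context, where work that is
 * not safe in atomic context can run. release may be NULL. */
static size_t drain_deferred(void (*release)(struct deferred_node *))
{
    struct deferred_node *node =
        atomic_exchange_explicit(&deferred_head, NULL, memory_order_acquire);
    size_t count = 0;

    while (node) {
        struct deferred_node *next = node->next; /* read before release */
        if (release)
            release(node);
        node = next;
        count++;
    }
    return count;
}
```

Because the list is LIFO, entries are handled in reverse order of deferral. This model omits per-CPU lists and the schedule_work() call that would wake the worker.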

To keep all of this information together: Florian emailed me off-list
and I replied cc'ing Kees.

I asked him to try this patch to determine whether the problem is the extra
load on the spinlock from examining the vmap tree more often:

diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..76d2d4fb6d22 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
-
-		if (!area)
-			usercopy_abort("vmalloc", "no area", to_user, 0, n);
-
-		if (n > area->va_end - addr) {
-			offset = addr - area->va_start;
-			usercopy_abort("vmalloc", NULL, to_user, offset, n);
-		}
 		return;
 	}
 
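The hunk above removes a lookup followed by a bounds check. The lookup (find_vmap_area(), which takes the vmap_area_lock spinlock) is the part suspected of adding load; the check itself is cheap arithmetic. A minimal sketch of just that arithmetic, assuming the lookup has already returned the area (or NULL); `struct area_range`, `check_vmalloc_copy()` and the verdict names are hypothetical, not kernel API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct vmap_area fields the removed
 * hunk reads: the area covers [va_start, va_end). */
struct area_range {
    uintptr_t va_start;
    uintptr_t va_end;
};

enum copy_verdict {
    COPY_OK,
    COPY_NO_AREA, /* the usercopy_abort("vmalloc", "no area", ...) case */
    COPY_OVERRUN, /* the usercopy_abort("vmalloc", NULL, ...) case */
};

/* Same logic as the removed lines: area is the lookup result (NULL if no
 * vmalloc area contains addr), n is the copy length. On overrun, *offset
 * receives addr's offset into the area, as the original passes to
 * usercopy_abort(). */
static enum copy_verdict check_vmalloc_copy(const struct area_range *area,
                                            uintptr_t addr, size_t n,
                                            size_t *offset)
{
    if (!area)
        return COPY_NO_AREA;
    if (n > area->va_end - addr) {
        if (offset)
            *offset = addr - area->va_start;
        return COPY_OVERRUN;
    }
    return COPY_OK;
}
```

With an area of {0x1000, 0x3000}, a 0x1000-byte copy from 0x2000 passes, while a 0x1001-byte copy is reported as an overrun at offset 0x1000.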

Kees wrote:

} If you can reproduce the hangs, perhaps enable:
} 
} CONFIG_DEBUG_LOCKDEP=y
} CONFIG_DEBUG_ATOMIC_SLEEP=y
} 
} I would expect any hung spinlock to complain very loudly under
} LOCKDEP...

I hope we can keep the remainder of the debugging in this email thread.

> > Specification of the machine where we captured the freeze:
> > Thinkpad T14
> > CPU: AMD Ryzen 7 PRO 4750U
> > Kernel: 5.19.8-200.fc36.x86_64
> > 
> > Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
> > 
> > #   Source Location                 Function Name               Function Line
> > 0   arch/x86/include/asm/vdso/processor.h:13    rep_nop                 11
> > 1   arch/x86/include/asm/vdso/processor.h:18    cpu_relax               16
> > 2   kernel/locking/qspinlock.c:514          native_queued_spin_lock_slowpath    316
> > 3   kernel/locking/qspinlock.c:316          native_queued_spin_lock_slowpath    N/A
> > 4   arch/x86/include/asm/paravirt.h:591     pv_queued_spin_lock_slowpath        588
> > 5   arch/x86/include/asm/qspinlock.h:51     queued_spin_lock_slowpath       49
> > 6   include/asm-generic/qspinlock.h:114     queued_spin_lock            107
> > 7   include/linux/spinlock.h:185            do_raw_spin_lock            182
> > 8   include/linux/spinlock_api_smp.h:134        __raw_spin_lock             130
> > 9   kernel/locking/spinlock.c:154           _raw_spin_lock              152
> > 10  include/linux/spinlock.h:349            spin_lock               347
> > 11  mm/vmalloc.c:1805               find_vmap_area              1801
> > 12  mm/vmalloc.c:2525               find_vm_area                2521
> > 13  mm/vmalloc.c:2639               __vunmap                2628
> > 14  mm/vmalloc.c:97                 free_work               91
> > 15  kernel/workqueue.c:2289             process_one_work            2181
> > 16  kernel/workqueue.c:2436             worker_thread               2378
> > 17  kernel/kthread.c:376                kthread                 330
> > 18  N/A                     ret_from_fork               N/A
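The top frames (rep_nop and cpu_relax under native_queued_spin_lock_slowpath) show a CPU busy-waiting for a contended spinlock rather than sleeping, which fits a kworker "using all resources". A simplified sketch of such a busy-wait, assuming a plain test-and-test-and-set lock; the kernel's qspinlock is a queued lock and considerably more involved, and `struct toy_spinlock` and its helpers are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal lock: a single flag, unlike the kernel's queued
 * qspinlock, which also keeps waiters in a queue. */
struct toy_spinlock {
    atomic_bool locked;
};

/* Busy-wait hint to the CPU; on x86 this is the PAUSE instruction, which
 * is encoded as "rep; nop". */
static inline void toy_cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

static bool toy_spin_trylock(struct toy_spinlock *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, true,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Spin on a plain load while the lock looks held, retrying the atomic
 * swap only when it looks free. The waiter never sleeps, so it keeps its
 * CPU busy for as long as the holder keeps the lock. */
static void toy_spin_lock(struct toy_spinlock *l)
{
    while (!toy_spin_trylock(l)) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            toy_cpu_relax();
    }
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A thread calling toy_spin_lock() while another thread holds the lock stays in the inner loop, on-CPU, until the holder calls toy_spin_unlock().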
> > 
> > The functions in the stack trace shown above have hardly changed. Only one
> > commit, 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e, introduces changes to
> > find_vmap_area() for 5.19.
> > 
> > With this change in mind we looked for stack traces that also make use of
> > this new commit, and in a different kernel thread we noticed a call to
> > check_heap_object():
> > 
> > #   Source Location             Function Name           Function Line
> > 0   arch/x86/include/asm/paravirt.h:704 arch_local_irq_enable       702
> > 1   arch/x86/include/asm/irqflags.h:138 arch_local_irq_restore      135
> > 2   kernel/sched/sched.h:1330       raw_spin_rq_unlock_irqrestore   1327
> > 3   kernel/sched/sched.h:1327       raw_spin_rq_unlock_irqrestore   N/A
> > 4   kernel/sched/sched.h:1611       rq_unlock_irqrestore        1607
> > 5   kernel/sched/fair.c:8288        update_blocked_averages     8272
> > 6   kernel/sched/fair.c:11133       run_rebalance_domains       11115
> > 7   kernel/softirq.c:571            __do_softirq            528
> > 8   kernel/softirq.c:445            invoke_softirq          433
> > 9   kernel/softirq.c:650            __irq_exit_rcu          640
> > 10  arch/x86/kernel/apic/apic.c:1106    sysvec_apic_timer_interrupt N/A
> > 11  N/A                 asm_sysvec_apic_timer_interrupt N/A
> > 12  include/linux/mmzone.h:1403     __nr_to_section         1395
> > 13  include/linux/mmzone.h:1488     __pfn_to_section        1486
> > 14  include/linux/mmzone.h:1539     pfn_valid           1524
> > 15  arch/x86/mm/physaddr.c:65       __virt_addr_valid       47
> > 16  mm/usercopy.c:188           check_heap_object       161
> > 17  mm/usercopy.c:250           __check_object_size     212
> > 18  mm/usercopy.c:212           __check_object_size     N/A
> > 19  include/linux/thread_info.h:199     check_object_size       195
> > 20  lib/strncpy_from_user.c:137     strncpy_from_user       113
> > 21  fs/namei.c:150              getname_flags           129
> > 22  fs/namei.c:2896             user_path_at_empty      2893
> > 23  include/linux/namei.h:57        user_path_at            54
> > 24  fs/open.c:446               do_faccessat            420
> > 25  arch/x86/entry/common.c:50      do_syscall_x64          40
> > 26  arch/x86/entry/common.c:80      do_syscall_64           73
> > 27  N/A                 entry_SYSCALL_64_after_hwframe  N/A
> > 
> > We are not experts in the mm subsystem and cannot provide a fix, but wanted
> > to let you know about our findings.
> > 
> > Cheers,
> >  Florian
> > 
> 


