From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: dev@der-flo.net, linux-mm@kvack.org,
Uladzislau Rezki <urezki@gmail.com>,
bugzilla-daemon@kernel.org, Kees Cook <keescook@chromium.org>
Subject: Re: [Bug 216489] New: Machine freezes due to memory lock
Date: Thu, 15 Sep 2022 23:42:17 +0100
Message-ID: <YyOqSWAmAFxx8RCt@casper.infradead.org>
In-Reply-To: <20220915133931.ee0a6c8a86c59a144828eb60@linux-foundation.org>
On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> >
> > Bug ID: 216489
> > Summary: Machine freezes due to memory lock
> > Product: Memory Management
> > Version: 2.5
> > Kernel Version: 5.19.8
> > Hardware: AMD
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: high
> > Priority: P1
> > Component: Other
> > Assignee: akpm@linux-foundation.org
> > Reporter: dev@der-flo.net
> > Regression: No
> >
> > Hi all,
> > With Kernel 5.19.x we noticed system freezes. This happens in virtual
> > environments as well as on real hardware.
> > On a real hardware machine we were able to catch the moment of freeze with
> > continuous profiling.
>
> Thanks. I forwarded this to Uladzislau and he offered to help. He said:
>
>
> : I can help with debugging. What i need is reproduce steps. Could you
> : please clarify if it is easy to hit and what kind of profiling triggers it?
>
> and
>
> : I do not think that Matthew Wilcox commits destroys it but... I see that
> : __vunmap() is invoked by the free_work() thus a caller is in atomic
> : context including IRQ context.
To keep all of this information together: Florian emailed me off-list
and I replied, cc'ing Kees.

I asked him to try this patch to determine whether the cause is the extra
load on the spinlock from examining the vmap tree more often:
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..76d2d4fb6d22 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
-
-		if (!area)
-			usercopy_abort("vmalloc", "no area", to_user, 0, n);
-
-		if (n > area->va_end - addr) {
-			offset = addr - area->va_start;
-			usercopy_abort("vmalloc", NULL, to_user, offset, n);
-		}
 		return;
 	}
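
For anyone following along who wants to see what the removed lines were
checking, here is a rough user-space sketch of the same bounds test. It is
only an illustration: struct area_model, enum copy_verdict and
check_vmalloc_copy() are made-up names, not kernel API, and the area lookup
itself (the find_vmap_area() call that takes the spinlock in the trace
below) is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct vmap_area. Only the two
 * fields read by the removed check are modelled.
 */
struct area_model {
	uintptr_t va_start;	/* first byte of the mapping */
	uintptr_t va_end;	/* one past the last byte of the mapping */
};

/* Hypothetical result codes; the kernel calls usercopy_abort() instead. */
enum copy_verdict { COPY_OK, COPY_NO_AREA, COPY_OVERRUN };

/*
 * Same shape as the removed logic: a copy of n bytes starting at addr is
 * rejected when no area was found for addr, or when the copy would run
 * past the end of the area that contains it.
 */
enum copy_verdict check_vmalloc_copy(const struct area_model *area,
				     uintptr_t addr, size_t n)
{
	if (!area)
		return COPY_NO_AREA;

	/* Assumption: the lookup only returns an area that contains addr. */
	assert(addr >= area->va_start && addr < area->va_end);

	if (n > area->va_end - addr)
		return COPY_OVERRUN;

	return COPY_OK;
}
```

With an area covering 0x1000-0x3000, a 0x1000-byte copy starting at 0x2000
fits exactly, and one byte more is an overrun. The arithmetic is cheap; the
question the patch is meant to answer is whether doing the lookup on every
usercopy of a vmalloc address is what hurts.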
Kees wrote:
} If you can reproduce the hangs, perhaps enable:
}
} CONFIG_DEBUG_LOCKDEP=y
} CONFIG_DEBUG_ATOMIC_SLEEP=y
}
} I would expect any hung spinlock to complain very loudly under
} LOCKDEP...
I hope we can keep the remainder of the debugging in this email thread.
> > Specification of the machine where we captured the freeze:
> > Thinkpad T14
> > CPU: AMD Ryzen 7 PRO 4750U
> > Kernel: 5.19.8-200.fc36.x86_64
> >
> > Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
> >
> > # Source Location Function Name Function Line
> > 0 arch/x86/include/asm/vdso/processor.h:13 rep_nop 11
> > 1 arch/x86/include/asm/vdso/processor.h:18 cpu_relax 16
> > 2 kernel/locking/qspinlock.c:514 native_queued_spin_lock_slowpath 316
> > 3 kernel/locking/qspinlock.c:316 native_queued_spin_lock_slowpath N/A
> > 4 arch/x86/include/asm/paravirt.h:591 pv_queued_spin_lock_slowpath 588
> > 5 arch/x86/include/asm/qspinlock.h:51 queued_spin_lock_slowpath 49
> > 6 include/asm-generic/qspinlock.h:114 queued_spin_lock 107
> > 7 include/linux/spinlock.h:185 do_raw_spin_lock 182
> > 8 include/linux/spinlock_api_smp.h:134 __raw_spin_lock 130
> > 9 kernel/locking/spinlock.c:154 _raw_spin_lock 152
> > 10 include/linux/spinlock.h:349 spin_lock 347
> > 11 mm/vmalloc.c:1805 find_vmap_area 1801
> > 12 mm/vmalloc.c:2525 find_vm_area 2521
> > 13 mm/vmalloc.c:2639 __vunmap 2628
> > 14 mm/vmalloc.c:97 free_work 91
> > 15 kernel/workqueue.c:2289 process_one_work 2181
> > 16 kernel/workqueue.c:2436 worker_thread 2378
> > 17 kernel/kthread.c:376 kthread 330
> > 18 N/A ret_from_fork N/A
> >
> > The functions in the above shown stacktrace hardly change. There is only one
> > commit 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e which introduces changes to
> > find_vmap_area() for 5.19.
> >
> > With this change in mind we looked for stacktraces which make also use of this
> > new commit. And in a different kernel thread we do notice the use of
> > check_heap_object():
> >
> > # Source Location Function Name Function Line
> > 0 arch/x86/include/asm/paravirt.h:704 arch_local_irq_enable 702
> > 1 arch/x86/include/asm/irqflags.h:138 arch_local_irq_restore 135
> > 2 kernel/sched/sched.h:1330 raw_spin_rq_unlock_irqrestore 1327
> > 3 kernel/sched/sched.h:1327 raw_spin_rq_unlock_irqrestore N/A
> > 4 kernel/sched/sched.h:1611 rq_unlock_irqrestore 1607
> > 5 kernel/sched/fair.c:8288 update_blocked_averages 8272
> > 6 kernel/sched/fair.c:11133 run_rebalance_domains 11115
> > 7 kernel/softirq.c:571 __do_softirq 528
> > 8 kernel/softirq.c:445 invoke_softirq 433
> > 9 kernel/softirq.c:650 __irq_exit_rcu 640
> > 10 arch/x86/kernel/apic/apic.c:1106 sysvec_apic_timer_interrupt N/A
> > 11 N/A asm_sysvec_apic_timer_interrupt N/A
> > 12 include/linux/mmzone.h:1403 __nr_to_section 1395
> > 13 include/linux/mmzone.h:1488 __pfn_to_section 1486
> > 14 include/linux/mmzone.h:1539 pfn_valid 1524
> > 15 arch/x86/mm/physaddr.c:65 __virt_addr_valid 47
> > 16 mm/usercopy.c:188 check_heap_object 161
> > 17 mm/usercopy.c:250 __check_object_size 212
> > 18 mm/usercopy.c:212 __check_object_size N/A
> > 19 include/linux/thread_info.h:199 check_object_size 195
> > 20 lib/strncpy_from_user.c:137 strncpy_from_user 113
> > 21 fs/namei.c:150 getname_flags 129
> > 22 fs/namei.c:2896 user_path_at_empty 2893
> > 23 include/linux/namei.h:57 user_path_at 54
> > 24 fs/open.c:446 do_faccessat 420
> > 25 arch/x86/entry/common.c:50 do_syscall_x64 40
> > 26 arch/x86/entry/common.c:80 do_syscall_64 73
> > 27 N/A entry_SYSCALL_64_after_hwframe N/A
> >
> > We are neither experts in the mm subsystem nor can provide a fix, but wanted to
> > let you know about our findings.
> >
> > Cheers,
> > Florian
> >
> > --
> > You may reply to this email to add a comment.
> >
> > You are receiving this mail because:
> > You are the assignee for the bug.
>