References: <20220915133931.ee0a6c8a86c59a144828eb60@linux-foundation.org>
From: Yu Zhao
Date: Thu, 15 Sep 2022 17:59:56 -0600
Subject: Re: [Bug 216489] New: Machine freezes due to memory lock
To: Matthew Wilcox, Andrew Morton
Cc: dev@der-flo.net, Linux-MM, Uladzislau Rezki, bugzilla-daemon@kernel.org, Kees Cook

On Thu, Sep 15, 2022 at 4:42 PM Matthew Wilcox wrote:
>
> On Thu, Sep 15, 2022 at 01:39:31PM -0700, Andrew Morton wrote:
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Wed, 14 Sep 2022 15:07:46 +0000 bugzilla-daemon@kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216489
> > >
> > > Bug ID: 216489
> > > Summary: Machine freezes due to memory lock
> > > Product: Memory Management
> > > Version: 2.5
> > > Kernel Version: 5.19.8
> > > Hardware: AMD
> > > OS: Linux
> > > Tree: Mainline
> > > Status: NEW
> > > Severity: high
> > > Priority: P1
> > > Component: Other
> > > Assignee: akpm@linux-foundation.org
> > > Reporter: dev@der-flo.net
> > > Regression: No
> > >
> > > Hi all,
> > > With Kernel 5.19.x we noticed system freezes. This happens in virtual
> > > environments as well as on real hardware.
> > > On a real hardware machine we were able to catch the moment of freeze with
> > > continuous profiling.
> >
> > Thanks.  I forwarded this to Uladzislau and he offered to help.  He said:
> >
> >
> > : I can help with debugging. What i need is reproduce steps. Could you
> > : please clarify if it is easy to hit and what kind of profiling triggers it?
> >
> > and
> >
> > : I do not think that Matthew Wilcox commits destroys it but... I see that
> > : __vunmap() is invoked by the free_work() thus a caller is in atomic
> > : context including IRQ context.
>
> To keep all of this information together; Florian emailed me off-list
> and I replied cc'ing Kees.
>
> I asked to try this patch to decide whether it's the extra load on the
> spinlock from examining the vmap tree more often:
>
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index c1ee15a98633..76d2d4fb6d22 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -173,15 +173,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>         }
>
>         if (is_vmalloc_addr(ptr)) {
> -               struct vmap_area *area = find_vmap_area(addr);
> -
> -               if (!area)
> -                       usercopy_abort("vmalloc", "no area", to_user, 0, n);
> -
> -               if (n > area->va_end - addr) {
> -                       offset = addr - area->va_start;
> -                       usercopy_abort("vmalloc", NULL, to_user, offset, n);
> -               }
>                 return;
>         }
>
>
> Kees wrote:
>
> } If you can reproduce the hangs, perhaps enable:
> }
> } CONFIG_DEBUG_LOCKDEP=y
> } CONFIG_DEBUG_ATOMIC_SLEEP=y
> }
> } I would expect any hung spinlock to complain very loudly under
> } LOCKDEP...
>
> I hope we can keep the remainder of the debugging in this email thread.
>
> > > Specification of the machine where we captured the freeze:
> > > Thinkpad T14
> > > CPU: AMD Ryzen 7 PRO 4750U
> > > Kernel: 5.19.8-200.fc36.x86_64
> > >
> > > Stacktrace of kworker/12:3 that is using all resources and causing the freeze:
> > >
> > > #   Source Location                           Function Name                     Function Line
> > > 0   arch/x86/include/asm/vdso/processor.h:13  rep_nop                           11
> > > 1   arch/x86/include/asm/vdso/processor.h:18  cpu_relax                         16
> > > 2   kernel/locking/qspinlock.c:514            native_queued_spin_lock_slowpath  316
> > > 3   kernel/locking/qspinlock.c:316            native_queued_spin_lock_slowpath  N/A
> > > 4   arch/x86/include/asm/paravirt.h:591       pv_queued_spin_lock_slowpath      588
> > > 5   arch/x86/include/asm/qspinlock.h:51       queued_spin_lock_slowpath         49
> > > 6   include/asm-generic/qspinlock.h:114       queued_spin_lock                  107
> > > 7   include/linux/spinlock.h:185              do_raw_spin_lock                  182
> > > 8   include/linux/spinlock_api_smp.h:134      __raw_spin_lock                   130
> > > 9   kernel/locking/spinlock.c:154             _raw_spin_lock                    152
> > > 10  include/linux/spinlock.h:349              spin_lock                         347
> > > 11  mm/vmalloc.c:1805                         find_vmap_area                    1801
> > > 12  mm/vmalloc.c:2525                         find_vm_area                      2521
> > > 13  mm/vmalloc.c:2639                         __vunmap                          2628
> > > 14  mm/vmalloc.c:97                           free_work                         91
> > > 15  kernel/workqueue.c:2289                   process_one_work                  2181
> > > 16  kernel/workqueue.c:2436                   worker_thread                     2378
> > > 17  kernel/kthread.c:376                      kthread                           330
> > > 18  N/A                                       ret_from_fork                     N/A
> > >
> > > The functions in the above shown stacktrace hardly change. There is only one
> > > commit 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e which introduces changes to
> > > find_vmap_area() for 5.19.
> > >
> > > With this change in mind we looked for stacktraces which make also use of this
> > > new commit. And in a different kernel thread we do notice the use of
> > > check_heap_object():
> > >
> > > #   Source Location                           Function Name                     Function Line
> > > 0   arch/x86/include/asm/paravirt.h:704       arch_local_irq_enable             702
> > > 1   arch/x86/include/asm/irqflags.h:138       arch_local_irq_restore            135
> > > 2   kernel/sched/sched.h:1330                 raw_spin_rq_unlock_irqrestore     1327
> > > 3   kernel/sched/sched.h:1327                 raw_spin_rq_unlock_irqrestore     N/A
> > > 4   kernel/sched/sched.h:1611                 rq_unlock_irqrestore              1607
> > > 5   kernel/sched/fair.c:8288                  update_blocked_averages           8272
> > > 6   kernel/sched/fair.c:11133                 run_rebalance_domains             11115
> > > 7   kernel/softirq.c:571                      __do_softirq                      528
> > > 8   kernel/softirq.c:445                      invoke_softirq                    433
> > > 9   kernel/softirq.c:650                      __irq_exit_rcu                    640
> > > 10  arch/x86/kernel/apic/apic.c:1106          sysvec_apic_timer_interrupt       N/A
> > > 11  N/A                                       asm_sysvec_apic_timer_interrupt   N/A
> > > 12  include/linux/mmzone.h:1403               __nr_to_section                   1395
> > > 13  include/linux/mmzone.h:1488               __pfn_to_section                  1486
> > > 14  include/linux/mmzone.h:1539               pfn_valid                         1524
> > > 15  arch/x86/mm/physaddr.c:65                 __virt_addr_valid                 47
> > > 16  mm/usercopy.c:188                         check_heap_object                 161
> > > 17  mm/usercopy.c:250                         __check_object_size               212
> > > 18  mm/usercopy.c:212                         __check_object_size               N/A
> > > 19  include/linux/thread_info.h:199           check_object_size                 195
> > > 20  lib/strncpy_from_user.c:137               strncpy_from_user                 113
> > > 21  fs/namei.c:150                            getname_flags                     129
> > > 22  fs/namei.c:2896                           user_path_at_empty                2893
> > > 23  include/linux/namei.h:57                  user_path_at                      54
> > > 24  fs/open.c:446                             do_faccessat                      420
> > > 25  arch/x86/entry/common.c:50                do_syscall_x64                    40
> > > 26  arch/x86/entry/common.c:80                do_syscall_64                     73
> > > 27  N/A                                       entry_SYSCALL_64_after_hwframe    N/A
> > >
> > > We are neither experts in the mm subsystem nor can provide a fix, but wanted to
> > > let you know about our findings.
> > >
> > > Cheers,
> > > Florian
> > >
> > > --
> > > You may reply to this email to add a comment.
> > >
> > > You are receiving this mail because:
> > > You are the assignee for the bug.

I think this is a manifestation of the lockdep warning I reported a couple of
weeks ago:
https://lore.kernel.org/r/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com/
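
For anyone following the thread, the bounds check that the debugging patch
quoted above removes from check_heap_object() can be sketched in userspace C.
This is only an illustration under stated assumptions, not the kernel code:
struct area_sketch, enum copy_verdict and check_copy_sketch() are hypothetical
names, the vmap tree lookup (find_vmap_area() and the spinlock it takes) is not
modelled at all, and the caller is assumed to pass an area that already
contains addr.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's struct vmap_area.
 * Only the two fields read by the removed check are modelled.
 */
struct area_sketch {
	uintptr_t va_start;	/* first byte of the mapping */
	uintptr_t va_end;	/* one past the last byte of the mapping */
};

/* Hypothetical result codes; the kernel calls usercopy_abort() instead. */
enum copy_verdict {
	COPY_OK,		/* copy stays inside the area */
	COPY_NO_AREA,		/* lookup found nothing ("no area" in the patch) */
	COPY_OVERRUN,		/* copy would run past va_end */
};

/*
 * Mirrors the arithmetic of the removed lines: a copy of n bytes starting
 * at addr is rejected when n exceeds the bytes left before va_end.
 * Assumption: addr lies within [va_start, va_end) whenever area is non-NULL.
 */
static enum copy_verdict check_copy_sketch(const struct area_sketch *area,
					   uintptr_t addr, size_t n)
{
	if (!area)
		return COPY_NO_AREA;

	if (n > area->va_end - addr)
		return COPY_OVERRUN;

	return COPY_OK;
}
```

With an area covering 0x1000 to 0x3000, a 0x1000-byte copy starting at 0x2000
passes, and a copy one byte longer is reported as an overrun. The arithmetic is
cheap; what the debugging patch takes out of the path is the lookup that
produces the area, which is where the spinlock in the first stacktrace is
taken.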