From: Pedro Falcato
Date: Thu, 12 Jan 2023 00:37:33 +0000
Subject: Re: Stalls in qemu with host running 6.1 (everything stuck at mmap_read_lock())
To: Jiri Slaby
Cc: Paolo Bonzini, kvm@vger.kernel.org, Andrew Morton, mm, yuzhao@google.com, Michal Hocko, Vlastimil Babka, shy828301@gmail.com
On Wed, Jan 11, 2023 at 8:00 AM Jiri Slaby wrote:
>
> Hi,
>
> after I updated the host from 6.0 to 6.1 (being at 6.1.4 ATM), my qemu
> VMs started stalling (and the host at the same point too). It doesn't
> happen right after boot, maybe a suspend-resume cycle is needed (or
> longer uptime, or a couple of qemu VM starts, or ...). But when it
> happens, it happens all the time till the next reboot.
>
> Older guest's kernels/distros are affected as well as Win10.
>
> In guests, I see for example stalls in memset_orig or
> smp_call_function_many_cond -- traces below.
>
> qemu-kvm-7.1.0-13.34.x86_64 from openSUSE.
>
> It's quite interesting that:
> $ cat /proc//cmdline
> is stuck at read:
>
> openat(AT_FDCWD, "/proc/12239/cmdline", O_RDONLY) = 3
> newfstatat(3, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22f0487000
> read(3, ^C^C^C^\^C
>
> too. So I dumped blocked tasks (sysrq-w) on _host_ (see below) and
> everything seems to stall on mmap_read_lock() or
> mmap_write_lock_killable(). I don't see the hog (the one actually
> _having_ and sitting on the (presumably write) lock) in the dump though.
> I will perhaps boot a LOCKDEP-enabled kernel, so that I can do sysrq-d
> next time and see the holder.
>
>
> There should be enough free memory (note caches at 8G):
>                total        used        free      shared  buff/cache   available
> Mem:            15Gi        10Gi       400Mi       2,5Gi       8,0Gi       5,0Gi
> Swap:             0B          0B          0B
>
>
> I rmmoded kvm-intel now, so:
> qemu-kvm: failed to initialize kvm: No such file or directory
> qemu-kvm: falling back to tcg
> and it behaves the same (more or less expected).
>
> Is this known? Any idea how to debug this? Or maybe someone (I CCed a
> couple of guys who Acked mmap_*_lock() shuffling patches in 6.1) has a
> clue? Bisection is hard as it reproduces only under certain unknown
> circumstances.

Hi,

I just want to chime in and say that I've also hit this regression, right
as I (Arch) updated to 6.1 a few weeks ago. This completely ruined my qemu
workflow, to the point that I had to fall back to an LTS kernel.

Some data I've gathered:
1) It doesn't seem to happen right after booting. I'm unsure whether this
is due to memory pressure, lower CPU load, or some other factor.
2) It seems to intensify after a fair amount of swapping? At least this
has been my experience.
3) The largest slowdown seems to happen while qemu is booting the guest,
possibly during heavy memory allocation. Problems range from "takes tens
of seconds to boot" to "qemu is completely blocked and needs to be spammed
with SIGKILL".
4) While traditional process monitoring tools break (likely due to
mmap_lock getting hogged), I can tell empirically (using /bin/free) that
the system seems to be swapping in/out quite a bit.

My 4) is particularly confusing to me, as I had originally blamed the
problem on the MGLRU changes, while you don't seem to be swapping at all.
Could this be related to the maple tree patches? Should we CC both the
MGLRU folks and the maple folks?

I have little insight into what the kernel's state actually is apart from
this: perf seems to break, and I have no kernel debugger as this is my
live personal machine :/

I would love it if someone hinted at possible things I/we could try in
order to track this down. Is this not git-bisectable at all?

Thanks,
Pedro
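Regarding point 4): a minimal sketch, assuming Python 3 on the affected host, of putting numbers on the swap traffic by sampling the kernel's cumulative pswpin/pswpout counters (counted in pages) from /proc/vmstat instead of eyeballing /bin/free. /proc/vmstat only reports global VM event counters, so reading it should not need any process's mmap_lock and ought to keep working while ps/top hang. The helper names read_vmstat, swap_deltas and watch are hypothetical and not part of any existing tool.

```python
import time


def read_vmstat(path="/proc/vmstat"):
    """Parse a vmstat-style file ("name value" per line) into a dict of ints."""
    counters = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip anything that is not a "name value" pair
            counters[parts[0]] = int(parts[1])
    return counters


def swap_deltas(before, after):
    """Pages swapped in and out between two snapshots (missing keys count as 0)."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in, swapped_out


def watch(interval=1.0, samples=10):
    """Print swap-in/swap-out rates in pages per second, one line per sample."""
    prev = read_vmstat()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_vmstat()
        swapped_in, swapped_out = swap_deltas(prev, cur)
        print(f"pswpin/s={swapped_in / interval:.0f} "
              f"pswpout/s={swapped_out / interval:.0f}")
        prev = cur
```

Calling watch(interval=1.0, samples=10) prints ten one-second samples. On a host with no swap configured (as in the quoted free output), both rates should stay at zero, which would be one way to compare the two reports.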