Date: Wed, 10 Aug 2022 13:48:50 -0400
From: Josef Bacik
To: Chris Murphy
Cc: Btrfs BTRFS, Linux-RAID, linux-block@vger.kernel.org, linux-kernel
Subject: Re: stalling IO regression in linux 5.12

On Wed, Aug 10, 2022 at 12:35:34PM -0400, Chris Murphy wrote:
> CPU: Intel E5-2680 v3
> RAM: 128 G
> 02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02), using megaraid_sas driver
> 8 disks: TOSHIBA AL13SEB600
>
> The problem shows up as increasing load and increasing IO pressure (PSI) while actual IO drops to zero. It never happens on the 5.11 kernel series, and always happens after 5.12-rc1, persisting through 5.18.0. There's a new mix of behaviors with 5.19; I suspect the mm improvements in that series might be masking the problem.
>
> The workload involves openqa, which spins up 30 qemu-kvm instances and runs a bunch of tests, generating quite a lot of writes: qcow2 files, video in the form of many screenshots, and various log files for each VM. The VMs are each in their own cgroup. As the problem begins, I see increasing IO pressure and decreasing IO for each qemu instance's cgroup, as well as for the cgroups of httpd, journald, auditd, and postgresql. IO pressure climbs to ~99% and IO is literally 0.
>
> Left unattended, the problem eventually results in a completely unresponsive system with no kernel messages. It reproduces in the following configurations; for the first two I provide links to full dmesg with sysrq+w:
>
> btrfs raid10 (native) on plain partitions [1]
> btrfs single/dup on dmcrypt on mdadm raid10 and parity raid [2]
> XFS on dmcrypt on mdadm raid10 or parity raid
>
> I've started a bisect, but for some reason I haven't figured out, I've started getting compiled kernels that don't boot the hardware. The failure is very early, to the point that the UUID for the root file system isn't found, but there isn't much to go on as to why. [3] I have tested the first and last skipped commits in the bisect log below; they successfully boot a VM but not the hardware.
>
> Anyway, I'm kinda stuck at this point trying to narrow it down further. Any suggestions? Thanks.
>

I looked at the traces. btrfs is stuck waiting on IO and on blk tags, which means we've got a lot of outstanding requests and are waiting for them to finish so we can allocate more requests. Additionally, I'm seeing a bunch of the blkg async submit things, which are used when the block cgroup stuff is turned on and compression is enabled: we punt any compressed bios to a per-cgroup async thread that submits the IOs in the appropriate block cgroup context.

This could mean we're just being overly aggressive and generating too many IOs, but since the IO goes to 0 I'm more inclined to believe there's a screw up in whatever IO cgroup controller you're using. To help narrow this down, can you disable any IO controller you've got enabled and see if you can still reproduce? If you can, sysrq+w is super helpful, as it'll point us in the next direction to look. Thanks,

Josef
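
As a concrete illustration of the suggested next step, here is a minimal sketch, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and root privileges. It prints per-cgroup io.pressure (PSI), reports whether the io controller is enabled below the root, and can write 'w' to /proc/sysrq-trigger to capture the blocked-task dump Josef asks for. The helper functions and flags are illustrative, not an existing tool; on a systemd-managed host the io controller is normally toggled through systemd rather than by writing '-io' directly.

#!/usr/bin/env python3
"""Rough helper for the debugging steps discussed above (illustrative only).

Assumptions: cgroup v2 mounted at /sys/fs/cgroup, run as root.
"""

import os
import sys

CGROUP_ROOT = "/sys/fs/cgroup"  # cgroup v2 mount point (assumption)


def read(path):
    with open(path) as f:
        return f.read().strip()


def show_io_pressure(root=CGROUP_ROOT):
    """Walk the hierarchy and print io.pressure for every cgroup that has one."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "io.pressure" in filenames:
            psi = read(os.path.join(dirpath, "io.pressure"))
            # PSI output is two lines: "some avg10=..." and "full avg10=..."
            indented = psi.replace("\n", "\n  ")
            print(f"{dirpath}:\n  {indented}")


def io_controller_enabled(root=CGROUP_ROOT):
    """True if the 'io' controller is delegated to the root's children."""
    return "io" in read(os.path.join(root, "cgroup.subtree_control")).split()


def disable_io_controller(root=CGROUP_ROOT):
    """Remove the 'io' controller from the root's children.

    This only succeeds if no child cgroup still has 'io' enabled in its own
    cgroup.subtree_control, and systemd may re-enable it, so treat this as a
    demonstration of the interface rather than a recipe.
    """
    with open(os.path.join(root, "cgroup.subtree_control"), "w") as f:
        f.write("-io")


def dump_blocked_tasks():
    """Equivalent of sysrq+w: dump blocked (uninterruptible) tasks to dmesg."""
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("w")


if __name__ == "__main__":
    show_io_pressure()
    print("io controller enabled below root:", io_controller_enabled())
    if "--disable-io" in sys.argv:
        disable_io_controller()
    if "--sysrq-w" in sys.argv:
        dump_blocked_tasks()
        print("blocked-task backtraces written to the kernel log (see dmesg)")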