Subject: Re: stalling IO regression since linux 5.12, through 5.18
From: Paolo Valente
Date: Tue, 6 Sep 2022 11:49:43 +0200
To: Yu Kuai
Cc: Jan Kara, Ming Lei, Chris Murphy, Nikolay Borisov, Jens Axboe,
 Btrfs BTRFS, Linux-RAID, linux-block, linux-kernel, Josef Bacik,
 "yukuai (C)"
Message-Id: <7A67D5D9-EB63-4B0C-BC51-4A4CDBC2077E@linaro.org>

> On 1 Sep 2022, at 10:19, Yu Kuai wrote:
>
> On 2022/09/01 16:03, Jan Kara wrote:
>> On Thu 01-09-22 15:02:03, Yu Kuai wrote:
>>> Hi, Chris
>>>
>>> On 2022/08/20 15:00, Ming Lei wrote:
>>>> On Fri, Aug 19, 2022 at 03:20:25PM -0400, Chris Murphy wrote:
>>>>>
>>>>> On Thu, Aug 18, 2022, at 1:24 AM, Ming Lei wrote:
>>>>>> On Thu, Aug 18, 2022 at 12:27:04AM -0400, Chris Murphy wrote:
>>>>>>>
>>>>>>> On Thu, Aug 18, 2022, at 12:18 AM, Chris Murphy wrote:
>>>>>>>> On Thu, Aug 18, 2022, at 12:12 AM, Chris Murphy wrote:
>>>>>>>>> On Wed, Aug 17, 2022, at 11:41 PM, Ming Lei wrote:
>>>>>>>>>
>>>>>>>>>> OK, can you post the blk-mq debugfs log after you trigger it on v5.17?
>>>>>>>
>>>>>>> Same boot, 3rd log. But the load is above 300, so I kinda need to sysrq+b soon.
>>>>>>>
>>>>>>> https://drive.google.com/file/d/1375H558kqPTdng439rvG6LuXXWPXLToo/view?usp=sharing
>>>>>>>
>>>>>> Also please test the following one too:
>>>>>>
>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>>> index 5ee62b95f3e5..d01c64be08e2 100644
>>>>>> --- a/block/blk-mq.c
>>>>>> +++ b/block/blk-mq.c
>>>>>> @@ -1991,7 +1991,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>>>>>>  		if (!needs_restart ||
>>>>>>  		    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>>>>>>  			blk_mq_run_hw_queue(hctx, true);
>>>>>> -		else if (needs_restart && needs_resource)
>>>>>> +		else if (needs_restart && (needs_resource ||
>>>>>> +			 blk_mq_is_shared_tags(hctx->flags)))
>>>>>>  			blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>>>>>>
>>>>>>  		blk_mq_update_dispatch_busy(hctx, true);
>>>>>>
>>>>> With just this patch on top of 5.17.0, it still hangs. I've captured the block debugfs log:
>>>>> https://drive.google.com/file/d/1ic4YHxoL9RrCdy_5FNdGfh_q_J3d_Ft0/view?usp=sharing
>>>>
>>>> The log is similar to the earlier one; the only difference is that RESTART is not set.
>>>>
>>>> Here is another patch, merged into v5.18, which also fixes the IO stall; feel free to test it:
>>>>
>>>> 8f5fea65b06d blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues
>>>
>>> Have you tried this patch?
>>>
>>> We hit a similar problem in our tests, and I'm pretty sure about what
>>> happens at the scene.
>>>
>>> Our test environment: NVMe with the bfq I/O scheduler.
>>>
>>> How the IO stalls:
>>>
>>> 1. hctx1 dispatches a rq from the bfq in-service queue and the bfqq
>>> becomes empty; the dispatch somehow fails, the rq is inserted into
>>> hctx1->dispatch, and a new run work is queued.
>>>
>>> 2. Another hctx tries to dispatch a rq; however, the in-service bfqq
>>> is empty, so bfq_dispatch_request() returns NULL and
>>> blk_mq_delay_run_hw_queues() is called.
>>>
>>> 3. Because of the problem described in the above patch, the run work
>>> from hctx1 can be stalled.
>>>
>>> The above patch should fix this IO stall. However, it seems to me
>>> that bfq has a problem of its own: the in-service bfqq doesn't expire
>>> in the following situation:
>>>
>>> 1. dispatched rqs don't complete
>>> 2. no new rq is issued to bfq
>> And I guess:
>> 3. there are requests queued in other bfqqs
>> ?
>
> Yes, of course, other bfqqs still have requests. But the current
> implementation has a flaw: even if other bfqqs don't have requests,
> bfq_asymmetric_scenario() can still return true, because
> num_groups_with_pending_reqs > 0. We tried to fix this; however, there
> seems to be some misunderstanding with Paolo, and the fix has not been
> applied to mainline yet...
>
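For context, the check Kuai refers to looks roughly like this; a
simplified paraphrase of bfq_asymmetric_scenario() in block/bfq-iosched.c,
with the weight and class computations elided, not the verbatim mainline
code:

/*
 * Simplified paraphrase of bfq_asymmetric_scenario(); the two flags are
 * computed from bfqd state in the real function.
 */
static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
				    struct bfq_queue *bfqq)
{
	bool varied_queue_weights;	/* active queues have different weights */
	bool multiple_classes_busy;	/* more than one I/O class has requests */

	/* ... compute the two flags above from bfqd state ... */

	return varied_queue_weights || multiple_classes_busy
#ifdef CONFIG_BFQ_GROUP_IOSCHED
		/*
		 * This is the term in question: it stays true while any
		 * group is still accounted as having pending requests,
		 * even if no other bfqq actually has queued I/O, so the
		 * empty in-service queue may keep idling instead of
		 * expiring.
		 */
		|| bfqd->num_groups_with_pending_reqs > 0
#endif
		;
}
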
I think this is an unsolved performance issue (being patiently solved by
Yu Kuai), but not a functional flaw. Solving that issue would probably
make this stall go away, but it would not address the essential problem:
refcounting gets broken if requests disappear from bfq's point of view
without any notification.

Thanks,
Paolo

> Thanks,
> Kuai
>> Otherwise I don't see a point in expiring the current bfqq, because
>> there's nothing bfq could do anyway. But under normal circumstances
>> request completion should not take this long, so I don't think it
>> would really be worth implementing some special mechanism for this
>> in bfq.
>>								Honza
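
For completeness, my understanding of the v5.18 commit Ming points to
above, 8f5fea65b06d ("blk-mq: avoid extending delays of active hctx from
blk_mq_delay_run_hw_queues"), is roughly the following; a paraphrase from
memory, not the verbatim mainline diff. The idea is that
blk_mq_delay_run_hw_queues() leaves an already-pending run_work alone
instead of re-arming it and pushing its deadline further out:

/*
 * Rough paraphrase of blk_mq_delay_run_hw_queues() after commit
 * 8f5fea65b06d; not the verbatim mainline code.
 */
void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;

	queue_for_each_hw_ctx(q, hctx, i) {
		if (blk_mq_hctx_stopped(hctx))
			continue;
		/*
		 * If a run_work is already pending (e.g. the one queued by
		 * hctx1 in step 1 of the scenario above), leave its delay
		 * untouched. Re-queueing it here would push the run further
		 * into the future every time another hctx passes through,
		 * which is how the run work ends up stalled.
		 */
		if (delayed_work_pending(&hctx->run_work))
			continue;
		blk_mq_delay_run_hw_queue(hctx, msecs);
	}
}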