Subject: Re: stalling IO regression since linux 5.12, through 5.18
To: Chris Murphy, Nikolay Borisov, Jens Axboe, Jan Kara, Paolo Valente
Cc: Linux-RAID, linux-block, linux-kernel, Josef Bacik
From: Holger Hoffstätte
Organization: Applied Asynchrony, Inc.
Message-ID: <7c830487-95a6-b008-920b-8bc4a318f10a@applied-asynchrony.com>
Date: Wed, 17 Aug 2022 11:52:54 +0200

On 2022-08-16 17:34, Chris Murphy wrote:
>
> On Tue, Aug 16, 2022, at 11:25 AM, Nikolay Borisov wrote:
>> How about changing the scheduler either mq-deadline or noop, just
>> to see if this is also reproducible with a different scheduler. I
>> guess noop would imply the blk cgroup controller is going to be
>> disabled
>
> I already reported on that: always happens with bfq within an hour or
> less. Doesn't happen with mq-deadline for ~25+ hours. Does happen
> with bfq with the above patches removed. Does happen with
> cgroup.disabled=io set.
>
> Sounds to me like it's something bfq depends on and is somehow
> becoming perturbed in a way that mq-deadline does not, and has
> changed between 5.11 and 5.12. I have no idea what's under bfq that
> matches this description.

Chris, just a shot in the dark but can you try the patch from

https://lore.kernel.org/linux-block/20220803121504.212071-1-yukuai1@huaweicloud.com/

on top of something more recent than 5.12? Ideally 5.19 where it applies
cleanly.

No guarantees, I just remembered this patch and your problem sounds like
a lost wakeup. Maybe BFQ just drives the sbitmap in a way that triggers
the symptom.

-h
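
For anyone repeating the bfq vs. mq-deadline comparison from the quoted text, here is a minimal sketch of checking and switching the scheduler through sysfs. It assumes the usual /sys/block/<dev>/queue/scheduler file, where the active scheduler is shown in square brackets. The helper names (parse_schedulers, get_scheduler, set_scheduler) and the device name "sda" are made up for illustration and are not from this thread. Note that on blk-mq kernels the "noop" choice is spelled "none".

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a sysfs scheduler line into (active, available).

    The kernel marks the active scheduler with square brackets,
    e.g. "mq-deadline kyber [bfq] none".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def scheduler_path(dev):
    # Assumed location of the per-device scheduler attribute.
    return Path("/sys/block") / dev / "queue" / "scheduler"


def get_scheduler(dev):
    """Return (active, available) for a block device such as "sda"."""
    return parse_schedulers(scheduler_path(dev).read_text())


def set_scheduler(dev, name):
    """Select a scheduler by name; needs root, and the scheduler must be available."""
    scheduler_path(dev).write_text(name)


# Hypothetical usage (device name is an assumption):
#   active, available = get_scheduler("sda")
#   if active == "bfq" and "mq-deadline" in available:
#       set_scheduler("sda", "mq-deadline")
```

The parsing is kept separate from the file access so it can be checked without touching a real device.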