From: Yu Kuai
To: Yu Kuai, Dragan Stancevic, song@kernel.org
Cc: buczek@molgen.mpg.de, guoqing.jiang@linux.dev, it+raid@molgen.mpg.de, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, msmith626@gmail.com, "yangerkun@huawei.com"
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Tue, 5 Sep 2023 11:50:10 +0800
In-Reply-To: <0d79d1f9-00e8-93be-3c7c-244030521cd7@huaweicloud.com>
References: <20230822211627.1389410-1-dragan@stancevic.com> <2061b123-6332-1456-e7c3-b713752527fb@stancevic.com> <07d5c7c2-c444-8747-ed6d-ca24231decd8@huaweicloud.com> <0d79d1f9-00e8-93be-3c7c-244030521cd7@huaweicloud.com>
X-Mailing-List: linux-raid@vger.kernel.org

Hi,

On 2023/08/30 9:36, Yu Kuai wrote:
> Hi,
>
> On 2023/08/29 4:32, Dragan Stancevic wrote:
>
>> Just a followup on 6.1 testing. I tried reproducing this problem for 5
>> days with 6.1.42 kernel without your patches and I was not able to
>> reproduce it.

Oops, I forgot that you need to backport this patch first to reproduce
this problem:

https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/

The patch fixes the deadlock as well, but it introduces some regressions.

Thanks,
Kuai

>>
>> It seems that 6.1 has some other code that prevents this from happening.
>>
>
> I see that there are lots of patches for raid456 between 5.10 and 6.1;
> however, I remember that I have reproduced the deadlock on kernels after
> 6.1, and it's true that it's not easy to reproduce, see below:
>
> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>
>
> My guess is that the problem is harder to reproduce on 6.1 than on 5.10
> due to some changes inside raid456.
>
> By the way, raid10 had a similar deadlock, and it can be fixed the same
> way, so it makes sense to backport these patches.
>
> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>
> Thanks,
> Kuai
>
>
>> On 5.10 I can reproduce it within minutes to an hour.
>>
>
> .
>
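
For readers trying to exercise the transition named in the subject line, the following is a minimal sketch of a loop that writes "check" and then "idle" to an array's sync_action attribute. It is not the reproducer used in this thread: the function name, the timings, and the md0 path in the usage note are all assumptions, and this message does not establish that such a loop alone triggers the problem (it says the patch linked above must be backported first).

```python
import time
from pathlib import Path


def cycle_sync_action(md_dir, cycles=1, check_seconds=5.0, sleep=time.sleep):
    """Alternate an md array between "check" and "idle".

    md_dir is the array's md sysfs directory (hypothetical example:
    /sys/block/md0/md). Returns the list of values written, in order.
    The timings are arbitrary illustration values, not taken from the thread.
    """
    sync_action = Path(md_dir) / "sync_action"
    written = []
    for _ in range(cycles):
        # Start a check, let it run briefly, then interrupt it with "idle".
        for value, pause in (("check", check_seconds), ("idle", 1.0)):
            sync_action.write_text(value + "\n")
            written.append(value)
            sleep(pause)
    return written
```

On a test machine this would be run as root, for example cycle_sync_action("/sys/block/md0/md", cycles=100), with md0 replaced by the array under test. The sleep parameter is injectable only so the loop can be exercised against a scratch directory without waiting.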