From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB30CC433F5 for ; Tue, 21 Sep 2021 06:40:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AE6DD61168 for ; Tue, 21 Sep 2021 06:40:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229763AbhIUGmD (ORCPT ); Tue, 21 Sep 2021 02:42:03 -0400 Received: from mail107.syd.optusnet.com.au ([211.29.132.53]:33446 "EHLO mail107.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229624AbhIUGmD (ORCPT ); Tue, 21 Sep 2021 02:42:03 -0400 Received: from dread.disaster.area (pa49-195-238-16.pa.nsw.optusnet.com.au [49.195.238.16]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 93C201009B41; Tue, 21 Sep 2021 16:40:33 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1mSZSC-00EwYv-Bt; Tue, 21 Sep 2021 16:40:32 +1000 Date: Tue, 21 Sep 2021 16:40:32 +1000 From: Dave Chinner To: Jens Axboe Cc: io-uring , linux-fsdevel Subject: [5.15-rc1 regression] io_uring: fsstress hangs in do_coredump() on exit Message-ID: <20210921064032.GW2361455@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=YKPhNiOx c=1 sm=1 tr=0 a=DzKKRZjfViQTE5W6EVc0VA==:117 a=DzKKRZjfViQTE5W6EVc0VA==:17 a=kj9zAlcOel0A:10 a=7QKq2e-ADPsA:10 a=7-415B0cAAAA:8 a=BnQZExpEWCbP54y3Tb4A:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Hi Jens, I updated all my trees from 5.14 to 5.15-rc2 this morning and immediately had problems running the recoveryloop fstest group on them. These tests have a typical pattern of "run load in the background, shutdown the filesystem, kill load, unmount and test recovery". Whent eh load includes fsstress, and it gets killed after shutdown, it hangs on exit like so: # echo w > /proc/sysrq-trigger [ 370.669482] sysrq: Show Blocked State [ 370.671732] task:fsstress state:D stack:11088 pid: 9619 ppid: 9615 flags:0x00000000 [ 370.675870] Call Trace: [ 370.677067] __schedule+0x310/0x9f0 [ 370.678564] schedule+0x67/0xe0 [ 370.679545] schedule_timeout+0x114/0x160 [ 370.682002] __wait_for_common+0xc0/0x160 [ 370.684274] wait_for_completion+0x24/0x30 [ 370.685471] do_coredump+0x202/0x1150 [ 370.690270] get_signal+0x4c2/0x900 [ 370.691305] arch_do_signal_or_restart+0x106/0x7a0 [ 370.693888] exit_to_user_mode_prepare+0xfb/0x1d0 [ 370.695241] syscall_exit_to_user_mode+0x17/0x40 [ 370.696572] do_syscall_64+0x42/0x80 [ 370.697620] entry_SYSCALL_64_after_hwframe+0x44/0xae It's 100% reproducable on one of my test machines, but only one of them. That one machine is running fstests on pmem, so it has synchronous storage. Every other test machine using normal async storage (nvme, iscsi, etc) and none of them are hanging. A quick troll of the commit history between 5.14 and 5.15-rc2 indicates a couple of potential candidates. The 5th kernel build (instead of ~16 for a bisect) told me that commit 15e20db2e0ce ("io-wq: only exit on fatal signals") is the cause of the regression. I've confirmed that this is the first commit where the problem shows up. Cheers, Dave. -- Dave Chinner david@fromorbit.com