From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: io-uring <io-uring@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [5.15-rc1 regression] io_uring: fsstress hangs in do_coredump() on exit
Date: Tue, 21 Sep 2021 15:41:20 -0600
Message-ID: <6d46951b-a7b3-0feb-3af0-aaa8ec87b87a@kernel.dk>
In-Reply-To: <20210921213552.GZ2361455@dread.disaster.area>

On 9/21/21 3:35 PM, Dave Chinner wrote:
> On Tue, Sep 21, 2021 at 08:19:53AM -0600, Jens Axboe wrote:
>> On 9/21/21 7:25 AM, Jens Axboe wrote:
>>> On 9/21/21 12:40 AM, Dave Chinner wrote:
>>>> Hi Jens,
>>>>
>>>> I updated all my trees from 5.14 to 5.15-rc2 this morning and
>>>> immediately had problems running the recoveryloop fstest group on
>>>> them. These tests have a typical pattern of "run load in the
>>>> background, shutdown the filesystem, kill load, unmount and test
>>>> recovery".
>>>>
>>>> When the load includes fsstress and it gets killed after shutdown,
>>>> it hangs on exit like so:
>>>>
>>>> # echo w > /proc/sysrq-trigger 
>>>> [  370.669482] sysrq: Show Blocked State
>>>> [  370.671732] task:fsstress        state:D stack:11088 pid: 9619 ppid:  9615 flags:0x00000000
>>>> [  370.675870] Call Trace:
>>>> [  370.677067]  __schedule+0x310/0x9f0
>>>> [  370.678564]  schedule+0x67/0xe0
>>>> [  370.679545]  schedule_timeout+0x114/0x160
>>>> [  370.682002]  __wait_for_common+0xc0/0x160
>>>> [  370.684274]  wait_for_completion+0x24/0x30
>>>> [  370.685471]  do_coredump+0x202/0x1150
>>>> [  370.690270]  get_signal+0x4c2/0x900
>>>> [  370.691305]  arch_do_signal_or_restart+0x106/0x7a0
>>>> [  370.693888]  exit_to_user_mode_prepare+0xfb/0x1d0
>>>> [  370.695241]  syscall_exit_to_user_mode+0x17/0x40
>>>> [  370.696572]  do_syscall_64+0x42/0x80
>>>> [  370.697620]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>
>>>> It's 100% reproducible on one of my test machines, but only that
>>>> one. That machine runs fstests on pmem, so it has synchronous
>>>> storage. Every other test machine uses normal async storage (nvme,
>>>> iscsi, etc.), and none of them hang.
>>>>
>>>> A quick trawl of the commit history between 5.14 and 5.15-rc2
>>>> indicates a couple of potential candidates. The 5th kernel build
>>>> (instead of ~16 for a bisect) told me that commit 15e20db2e0ce
>>>> ("io-wq: only exit on fatal signals") is the cause of the
>>>> regression. I've confirmed that this is the first commit where the
>>>> problem shows up.
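
For reference, the worker signal handling that commit touched looks
roughly like this (my paraphrase of the 5.15-rc code, not an exact
quote):

	/* fs/io-wq.c, io_wqe_worker(): before that commit, any signal
	 * that get_signal() reported made the worker exit; after it,
	 * only a fatal signal does. But get_signal() can consume the
	 * pending SIGKILL that a core dump sends, so
	 * fatal_signal_pending() is false here even though the thread
	 * group is exiting - the worker keeps running and the dumping
	 * task waits on it forever, as in the trace above. */
	if (signal_pending(current)) {
		struct ksignal ksig;

		if (!get_signal(&ksig))
			continue;
		if (fatal_signal_pending(current))	/* the new check */
			break;
		continue;
	}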
>>>
>>> Thanks for the report, Dave - I'll take a look. Can you elaborate on
>>> exactly what is being run? And when killed, it's a non-fatal signal?
> 
> It's whatever kill/killall sends by default (SIGTERM). Typical
> behaviour that causes a hang is something like:
> 
> $FSSTRESS_PROG -n10000000 -p $PROCS -d $load_dir >> $seqres.full 2>&1 &
> ....
> sleep 5
> _scratch_shutdown
> $KILLALL_PROG -q $FSSTRESS_PROG
> wait
> 
> _scratch_shutdown is typically just an 'xfs_io -rx -c "shutdown"
> /mnt/scratch' command that shuts down the filesystem. Other tests in
> the recoveryloop group use DM targets to fail IO and trigger a
> shutdown, others inject errors that trigger shutdowns, etc. But the
> result is that they all hang waiting for fsstress processes that have
> been using io_uring to exit.
> 
> Just run fstests with "./check -g recoveryloop" - there's only a
> handful of tests and it only takes about 5 minutes to run them all
> on a fake DRAM-based pmem device.

I made a trivial reproducer just to verify.
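
Something along these lines - a sketch of the idea rather than the
exact test I ran (file name and sizes arbitrary): punt a request to
io-wq with IOSQE_ASYNC, then trigger a core dump while the worker is
still around:

	#include <fcntl.h>
	#include <stdlib.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		static char buf[4096];
		int fd;

		if (io_uring_queue_init(8, &ring, 0) < 0)
			exit(1);
		fd = open("/etc/hosts", O_RDONLY);
		if (fd < 0)
			exit(1);

		/* IOSQE_ASYNC forces the read to an io-wq worker thread */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
		sqe->flags |= IOSQE_ASYNC;
		io_uring_submit(&ring);

		/* dump core while io-wq workers exist; on the broken
		 * kernel, exit hangs in do_coredump() as in the trace */
		abort();
	}

Build with -luring and run it on the affected kernel, and the exit
should wedge in do_coredump().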

>> Can you try with this patch?
>>
>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>> index b5fd015268d7..1e55a0a2a217 100644
>> --- a/fs/io-wq.c
>> +++ b/fs/io-wq.c
>> @@ -586,7 +586,8 @@ static int io_wqe_worker(void *data)
>>  
>>  			if (!get_signal(&ksig))
>>  				continue;
>> -			if (fatal_signal_pending(current))
>> +			if (fatal_signal_pending(current) ||
>> +			    signal_group_exit(current->signal))
>>  				break;
>>  			continue;
>>  		}
> 
> Cleaned up so it compiles and the tests run properly again. But
> playing whack-a-mole with signals seems kinda fragile. I was pointed
> at this patchset overnight by another dev on #xfs who saw the same
> hangs, and it fixed the hang for him as well:

Seems sane to me - exit if there's a fatal signal, or if we're doing a
core dump. I don't think there should be other conditions.
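
For reference, signal_group_exit() is what makes an in-progress core
dump visible here - the 5.15-era helper is roughly:

	static inline bool signal_group_exit(const struct signal_struct *sig)
	{
		/* true once the group is exiting, including while a core
		 * dump is in progress (zap_threads() sets group_exit_task
		 * while the dumper waits for the other threads) */
		return (sig->flags & SIGNAL_GROUP_EXIT) ||
		       (sig->group_exit_task != NULL);
	}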

> https://lore.kernel.org/lkml/cover.1629655338.git.olivier@trillion01.com/
> 
> It was posted about a month ago and I don't see any response to it
> on the lists...

That patchset spawned a long discussion, but it's really a different
topic. Yes, it's about signals, but it's not this particular issue. It
happens to work around it, as it cancels everything once core dumping
starts.

-- 
Jens Axboe

