* [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
@ 2022-08-03 14:35 Thomas Deutschmann
  2022-08-11 12:34 ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-03 14:35 UTC (permalink / raw)
  To: vverma, song; +Cc: stable, regressions

Hi,

while trying to back up a Dell R7525 system running Debian bookworm/testing
using LVM snapshots, I noticed that the system sometimes 'freezes' (not every
time) when creating the snapshot.

At first I thought this was related to LVM, so I created

https://listman.redhat.com/archives/linux-lvm/2022-July/026228.html
(continued at
https://listman.redhat.com/archives/linux-lvm/2022-August/thread.html#26229)

Long story short:

I was even able to reproduce it with plain fsfreeze; see the last strace lines:

> [...]
> 14471 1659449870.984635 openat(AT_FDCWD, "/var/lib/machines", O_RDONLY) = 3
> 14471 1659449870.984658 newfstatat(3, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0
> 14471 1659449870.984678 ioctl(3, FIFREEZE
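
The strace above corresponds to roughly the following minimal program -- a
sketch of what fsfreeze(8) does, not its actual source; it needs
CAP_SYS_ADMIN, and the path is just the one from my strace:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */

int main(void)
{
	int fd = open("/var/lib/machines", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* This is the call that never returns on affected kernels:
	 * it blocks until all in-flight writers have drained. */
	if (ioctl(fd, FIFREEZE, 0) < 0)
		perror("FIFREEZE");
	else
		ioctl(fd, FITHAW, 0);	/* thaw again right away */
	close(fd);
	return 0;
}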

so I started to bisect the kernel and found the following bad commit:

> md: add support for REQ_NOWAIT
> 
> commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added support
> for checking whether a given bdev supports handling of REQ_NOWAIT or not.
> Since then commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable
> it for linear target") added support for REQ_NOWAIT for dm. This uses
> a similar approach to incorporate REQ_NOWAIT for md based bios.
> 
> This patch was tested using t/io_uring tool within FIO. A nvme drive
> was partitioned into 2 partitions and a simple raid 0 configuration
> /dev/md0 was created.
> 
> md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0]
>       937423872 blocks super 1.2 512k chunks
> 
> Before patch:
> 
> $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
> 
> Running top while the above runs:
> 
> $ ps -eL | grep $(pidof io_uring)
> 
>   38396   38396 pts/2    00:00:00 io_uring
>   38396   38397 pts/2    00:00:15 io_uring
>   38396   38398 pts/2    00:00:13 iou-wrk-38397
> 
> We can see iou-wrk-38397 io worker thread created which gets created
> when io_uring sees that the underlying device (/dev/md0 in this case)
> doesn't support nowait.
> 
> After patch:
> 
> $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
> 
> Running top while the above runs:
> 
> $ ps -eL | grep $(pidof io_uring)
> 
>   38341   38341 pts/2    00:10:22 io_uring
>   38341   38342 pts/2    00:10:37 io_uring
> 
> After running this patch, we don't see any io worker thread
> being created which indicated that io_uring saw that the
> underlying device does support nowait. This is the exact behaviour
> noticed on a dm device which also supports nowait.
> 
> For all the other raid personalities except raid0, we would need
> to train pieces which involves make_request fn in order for them
> to correctly handle REQ_NOWAIT.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f51d46d0e7cb5b8494aa534d276a9d8915a2443d

After reverting this commit (and the follow-up commit
0f9650bd838efe5c52f7e5f40c3204ad59f1964d), v5.18.15 and v5.19 worked for me
again.

At this point I still wonder why I experienced the same problem even after I
removed one NVMe device from the mdraid array and tested it separately, so
maybe there is another nowait/REQ_NOWAIT problem somewhere. During the bisect
I only tested against the mdraid array.


#regzbot introduced: f51d46d0e7cb5b8494aa534d276a9d8915a2443d
#regzbot link:
https://listman.redhat.com/archives/linux-lvm/2022-July/026228.html
#regzbot link:
https://listman.redhat.com/archives/linux-lvm/2022-August/thread.html#26229


-- 
Regards,
Thomas




* RE: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-03 14:35 [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs Thomas Deutschmann
@ 2022-08-11 12:34 ` Thomas Deutschmann
  2022-08-15 10:58   ` Thorsten Leemhuis
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-11 12:34 UTC (permalink / raw)
  To: vverma, song; +Cc: stable, regressions

Hi,

any news on this? Is there anything else you need from me, or anything I
can help with?

Thanks.


-- 
Regards,
Thomas


-----Original Message-----
From: Thomas Deutschmann <whissi@whissi.de> 
Sent: Wednesday, August 3, 2022 4:35 PM
To: vverma@digitalocean.com; song@kernel.org
Cc: stable@vger.kernel.org; regressions@lists.linux.dev
Subject: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs

[...]






* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-11 12:34 ` Thomas Deutschmann
@ 2022-08-15 10:58   ` Thorsten Leemhuis
  2022-08-15 15:46     ` Vishal Verma
  2022-09-08 13:25     ` [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs #forregzbot Thorsten Leemhuis
  0 siblings, 2 replies; 22+ messages in thread
From: Thorsten Leemhuis @ 2022-08-15 10:58 UTC (permalink / raw)
  To: vverma, song; +Cc: stable, regressions, Thomas Deutschmann, Jens Axboe

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

[CCing Jens, as the top-level maintainer who in this case also reviewed
the patch that causes this regression.]

Vishal, Song, what up here? Could you please look into this and at least
comment on the issue, as it's a regression that was reported more than
10 days ago already. Ideally at this point it would be good if the
regression was fixed already, as explained by "Prioritize work on fixing
regressions" here:
https://docs.kernel.org/process/handling-regressions.html#prioritize-work-on-fixing-regressions

Ciao, Thorsten

On 11.08.22 14:34, Thomas Deutschmann wrote:

> 
> Hi,
> 
> any news on this? Is there anything else you need from me, or anything I
> can help with?
> 
> Thanks.
> 
> [...]


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-15 10:58   ` Thorsten Leemhuis
@ 2022-08-15 15:46     ` Vishal Verma
  2022-08-17  6:19       ` Song Liu
  2022-09-08 13:25     ` [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 22+ messages in thread
From: Vishal Verma @ 2022-08-15 15:46 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Song Liu, stable, regressions, Thomas Deutschmann, Jens Axboe

Just saw this. I’m trying to understand whether this happens only on the md array, or on individual nvme drives (without any raid) too?
The commit you pointed to added REQ_NOWAIT for md based arrays, but if it is happening on individual nvme drives then that could point to something with REQ_NOWAIT itself, I think.

> On Aug 15, 2022, at 3:58 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> 
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
> 
> [CCing Jens, as the top-level maintainer who in this case also reviewed
> the patch that causes this regression.]
> 
> Vishal, Song, what up here? Could you please look into this and at least
> comment on the issue, as it's a regression that was reported more than
> 10 days ago already. Ideally at this point it would be good if the
> regression was fixed already, as explained by "Prioritize work on fixing
> regressions" here:
> https://docs.kernel.org/process/handling-regressions.html#prioritize-work-on-fixing-regressions
> 
> Ciao, Thorsten
> 
> On 11.08.22 14:34, Thomas Deutschmann wrote:
> 
>> [...]



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-15 15:46     ` Vishal Verma
@ 2022-08-17  6:19       ` Song Liu
  2022-08-17  6:53         ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-17  6:19 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Thorsten Leemhuis, stable, regressions, Thomas Deutschmann, Jens Axboe

On Mon, Aug 15, 2022 at 8:46 AM Vishal Verma <vverma@digitalocean.com> wrote:
>
> Just saw this. I’m trying to understand whether this happens only on the md array, or on individual nvme drives (without any raid) too?
> The commit you pointed to added REQ_NOWAIT for md based arrays, but if it is happening on individual nvme drives then that could point to something with REQ_NOWAIT itself, I think.

Agreed with this analysis.

>
> > On Aug 15, 2022, at 3:58 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> >
> > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > to make this easily accessible to everyone.
> >
> > [CCing Jens, as the top-level maintainer who in this case also reviewed
> > the patch that causes this regression.]
> >
> > Vishal, Song, what up here? Could you please look into this and at least
> > comment on the issue, as it's a regression that was reported more than
> > 10 days ago already. Ideally at this point it would be good if the
> > regression was fixed already, as explained by "Prioritize work on fixing
> > regressions" here:
> > https://docs.kernel.org/process/handling-regressions.html#prioritize-work-on-fixing-regressions

I am sorry for the delay.

[...]

I am not able to reproduce this on 5.19+ kernel. I have:

[root@eth50-1 ~]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sr0      11:0    1 1024M  0 rom
vda     253:0    0   32G  0 disk
├─vda1  253:1    0    2G  0 part  /boot
└─vda2  253:2    0   30G  0 part  /
nvme0n1 259:0    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme2n1 259:1    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme3n1 259:2    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme1n1 259:3    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
[root@eth50-1 ~]# for x in {1..100} ; do fsfreeze --unfreeze /root/mnt ; fsfreeze --freeze /root/mnt ; done

Did I miss something?

Thanks,
Song

[...]


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-17  6:19       ` Song Liu
@ 2022-08-17  6:53         ` Thomas Deutschmann
  2022-08-17 18:29           ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-17  6:53 UTC (permalink / raw)
  To: Song Liu, Vishal Verma; +Cc: Thorsten Leemhuis, stable, regressions, Jens Axboe

Hi,

On 2022-08-17 08:19, Song Liu wrote:
> On Mon, Aug 15, 2022 at 8:46 AM Vishal Verma
> <vverma@digitalocean.com> wrote:
>> 
>> Just saw this. I’m trying to understand whether this happens only
>> on the md array, or on individual nvme drives (without any raid)
>> too? The commit you pointed to added REQ_NOWAIT for md based arrays,
>> but if it is happening on individual nvme drives then that could
>> point to something with REQ_NOWAIT itself, I think.
> 
> Agreed with this analysis.

I bisected again, this time testing against the single nvme device.

I did it twice, and always ended up with:

 > git bisect start
 > # good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
 > git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
 > # bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
 > git bisect bad df0cc57e057f18e44dac8e6c18aba47ab53202f9
 > # good: [2219b0ceefe835b92a8a74a73fe964aa052742a2] Merge tag 'soc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
 > git bisect good 2219b0ceefe835b92a8a74a73fe964aa052742a2
 > # good: [206825f50f908771934e1fba2bfc2e1f1138b36a] Merge tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
 > git bisect good 206825f50f908771934e1fba2bfc2e1f1138b36a
 > # bad: [4e1fddc98d2585ddd4792b5e44433dcee7ece001] tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
 > git bisect bad 4e1fddc98d2585ddd4792b5e44433dcee7ece001
 > # good: [dbf49896187fd58c577fa1574a338e4f3672b4b2] Merge branch 'akpm' (patches from Andrew)
 > git bisect good dbf49896187fd58c577fa1574a338e4f3672b4b2
 > # good: [0ecca62beb12eeb13965ed602905c8bf53ac93d0] Merge tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client
 > git bisect good 0ecca62beb12eeb13965ed602905c8bf53ac93d0
 > # bad: [7d5775d49e4a488bc8a07e5abb2b71a4c28aadbb] Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
 > git bisect bad 7d5775d49e4a488bc8a07e5abb2b71a4c28aadbb
 > # good: [35c8fad4a703fdfa009ed274f80bb64b49314cde] Merge tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
 > git bisect good 35c8fad4a703fdfa009ed274f80bb64b49314cde
 > # good: [6ea45c57dc176dde529ab5d7c4b3f20e52a2bd82] Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
 > git bisect good 6ea45c57dc176dde529ab5d7c4b3f20e52a2bd82
 > # bad: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
 > git bisect bad fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
 > # good: [475c3f599582a34e189f047ed3fb7e90a295ea5b] sh: fix READ/WRITE redefinition warnings
 > git bisect good 475c3f599582a34e189f047ed3fb7e90a295ea5b
 > # good: [c3b68c27f58a07130382f3fa6320c3652ad76f15] Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
 > git bisect good c3b68c27f58a07130382f3fa6320c3652ad76f15
 > # good: [4a6b35b3b3f28df81fea931dc77c4c229cbdb5b2] xfs: sync xfs_btree_split macros with userspace libxfs
 > git bisect good 4a6b35b3b3f28df81fea931dc77c4c229cbdb5b2
 > # good: [dee2b702bcf067d7b6b62c18bdd060ff0810a800] kconfig: Add support for -Wimplicit-fallthrough
 > git bisect good dee2b702bcf067d7b6b62c18bdd060ff0810a800
 > # first bad commit: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf

...but this doesn't make any sense, right?

However, I cannot reproduce with the commit before it, i.e. dee2b702bcf0
didn't freeze during my 10 test runs. But with fa55b7dcdc (or any later
commit), the system freezes on _every_ test run?!

I checked out 1bd297988b75, which never failed before, changed the Makefile
to PATCHLEVEL=16 and EXTRAVERSION=-rc1, and guess what: it's now failing,
too.

So it sounds like some code changes behavior when the kernel version is
>=5.16-rc1. Is that possible?

Anyway, I started to test v5.10 (with PATCHLEVEL=16 and EXTRAVERSION=-rc1
set), which worked, so I started another bisect session where I set the KV
of all builds to 5.16-rc1.

I'll post my findings when this session is completed.


> I am not able to reproduce this on 5.19+ kernel. I have:
> 
> [root@eth50-1 ~]# lsblk
> NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
> sr0      11:0    1 1024M  0 rom
> vda     253:0    0   32G  0 disk
> ├─vda1  253:1    0    2G  0 part  /boot
> └─vda2  253:2    0   30G  0 part  /
> nvme0n1 259:0    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> nvme2n1 259:1    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> nvme3n1 259:2    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> nvme1n1 259:3    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> [root@eth50-1 ~]# for x in {1..100} ; do fsfreeze --unfreeze /root/mnt ; fsfreeze --freeze /root/mnt ; done
> 
> Did I miss something?

Well, your reproducer doesn't work. As written in my initial mail, executing
`fsfreeze --freeze ...` directly after boot doesn't even fail for me. The
device/array must have seen some I/O to trigger this.

To be more precise:

During my current bisect session (where I set KV to 5.16-rc1 for all 
kernels), I noticed that my 'reproducer' failed:

To trigger the problem, it is not enough to create random I/O, for example
by copying some files.

I am using mysqld (MariaDB 10.6.8) and restoring ~20 GB of SQL dumps --
somehow this triggers the problem in a reliable way. mysqld uses O_DIRECT
(https://mariadb.com/kb/en/innodb-system-variables/#innodb_flush_method)
-- maybe direct I/O is the trigger.
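
For reference, that suspected pattern -- a single O_DIRECT write submitted
through io_uring -- boils down to something like the liburing sketch below.
This is illustrative only: the path and the 4 KiB size are placeholders, not
what InnoDB actually does. Build with `cc -luring`.

#define _GNU_SOURCE	/* O_DIRECT */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	void *buf;
	int fd = open("/srv/machines/fio/testfile",
		      O_WRONLY | O_CREAT | O_DIRECT, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT requires an aligned buffer. */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	memset(buf, 'x', 4096);

	if (io_uring_queue_init(8, &ring, 0))
		return 1;
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_write(sqe, fd, buf, 4096, 0);
	io_uring_submit(&ring);
	if (io_uring_wait_cqe(&ring, &cqe) == 0) {
		if (cqe->res < 0)
			fprintf(stderr, "write: %s\n", strerror(-cqe->res));
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}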

This process usually takes ~620s on my test system where I am experiencing
the problem. After the import I called `fsfreeze --freeze ...` against the
mount point used by mysqld. When this command did not return (i.e. fsfreeze
was hanging), I marked the revision as bad.
Since setting the KV of all kernels to "5.16-rc1", I have noticed that the
import process sometimes "froze" -- mysqld was still running and responsive
(which is not the case when fsfreeze hangs, for example) and `SHOW
PROCESSLIST` showed the running imports with a still increasing time
counter. However, no data was read or written anymore, although the fsfreeze
command still worked when this happened. Anyway, I marked revisions showing
this behavior as bad, too.

I'll post my results when I have finished this bisect session.


-- 
Regards,
Thomas



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-17  6:53         ` Thomas Deutschmann
@ 2022-08-17 18:29           ` Thomas Deutschmann
  2022-08-19  2:46             ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-17 18:29 UTC (permalink / raw)
  To: Song Liu, Vishal Verma; +Cc: Thorsten Leemhuis, stable, regressions, Jens Axboe

On 2022-08-17 08:53, Thomas Deutschmann wrote:
> I'll post my results when I have finished this bisect session.

I bisected kernel with KV set to "5.16-rc1":

> git bisect start
> # good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
> git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
> # bad: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
> git bisect bad 8bb7eca972ad531c9b149c0a51ab43a417385813
> # bad: [6bdf2fbc48f104a84606f6165aa8a20d9a7d9074] Merge tag 'nvme-5.13-2021-05-13' of git://git.infradead.org/nvme into block-5.13
> git bisect bad 6bdf2fbc48f104a84606f6165aa8a20d9a7d9074
> # good: [02f9fc286e039d0bef7284fb1200ee755b525bde] Merge tag 'pm-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect good 02f9fc286e039d0bef7284fb1200ee755b525bde
> # bad: [f351f4b63dac127079bbd77da64b2a61c09d522d] usb: xhci-mtk: fix oops when unbind driver
> git bisect bad f351f4b63dac127079bbd77da64b2a61c09d522d
> # good: [28b9aaac4cc5a11485b6f70656e4e9ead590cf5b] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
> git bisect good 28b9aaac4cc5a11485b6f70656e4e9ead590cf5b
> # good: [cf64c2a905e0dabcc473ca70baf275fb3a61fac4] Merge branch 'work.sparc32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect good cf64c2a905e0dabcc473ca70baf275fb3a61fac4
> # bad: [ea6be461cbedefaa881711a43f2842aabbd12fd4] Merge tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect bad ea6be461cbedefaa881711a43f2842aabbd12fd4
> # good: [1c9077cdecd027714736e70704da432ee2b946bb] Merge tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
> git bisect good 1c9077cdecd027714736e70704da432ee2b946bb
> # good: [efba6d3a7c4bb59f0750609fae0f9644d82304b6] Merge tag 'for-5.12/io_uring-2021-02-25' of git://git.kernel.dk/linux-block
> git bisect good efba6d3a7c4bb59f0750609fae0f9644d82304b6
> # bad: [0b311e34d5033fdcca4c9b5f2d9165b3604704d3] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> git bisect bad 0b311e34d5033fdcca4c9b5f2d9165b3604704d3
> # good: [5ceabb6078b80a8544ba86d6ee523ad755ae6d5e] Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect good 5ceabb6078b80a8544ba86d6ee523ad755ae6d5e
> # bad: [3ab6608e66b16159c3a3c2d7015b9c11cd3396c1] Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block
> git bisect bad 3ab6608e66b16159c3a3c2d7015b9c11cd3396c1
> # bad: [e941894eae31b52f0fd9bdb3ce20620afa152f45] io-wq: make buffered file write hashed work map per-ctx
> git bisect bad e941894eae31b52f0fd9bdb3ce20620afa152f45
> # good: [4379bf8bd70b5de6bba7d53015b0c36c57a634ee] io_uring: remove io_identity
> git bisect good 4379bf8bd70b5de6bba7d53015b0c36c57a634ee
> # good: [1c0aa1fae1acb77c5f9917adb0e4cb4500b9f3a6] io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
> git bisect good 1c0aa1fae1acb77c5f9917adb0e4cb4500b9f3a6
> # good: [0100e6bbdbb79404e56939313662b42737026574] arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
> git bisect good 0100e6bbdbb79404e56939313662b42737026574
> # good: [8b3e78b5955abb98863832453f5c74eca8f53c3a] io-wq: fix races around manager/worker creation and task exit
> git bisect good 8b3e78b5955abb98863832453f5c74eca8f53c3a
> # good: [eb2de9418d56b5e6ebf27bad51dbce3e22ee109b] io-wq: fix race around io_worker grabbing
> git bisect good eb2de9418d56b5e6ebf27bad51dbce3e22ee109b
> # first bad commit: [e941894eae31b52f0fd9bdb3ce20620afa152f45] io-wq: make buffered file write hashed work map per-ctx
> 
> From e941894eae31b52f0fd9bdb3ce20620afa152f45
> From: Jens Axboe
> Date: Fri, 19 Feb 2021 12:33:30 -0700
> Subject: io-wq: make buffered file write hashed work map per-ctx
> 
> Before the io-wq thread change, we maintained a hash work map and lock
> per-node per-ring. That wasn't ideal, as we really wanted it to be per
> ring. But now that we have per-task workers, the hash map ends up being
> just per-task. That'll work just fine for the normal case of having
> one task use a ring, but if you share the ring between tasks, then it's
> considerably worse than it was before.
> 
> Make the hash map per ctx instead, which provides full per-ctx buffered
> write serialization on hashed writes.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e941894eae31b52f0fd9bdb3ce20620afa152f45

But I think this result is misleading.

As mentioned, the problem I experienced during this bisect session was a
different one (not the FIFREEZE ioctl hang). It sounds more like the already
fixed regressions caused by the commit above, i.e.

- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0242f6426ea78fbe3933b44f8c55ae93ec37f6cc

- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d3e3c102d107bb84251455a298cf475f24bab995


I will do another round with 2b7196a219bf (good) <-> 5.18 (bad).


-- 
Regards,
Thomas



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-17 18:29           ` Thomas Deutschmann
@ 2022-08-19  2:46             ` Thomas Deutschmann
  2022-08-20  1:04               ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-19  2:46 UTC (permalink / raw)
  To: Song Liu, Vishal Verma; +Cc: Thorsten Leemhuis, stable, regressions, Jens Axboe

On 2022-08-17 20:29, Thomas Deutschmann wrote:
> I will do another round with 2b7196a219bf (good) <-> 5.18 (bad).

...and this one also ended up in

> first bad commit: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1 

Now I built vanilla 5.18.18, and fsfreeze hangs in the FIFREEZE ioctl system
call after running my reproducer, which generates I/O load.

=> So it looks like the bug is still present, right?

When I now just edit Makefile and set KV <5.16-rc1, i.e.

> diff --git a/Makefile b/Makefile
> index 23162e2bdf14..0f344944d828 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  VERSION = 5
> -PATCHLEVEL = 18
> -SUBLEVEL = 18
> +PATCHLEVEL = 15
> +SUBLEVEL = 0
>  EXTRAVERSION =
>  NAME = Superb Owl
> 

then I can no longer reproduce the problem.

Of course,

> diff --git a/Makefile b/Makefile
> index 23162e2bdf14..0f344944d828 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  VERSION = 5
> -PATCHLEVEL = 18
> -SUBLEVEL = 18
> +PATCHLEVEL = 15
> +SUBLEVEL = 99
>  EXTRAVERSION =
>  NAME = Superb Owl
> 

will freeze again.

To me it looks like the kernel is taking a different code path depending on
the KV, but I don't know how to proceed. Any idea how to continue debugging
this?


-- 
Regards,
Thomas


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-19  2:46             ` Thomas Deutschmann
@ 2022-08-20  1:04               ` Song Liu
  2022-08-22 15:29                 ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-20  1:04 UTC (permalink / raw)
  To: Thomas Deutschmann
  Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On Thu, Aug 18, 2022 at 7:46 PM Thomas Deutschmann <whissi@whissi.de> wrote:
>
> On 2022-08-17 20:29, Thomas Deutschmann wrote:
> > I will do another round with 2b7196a219bf (good) <-> 5.18 (bad).
>
> ...and this one also ended up in
>
> > first bad commit: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
>
> Now I built vanilla 5.18.18, and fsfreeze hangs in the FIFREEZE ioctl
> system call after running my reproducer, which generates I/O load.
>
> => So it looks like the bug is still present, right?
>
> When I now just edit Makefile and set KV <5.16-rc1, i.e.
>
> > diff --git a/Makefile b/Makefile
> > index 23162e2bdf14..0f344944d828 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1,7 +1,7 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  VERSION = 5
> > -PATCHLEVEL = 18
> > -SUBLEVEL = 18
> > +PATCHLEVEL = 15
> > +SUBLEVEL = 0
> >  EXTRAVERSION =
> >  NAME = Superb Owl
> >
>
> then I can no longer reproduce the problem.
>
> Of course,
>
> > diff --git a/Makefile b/Makefile
> > index 23162e2bdf14..0f344944d828 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1,7 +1,7 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  VERSION = 5
> > -PATCHLEVEL = 18
> > -SUBLEVEL = 18
> > +PATCHLEVEL = 15
> > +SUBLEVEL = 99
> >  EXTRAVERSION =
> >  NAME = Superb Owl
> >
>
> will freeze again.
>
> To me it looks like the kernel is taking a different code path depending on
> the KV, but I don't know how to proceed. Any idea how to continue debugging
> this?

Hmm.. does the user space use different logic based on the kernel version?

I still cannot reproduce the issue. Have you tried to reproduce it without
mysqld? Something with fio would be great.

Thanks,
Song


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-20  1:04               ` Song Liu
@ 2022-08-22 15:29                 ` Thomas Deutschmann
  2022-08-22 16:30                   ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-22 15:29 UTC (permalink / raw)
  To: Song Liu; +Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On 2022-08-20 03:04, Song Liu wrote:
> Hmm.. does the user space use different logic based on the kernel version?
> 
> I still cannot reproduce the issue. Have you tried to reproduce it without
> mysqld? Something with fio would be great.

No, I spent the last day trying various fio options, but I was not yet able
to reproduce the problem that way.

I managed to reduce the required MySQL I/O -- I can now reproduce after
importing a ~150 MB SQL dump instead of 20 GB.

Also interesting: just hard-killing mysqld, which causes a recovery on the
next start, is already enough to trigger the problem.

I filed a ticket with MariaDB to get some input from them; maybe they have
an idea for another reproducer: https://jira.mariadb.org/browse/MDEV-29349


-- 
Regards,
Thomas



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-22 15:29                 ` Thomas Deutschmann
@ 2022-08-22 16:30                   ` Thomas Deutschmann
  2022-08-22 21:52                     ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-22 16:30 UTC (permalink / raw)
  To: Song Liu; +Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

Hi,

I can now reproduce using fio:

I looked around in the MariaDB issue tracker and found
https://jira.mariadb.org/browse/MDEV-26674 which led me to
https://github.com/MariaDB/server/commit/de7db5517de11a58d57d2a41d0bc6f38b6f92dd8
-- it's a conditional based on the kernel version ($KV), and I hit that
kernel regression during one of my bisect attempts (see
https://lore.kernel.org/all/701f3fc0-2f0c-a32c-0d41-b489a9a59b99@whissi.de/).

Setting innodb_use_native_aio=OFF will prevent the problem.
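
That also explains the Makefile mystery from earlier in this thread: MariaDB
decides at runtime, based on uname(2), whether native AIO (io_uring) is safe
to use, so faking PATCHLEVEL/EXTRAVERSION changed which I/O path mysqld took.
A sketch of that kind of gate (illustrative only -- the exact version ranges
and logic are MariaDB's; the 5.11..5.15 range below is just an assumed
example):

#include <stdio.h>
#include <sys/utsname.h>

static int io_uring_considered_safe(void)
{
	struct utsname u;
	int maj = 0, min = 0;

	if (uname(&u) != 0 || sscanf(u.release, "%d.%d", &maj, &min) != 2)
		return 0;
	/* Assumed example range: avoid native AIO on kernels with
	 * known io_uring write bugs. */
	if (maj == 5 && min >= 11 && min <= 15)
		return 0;
	return 1;
}

int main(void)
{
	printf("io_uring considered safe: %d\n", io_uring_considered_safe());
	return 0;
}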

This helped me to find https://github.com/axboe/fio/issues/1195, so I now
have a working fio reproducer.

   $ cat reproducer.fio
   [global]
   direct=1
   thread=1
   norandommap=1
   group_reporting=1
   time_based=1
   ioengine=io_uring

   rw=randwrite
   bs=4096
   runtime=20
   numjobs=1
   fixedbufs=1
   hipri=1
   registerfiles=1
   sqthread_poll=1


   [filename0]
   directory=/srv/machines/fio
   size=200M
   iodepth=1
   cpus_allowed=20


...now call fio as `fio reproducer.fio`. After one successful fio run,
fsfreeze already hangs for me.


-- 
Regards,
Thomas


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-22 16:30                   ` Thomas Deutschmann
@ 2022-08-22 21:52                     ` Song Liu
  2022-08-22 22:44                       ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-22 21:52 UTC (permalink / raw)
  To: Thomas Deutschmann
  Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On Mon, Aug 22, 2022 at 9:30 AM Thomas Deutschmann <whissi@whissi.de> wrote:
>
> Hi,
>
> I can now reproduce using fio:
>
> [...]
>
> ...now call fio as `fio reproducer.fio`. After one successful fio run,
> fsfreeze already hangs for me.

Hmm.. I still cannot repro the hang in my test. I have:

[root@eth50-1 ~]# mount | grep mnt
/dev/md0 on /root/mnt type ext4 (rw,relatime,stripe=384)
[root@eth50-1 ~]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sr0      11:0    1 1024M  0 rom
vda     253:0    0   32G  0 disk
├─vda1  253:1    0    2G  0 part  /boot
└─vda2  253:2    0   30G  0 part  /
nvme0n1 259:0    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme2n1 259:1    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme3n1 259:2    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme1n1 259:3    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt

[root@eth50-1 ~]# history
  381  fio iou/repro.fio
  382  fsfreeze --freeze /root/mnt
  383  fsfreeze --unfreeze /root/mnt
  384  fio iou/repro.fio
  385  fsfreeze --freeze /root/mnt
  386  fsfreeze --unfreeze /root/mnt
^^^^^^^^^^^^^^ all works fine.

Did I miss something?

Thanks,
Song


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-22 21:52                     ` Song Liu
@ 2022-08-22 22:44                       ` Thomas Deutschmann
  2022-08-22 22:59                         ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-22 22:44 UTC (permalink / raw)
  To: Song Liu; +Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On 2022-08-22 23:52, Song Liu wrote:
> Hmm.. I still cannot repro the hang in my test. I have:
> 
> [...]
> 
> [root@eth50-1 ~]# history
>    381  fio iou/repro.fio
>    382  fsfreeze --freeze /root/mnt
>    383  fsfreeze --unfreeze /root/mnt
>    384  fio iou/repro.fio
>    385  fsfreeze --freeze /root/mnt
>    386  fsfreeze --unfreeze /root/mnt
> ^^^^^^^^^^^^^^ all works fine.
> 
> Did I miss something?

No :(

I am currently not testing against the mdraid but this shouldn't matter.

However, it looks like you don't test on bare metal, do you?

I tried to test on VMware Workstation 16 myself but VMware's nvme 
implementation is currently broken 
(https://github.com/vmware/open-vm-tools/issues/579).


-- 
Regards,
Thomas



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-22 22:44                       ` Thomas Deutschmann
@ 2022-08-22 22:59                         ` Song Liu
  2022-08-23  1:37                           ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-22 22:59 UTC (permalink / raw)
  To: Thomas Deutschmann
  Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On Mon, Aug 22, 2022 at 3:44 PM Thomas Deutschmann <whissi@whissi.de> wrote:
>
> On 2022-08-22 23:52, Song Liu wrote:
> > Hmm.. I still cannot repro the hang in my test. I have:
> > [...]
> > Did I miss something?
>
> No :(
>
> I am currently not testing against the mdraid but this shouldn't matter.
>
> However, it looks like you don't test on bare metal, do you?
>
> I tried to test on VMware Workstation 16 myself but VMware's nvme
> implementation is currently broken
> (https://github.com/vmware/open-vm-tools/issues/579).

I am testing with QEMU emulator version 6.2.0. I can also test with
bare metal.

Thanks,
Song


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-22 22:59                         ` Song Liu
@ 2022-08-23  1:37                           ` Song Liu
  2022-08-23  3:15                             ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-23  1:37 UTC (permalink / raw)
  To: Thomas Deutschmann
  Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On Mon, Aug 22, 2022 at 3:59 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, Aug 22, 2022 at 3:44 PM Thomas Deutschmann <whissi@whissi.de> wrote:
> >
> > [...]
> >
> > I am currently not testing against the mdraid but this shouldn't matter.
> >
> > However, it looks like you don't test on bare metal, do you?
> >
> > I tried to test on VMware Workstation 16 myself but VMware's nvme
> > implementation is currently broken
> > (https://github.com/vmware/open-vm-tools/issues/579).
>
> I am testing with QEMU emulator version 6.2.0. I can also test with
> bare metal.

OK, now I got a repro with bare metal: nvme+xfs.

This is a 5.19-based kernel; the stack is:

[  867.091579] INFO: task fsfreeze:49972 blocked for more than 122 seconds.
[  867.104969]       Tainted: G S                5.19.0-0_fbk0_rc1_gc225658be66e #1
[  867.119750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  867.135381] task:fsfreeze        state:D stack:    0 pid:49972 ppid: 22571 flags:0x00004000
[  867.135388] Call Trace:
[  867.135390]  <TASK>
[  867.135394]  __schedule+0x3d7/0x700
[  867.135404]  schedule+0x39/0x90
[  867.135409]  percpu_down_write+0x234/0x270
[  867.135414]  freeze_super+0x8a/0x160
[  867.135422]  do_vfs_ioctl+0x8b5/0x920
[  867.135430]  __x64_sys_ioctl+0x52/0xb0
[  867.135435]  do_syscall_64+0x3d/0x90
[  867.135441]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  867.135447] RIP: 0033:0x7f034f23fcdb
[  867.135453] RSP: 002b:00007ffe2bdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  867.135457] RAX: ffffffffffffffda RBX: 0000000000000066 RCX: 00007f034f23fcdb
[  867.135460] RDX: 0000000000000000 RSI: 00000000c0045877 RDI: 0000000000000003
[  867.135463] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
[  867.135466] R10: 0000000000001000 R11: 0000000000000246 R12: 00007ffe2bdff334
[  867.135469] R13: 00005650ff68dc40 R14: ffffffff00000000 R15: 00005650ff68c0f5
[  867.135474]  </TASK>

I am not very familiar with this code, so I will need more time to look into it.

Thomas, have you tried to bisect with the fio repro?

Thanks,
Song


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-23  1:37                           ` Song Liu
@ 2022-08-23  3:15                             ` Thomas Deutschmann
  2022-08-23 17:13                               ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-23  3:15 UTC (permalink / raw)
  To: Song Liu, Christoph Hellwig
  Cc: Vishal Verma, Thorsten Leemhuis, stable, regressions, Jens Axboe

On 2022-08-23 03:37, Song Liu wrote:
> Thomas, have you tried to bisect with the fio repro?

Yes, just finished:

> d32d3d0b47f7e34560ae3c55ddfcf68694813501 is the first bad commit
> commit d32d3d0b47f7e34560ae3c55ddfcf68694813501
> Author: Christoph Hellwig
> Date:   Mon Jun 14 13:17:34 2021 +0200
> 
>     nvme-multipath: set QUEUE_FLAG_NOWAIT
> 
>     The nvme multipathing code just dispatches bios to one of the blk-mq
>     based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
>     to support REQ_NOWAIT bios.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d32d3d0b47f7e34560ae3c55ddfcf68694813501 


So another NOWAIT issue -- similar to the bad commit which is causing 
the mdraid issue I already found 
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9650bd838efe5c52f7e5f40c3204ad59f1964d).

Reverting the commit, i.e. deleting

   blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);

fixes the problem for me. Well, sort of. Looks like this will disable 
io_uring. fio reproducer fails with

> $ fio reproducer.fio
> filename0: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
> fio-3.30
> Starting 1 thread
> fio: io_u error on file /srv/machines/fio/filename0.0.0: Operation not supported: write offset=12648448, buflen=4096
> fio: pid=1585, err=95/file:io_u.c:1846, func=io_u error, error=Operation not supported

My MariaDB reproducer also doesn't trigger the problem anymore, but 
probably for the same reason -- it cannot use io_uring anymore.


-- 
Regards,
Thomas



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-23  3:15                             ` Thomas Deutschmann
@ 2022-08-23 17:13                               ` Song Liu
  2022-08-25 16:47                                 ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-23 17:13 UTC (permalink / raw)
  To: Thomas Deutschmann
  Cc: Christoph Hellwig, Vishal Verma, Thorsten Leemhuis, stable,
	regressions, Jens Axboe

On Mon, Aug 22, 2022 at 8:15 PM Thomas Deutschmann <whissi@whissi.de> wrote:
>
> On 2022-08-23 03:37, Song Liu wrote:
> > Thomas, have you tried to bisect with the fio repro?
>
> Yes, just finished:
>
> > d32d3d0b47f7e34560ae3c55ddfcf68694813501 is the first bad commit
> > commit d32d3d0b47f7e34560ae3c55ddfcf68694813501
> > Author: Christoph Hellwig
> > Date:   Mon Jun 14 13:17:34 2021 +0200
> >
> >     nvme-multipath: set QUEUE_FLAG_NOWAIT
> >
> >     The nvme multipathing code just dispatches bios to one of the blk-mq
> >     based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
> >     to support REQ_NOWAIT bios.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d32d3d0b47f7e34560ae3c55ddfcf68694813501
>
>
> So another NOWAIT issue -- similar to the bad commit which is causing
> the mdraid issue I already found
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9650bd838efe5c52f7e5f40c3204ad59f1964d).
>
> Reverting the commit, i.e. deleting
>
>    blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
>
> fixes the problem for me. Well, sort of. Looks like this will disable
> io_uring. fio reproducer fails with

My system doesn't have multipath enabled. I guess bisect will point to something
else here.

I am afraid we won't get more information from bisect.

Thanks,
Song



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-23 17:13                               ` Song Liu
@ 2022-08-25 16:47                                 ` Song Liu
  2022-08-25 19:12                                   ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-25 16:47 UTC (permalink / raw)
  To: Thomas Deutschmann
  Cc: Christoph Hellwig, Vishal Verma, Thorsten Leemhuis, stable,
	regressions, Jens Axboe

On Tue, Aug 23, 2022 at 10:13 AM Song Liu <song@kernel.org> wrote:
>
> On Mon, Aug 22, 2022 at 8:15 PM Thomas Deutschmann <whissi@whissi.de> wrote:
> >
> > On 2022-08-23 03:37, Song Liu wrote:
> > > Thomas, have you tried to bisect with the fio repro?
> >
> > Yes, just finished:
> >
> > > d32d3d0b47f7e34560ae3c55ddfcf68694813501 is the first bad commit
> > > commit d32d3d0b47f7e34560ae3c55ddfcf68694813501
> > > Author: Christoph Hellwig
> > > Date:   Mon Jun 14 13:17:34 2021 +0200
> > >
> > >     nvme-multipath: set QUEUE_FLAG_NOWAIT
> > >
> > >     The nvme multipathing code just dispatches bios to one of the blk-mq
> > >     based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
> > >     to support REQ_NOWAIT bios.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d32d3d0b47f7e34560ae3c55ddfcf68694813501
> >
> >
> > So another NOWAIT issue -- similar to the bad commit which is causing
> > the mdraid issue I already found
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9650bd838efe5c52f7e5f40c3204ad59f1964d).
> >
> > Reverting the commit, i.e. deleting
> >
> >    blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
> >
> > fixes the problem for me. Well, sort of. Looks like this will disable
> > io_uring. fio reproducer fails with
>
> My system doesn't have multipath enabled. I guess bisect will point to something
> else here.
>
> I am afraid we won't get more information from bisect.

OK, I was able to pinpoint the issue, and Jens found the proper fix for
it (see below; also available in [1]). It survived 100 runs of the repro
fio job.

Thomas, please give it a try.

Thanks,
Song

diff --git c/fs/io_uring.c w/fs/io_uring.c
index 3f8a79a4affa..72a39f5ec5a5 100644
--- c/fs/io_uring.c
+++ w/fs/io_uring.c
@@ -4551,7 +4551,12 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 copy_iov:
                iov_iter_restore(&s->iter, &s->iter_state);
                ret = io_setup_async_rw(req, iovec, s, false);
-               return ret ?: -EAGAIN;
+               if (!ret) {
+                       if (kiocb->ki_flags & IOCB_WRITE)
+                               kiocb_end_write(req);
+                       return -EAGAIN;
+               }
+               return 0;
        }
 out_free:
        /* it's reportedly faster than delegating the null check to kfree() */

[1] https://lore.kernel.org/stable/a603cfc5-9ba5-20c3-3fec-2c4eec4350f7@kernel.dk/T/#u
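
A note on how this connects back to the original FIFREEZE hang: fsfreeze
waits for every outstanding sb_start_write() reference to drain, so a
write that keeps its freeze protection while parked for an async retry
would block FIFREEZE indefinitely -- which matches the hang reported at
the top of this thread. For context, the helper the hunk calls looked
roughly like this in fs/io_uring.c of that era (a sketch for reference,
not part of the fix; check the tree for the exact version):

static void kiocb_end_write(struct io_kiocb *req)
{
	/*
	 * Tell lockdep we inherited freeze protection from the
	 * submission thread. Dropping the SB_FREEZE_WRITE reference
	 * here is what lets a pending FIFREEZE proceed.
	 */
	if (req->flags & REQ_F_ISREG) {
		struct super_block *sb = file_inode(req->file)->i_sb;

		__sb_writers_acquired(sb, SB_FREEZE_WRITE);
		sb_end_write(sb);
	}
}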


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-25 16:47                                 ` Song Liu
@ 2022-08-25 19:12                                   ` Jens Axboe
  2022-08-25 22:24                                     ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2022-08-25 19:12 UTC (permalink / raw)
  To: Song Liu, Thomas Deutschmann
  Cc: Christoph Hellwig, Vishal Verma, Thorsten Leemhuis, stable, regressions

On 8/25/22 10:47 AM, Song Liu wrote:
> On Tue, Aug 23, 2022 at 10:13 AM Song Liu <song@kernel.org> wrote:
>>
>> On Mon, Aug 22, 2022 at 8:15 PM Thomas Deutschmann <whissi@whissi.de> wrote:
>>>
>>> On 2022-08-23 03:37, Song Liu wrote:
>>>> Thomas, have you tried to bisect with the fio repro?
>>>
>>> Yes, just finished:
>>>
>>>> d32d3d0b47f7e34560ae3c55ddfcf68694813501 is the first bad commit
>>>> commit d32d3d0b47f7e34560ae3c55ddfcf68694813501
>>>> Author: Christoph Hellwig
>>>> Date:   Mon Jun 14 13:17:34 2021 +0200
>>>>
>>>>     nvme-multipath: set QUEUE_FLAG_NOWAIT
>>>>
>>>>     The nvme multipathing code just dispatches bios to one of the blk-mq
>>>>     based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
>>>>     to support REQ_NOWAIT bios.
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d32d3d0b47f7e34560ae3c55ddfcf68694813501
>>>
>>>
>>> So another NOWAIT issue -- similar to the bad commit which is causing
>>> the mdraid issue I already found
>>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9650bd838efe5c52f7e5f40c3204ad59f1964d).
>>>
>>> Reverting the commit, i.e. deleting
>>>
>>>    blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
>>>
>>> fixes the problem for me. Well, sort of. Looks like this will disable
>>> io_uring. fio reproducer fails with
>>
>> My system doesn't have multipath enabled. I guess bisect will point to something
>> else here.
>>
>> I am afraid we won't get more information from bisect.
> 
> OK, I am able to pinpoint the issue, and Jens found the proper fix for
> it (see below,
> also available in [1]). It survived 100 runs of the repro fio job.
> 
> Thomas, please give it a try.
> 
> Thanks,
> Song
> 
> diff --git c/fs/io_uring.c w/fs/io_uring.c
> index 3f8a79a4affa..72a39f5ec5a5 100644
> --- c/fs/io_uring.c
> +++ w/fs/io_uring.c
> @@ -4551,7 +4551,12 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
>  copy_iov:
>                 iov_iter_restore(&s->iter, &s->iter_state);
>                 ret = io_setup_async_rw(req, iovec, s, false);
> -               return ret ?: -EAGAIN;
> +               if (!ret) {
> +                       if (kiocb->ki_flags & IOCB_WRITE)
> +                               kiocb_end_write(req);
> +                       return -EAGAIN;
> +               }
> +               return 0;

This should be 'return ret;' for that last line. I had to double-check
the ones I did myself, but they did get it right -- still, I did a
double take when I saw this one :-)

It'll work fine for testing as we won't hit errors here unless we run
out of memory, so...
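
For clarity, with that one-line correction the tail of the copy_iov path
would read:

copy_iov:
		iov_iter_restore(&s->iter, &s->iter_state);
		ret = io_setup_async_rw(req, iovec, s, false);
		if (!ret) {
			/* drop freeze protection before the async retry */
			if (kiocb->ki_flags & IOCB_WRITE)
				kiocb_end_write(req);
			return -EAGAIN;
		}
		/* setup failed (e.g. -ENOMEM): propagate the error */
		return ret;
	}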

-- 
Jens Axboe


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-25 19:12                                   ` Jens Axboe
@ 2022-08-25 22:24                                     ` Song Liu
  2022-08-26 20:10                                       ` Thomas Deutschmann
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2022-08-25 22:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Thomas Deutschmann, Christoph Hellwig, Vishal Verma,
	Thorsten Leemhuis, stable, regressions

On Thu, Aug 25, 2022 at 12:12 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/25/22 10:47 AM, Song Liu wrote:
> > On Tue, Aug 23, 2022 at 10:13 AM Song Liu <song@kernel.org> wrote:
> >>
> >> On Mon, Aug 22, 2022 at 8:15 PM Thomas Deutschmann <whissi@whissi.de> wrote:
> >>>
> >>> On 2022-08-23 03:37, Song Liu wrote:
> >>>> Thomas, have you tried to bisect with the fio repro?
> >>>
> >>> Yes, just finished:
> >>>
> >>>> d32d3d0b47f7e34560ae3c55ddfcf68694813501 is the first bad commit
> >>>> commit d32d3d0b47f7e34560ae3c55ddfcf68694813501
> >>>> Author: Christoph Hellwig
> >>>> Date:   Mon Jun 14 13:17:34 2021 +0200
> >>>>
> >>>>     nvme-multipath: set QUEUE_FLAG_NOWAIT
> >>>>
> >>>>     The nvme multipathing code just dispatches bios to one of the blk-mq
> >>>>     based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
> >>>>     to support REQ_NOWAIT bios.
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d32d3d0b47f7e34560ae3c55ddfcf68694813501
> >>>
> >>>
> >>> So another NOWAIT issue -- similar to the bad commit which is causing
> >>> the mdraid issue I already found
> >>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9650bd838efe5c52f7e5f40c3204ad59f1964d).
> >>>
> >>> Reverting the commit, i.e. deleting
> >>>
> >>>    blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
> >>>
> >>> fixes the problem for me. Well, sort of. Looks like this will disable
> >>> io_uring. fio reproducer fails with
> >>
> >> My system doesn't have multipath enabled. I guess bisect will point to something
> >> else here.
> >>
> >> I am afraid we won't get more information from bisect.
> >
> > OK, I am able to pinpoint the issue, and Jens found the proper fix for
> > it (see below,
> > also available in [1]). It survived 100 runs of the repro fio job.
> >
> > Thomas, please give it a try.
> >
> > Thanks,
> > Song
> >
> > diff --git c/fs/io_uring.c w/fs/io_uring.c
> > index 3f8a79a4affa..72a39f5ec5a5 100644
> > --- c/fs/io_uring.c
> > +++ w/fs/io_uring.c
> > @@ -4551,7 +4551,12 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
> >  copy_iov:
> >                 iov_iter_restore(&s->iter, &s->iter_state);
> >                 ret = io_setup_async_rw(req, iovec, s, false);
> > -               return ret ?: -EAGAIN;
> > +               if (!ret) {
> > +                       if (kiocb->ki_flags & IOCB_WRITE)
> > +                               kiocb_end_write(req);
> > +                       return -EAGAIN;
> > +               }
> > +               return 0;
>
> This should be 'return ret;' for that last line. I had to double check
> the ones I did, but they did get it right. But I did a double take when
> I saw this one :-)

Ah, right... "ret ?: -EAGAIN" packs a lot of information into one expression...

Song

>
> It'll work fine for testing as we won't hit errors here unless we run
> out of memory, so...
>
> --
> Jens Axboe


* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
  2022-08-25 22:24                                     ` Song Liu
@ 2022-08-26 20:10                                       ` Thomas Deutschmann
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Deutschmann @ 2022-08-26 20:10 UTC (permalink / raw)
  To: Song Liu, Jens Axboe
  Cc: Christoph Hellwig, Vishal Verma, Thorsten Leemhuis, stable, regressions

Hello,

The patch looks good to me -- I cannot reproduce the problem anymore:

I tested for 10 hours against a single NVMe drive and for 10 hours
against the mdraid array, so the patch addresses both problems.

Thank you very much!


-- 
Regards,
Thomas



* Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs #forregzbot
  2022-08-15 10:58   ` Thorsten Leemhuis
  2022-08-15 15:46     ` Vishal Verma
@ 2022-09-08 13:25     ` Thorsten Leemhuis
  1 sibling, 0 replies; 22+ messages in thread
From: Thorsten Leemhuis @ 2022-09-08 13:25 UTC (permalink / raw)
  To: regressions; +Cc: stable

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject to make them easy to spot and filter.

#regzbot fixed-by: e053aaf4da56cbf0afb33a0fda4a62188e2c0637

On 15.08.22 12:58, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
> 
> [CCing Jens, as the top-level maintainer who in this case also reviewed
> the patch that causes this regression.]
> 
> Vishal, Song, what's up here? Could you please look into this and at least
> comment on the issue, as it's a regression that was reported more than
> 10 days ago already. Ideally at this point it would be good if the
> regression was fixed already, as explained by "Prioritize work on fixing
> regressions" here:
> https://docs.kernel.org/process/handling-regressions.html#prioritize-work-on-fixing-regressions
> 
> Ciao, Thorsten
> 
> On 11.08.22 14:34, Thomas Deutschmann wrote:
> 
>>
>> Hi,
>>
>> any news on this? Is there anything else you need from me or I can help
>> with?
>>
>> Thanks.
>>
>>
>> -- Regards, Thomas
>>
>> -----Original Message-----
>> From: Thomas Deutschmann <whissi@whissi.de>
>> Sent: Wednesday, August 3, 2022 4:35 PM
>> To: vverma@digitalocean.com; song@kernel.org
>> Cc: stable@vger.kernel.org; regressions@lists.linux.dev
>> Subject: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
>>
>> Hi,
>>
>> while trying to backup a Dell R7525 system running Debian bookworm/testing
>> using LVM snapshots I noticed that the system will 'freeze' sometimes (not
>> all the times) when creating the snapshot.
>>
>> First I thought this was related to LVM so I created
>> https://listman.redhat.com/archives/linux-lvm/2022-July/026228.html
>> (continued at
>> https://listman.redhat.com/archives/linux-lvm/2022-August/thread.html#26229)
>>
>> Long story short: I was even able to reproduce with fsfreeze, see last
>> strace lines
>>> [...]
>>> 14471 1659449870.984635 openat(AT_FDCWD, "/var/lib/machines", O_RDONLY) = 3
>>> 14471 1659449870.984658 newfstatat(3, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0
>>> 14471 1659449870.984678 ioctl(3, FIFREEZE
>> so I started to bisect kernel and found the following bad commit:
>>
>>> md: add support for REQ_NOWAIT
>>>
>>> commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added support
>>> for checking whether a given bdev supports handling of REQ_NOWAIT or not.
>>> Since then commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable
>>> it for linear target") added support for REQ_NOWAIT for dm. This uses
>>> a similar approach to incorporate REQ_NOWAIT for md based bios.
>>>
>>> This patch was tested using t/io_uring tool within FIO. A nvme drive
>>> was partitioned into 2 partitions and a simple raid 0 configuration
>>> /dev/md0 was created.
>>>
>>> md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0]
>>>       937423872 blocks super 1.2 512k chunks
>>>
>>> Before patch:
>>>
>>> $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
>>>
>>> Running top while the above runs:
>>>
>>> $ ps -eL | grep $(pidof io_uring)
>>>
>>>   38396   38396 pts/2    00:00:00 io_uring
>>>   38396   38397 pts/2    00:00:15 io_uring
>>>   38396   38398 pts/2    00:00:13 iou-wrk-38397
>>>
>>> We can see iou-wrk-38397 io worker thread created which gets created
>>> when io_uring sees that the underlying device (/dev/md0 in this case)
>>> doesn't support nowait.
>>>
>>> After patch:
>>>
>>> $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
>>>
>>> Running top while the above runs:
>>>
>>> $ ps -eL | grep $(pidof io_uring)
>>>
>>>   38341   38341 pts/2    00:10:22 io_uring
>>>   38341   38342 pts/2    00:10:37 io_uring
>>>
>>> After running this patch, we don't see any io worker thread
>>> being created which indicated that io_uring saw that the
>>> underlying device does support nowait. This is the exact behaviour
>>> noticed on a dm device which also supports nowait.
>>>
>>> For all the other raid personalities except raid0, we would need
>>> to train pieces which involves make_request fn in order for them
>>> to correctly handle REQ_NOWAIT.
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f51d46d0e7cb5b8494aa534d276a9d8915a2443d
>>
>> After reverting this commit (and follow up commit
>> 0f9650bd838efe5c52f7e5f40c3204ad59f1964d)
>> v5.18.15 and v5.19 worked for me again.
>>
>> At this point I still wonder why I experienced the same problem even after I
>> removed one nvme device from the mdraid array and tested it separately. So
>> maybe there is another nowait/REQ_NOWAIT problem somewhere. During bisect
>> I only tested against the mdraid array.
>>
>>
>> #regzbot introduced: f51d46d0e7cb5b8494aa534d276a9d8915a2443d
>> #regzbot link:
>> https://listman.redhat.com/archives/linux-lvm/2022-July/026228.html
>> #regzbot link:
>> https://listman.redhat.com/archives/linux-lvm/2022-August/thread.html#26229
>>
>>
>> --
>> Regards,
>> Thomas
>>

