From: Thomas Deutschmann <whissi@whissi.de>
To: Song Liu <song@kernel.org>, Vishal Verma <vverma@digitalocean.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>,
stable@vger.kernel.org, regressions@lists.linux.dev,
Jens Axboe <axboe@kernel.dk>
Subject: Re: [REGRESSION] v5.17-rc1+: FIFREEZE ioctl system call hangs
Date: Wed, 17 Aug 2022 08:53:46 +0200
Message-ID: <43e678ca-3fc3-6c08-f035-2c31a34dd889@whissi.de>
In-Reply-To: <CAPhsuW5f9QD+gzJ9eBhn5irsHvrsvkWjSnA4MPaHsQjjLMypXg@mail.gmail.com>
Hi,
On 2022-08-17 08:19, Song Liu wrote:
> On Mon, Aug 15, 2022 at 8:46 AM Vishal Verma
> <vverma@digitalocean.com> wrote:
>>
>> Just saw this. I’m trying to understand whether this happens only
>> on md array or individual nvme drives (without any raid) too? The
>> commit you pointed added REQ_NOWAIT for md based arrays, but if it
>> is happening on individual nvme drives then that could point to
>> something with REQ_NOWAIT I think.
>
> Agreed with this analysis.
I bisected again, this time testing against a single NVMe device.
I did it twice and always ended up with:
> git bisect start
> # good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
> git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
> # bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
> git bisect bad df0cc57e057f18e44dac8e6c18aba47ab53202f9
> # good: [2219b0ceefe835b92a8a74a73fe964aa052742a2] Merge tag 'soc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect good 2219b0ceefe835b92a8a74a73fe964aa052742a2
> # good: [206825f50f908771934e1fba2bfc2e1f1138b36a] Merge tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
> git bisect good 206825f50f908771934e1fba2bfc2e1f1138b36a
> # bad: [4e1fddc98d2585ddd4792b5e44433dcee7ece001] tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
> git bisect bad 4e1fddc98d2585ddd4792b5e44433dcee7ece001
> # good: [dbf49896187fd58c577fa1574a338e4f3672b4b2] Merge branch 'akpm' (patches from Andrew)
> git bisect good dbf49896187fd58c577fa1574a338e4f3672b4b2
> # good: [0ecca62beb12eeb13965ed602905c8bf53ac93d0] Merge tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client
> git bisect good 0ecca62beb12eeb13965ed602905c8bf53ac93d0
> # bad: [7d5775d49e4a488bc8a07e5abb2b71a4c28aadbb] Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
> git bisect bad 7d5775d49e4a488bc8a07e5abb2b71a4c28aadbb
> # good: [35c8fad4a703fdfa009ed274f80bb64b49314cde] Merge tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
> git bisect good 35c8fad4a703fdfa009ed274f80bb64b49314cde
> # good: [6ea45c57dc176dde529ab5d7c4b3f20e52a2bd82] Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
> git bisect good 6ea45c57dc176dde529ab5d7c4b3f20e52a2bd82
> # bad: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
> git bisect bad fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
> # good: [475c3f599582a34e189f047ed3fb7e90a295ea5b] sh: fix READ/WRITE redefinition warnings
> git bisect good 475c3f599582a34e189f047ed3fb7e90a295ea5b
> # good: [c3b68c27f58a07130382f3fa6320c3652ad76f15] Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> git bisect good c3b68c27f58a07130382f3fa6320c3652ad76f15
> # good: [4a6b35b3b3f28df81fea931dc77c4c229cbdb5b2] xfs: sync xfs_btree_split macros with userspace libxfs
> git bisect good 4a6b35b3b3f28df81fea931dc77c4c229cbdb5b2
> # good: [dee2b702bcf067d7b6b62c18bdd060ff0810a800] kconfig: Add support for -Wimplicit-fallthrough
> git bisect good dee2b702bcf067d7b6b62c18bdd060ff0810a800
> # first bad commit: [fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf] Linux 5.16-rc1
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf
...but this doesn't make any sense, right?
However, I cannot reproduce with the commit before, i.e. dee2b702bcf0
didn't freeze during my 10 test runs.
But with fa55b7dcdc (or any later commit), the system freezes on _every_
test run?!
I checked out 1bd297988b75, which had never failed before, changed the
Makefile to PATCHLEVEL=16 and EXTRAVERSION=-rc1, and guess what: it's now
failing, too.
So it sounds like some code changes its behavior when the kernel version
(KV) is >=5.16-rc1. Is that possible?
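In general, one way version-dependent behavior can arise is a userspace program gating a feature on the release string reported by uname(2). Whether anything in my setup actually does this is an assumption I have not verified. A minimal sketch of such a check, with a hypothetical helper name (kv_at_least) that is not taken from any real program:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a release string such as "5.16.0-rc1"
# is at least MAJOR.MINOR. Only the first two numeric fields are compared,
# so an -rc suffix does not influence the result.
kv_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the second field
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example: take a different code path on kernels reporting 5.16 or newer.
if kv_at_least "$(uname -r)" 5 16; then
    echo "release is >= 5.16"
else
    echo "release is < 5.16"
fi
```

A check like this would see a relabeled 5.10 or 5.15 tree as 5.16-rc1, which would be consistent with what I observed, but it is only one possible explanation.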
Anyway, I started by testing v5.10 (with PATCHLEVEL=16 and
EXTRAVERSION=-rc1 set), which worked, so I started another bisect session
in which I label every kernel as 5.16-rc1.
I'll post my findings when this session is completed.
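Relabeling each checked-out revision before building could be scripted roughly as follows. This is a sketch, not the exact commands I ran: the function name relabel_kernel is made up, and it assumes the top-level Makefile uses the usual "PATCHLEVEL = N" and "EXTRAVERSION = ..." assignment lines.

```shell
#!/bin/sh
# Hypothetical helper: rewrite PATCHLEVEL and EXTRAVERSION in a kernel
# top-level Makefile so that the build reports itself as x.16-rc1.
relabel_kernel() {
    makefile=$1
    tmp=$(mktemp) || return 1
    # Only lines that start with the variable name are touched, so
    # "VERSION = ..." and "SUBLEVEL = ..." stay as they are.
    sed -e 's/^PATCHLEVEL = .*/PATCHLEVEL = 16/' \
        -e 's/^EXTRAVERSION =.*/EXTRAVERSION = -rc1/' \
        "$makefile" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the content back instead of moving the temp file, so the
    # Makefile keeps its original permissions.
    cat "$tmp" > "$makefile"
    rm -f "$tmp"
}

# Usage inside a kernel tree, before building each bisect step:
#   relabel_kernel Makefile
```

Because this leaves the tree modified, the Makefile probably has to be restored (for example with `git checkout -- Makefile`) before the next `git bisect good` or `git bisect bad`.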
> I am not able to reproduce this on 5.19+ kernel. I have:
>
> [root@eth50-1 ~]# lsblk
> NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
> sr0      11:0    1 1024M  0 rom
> vda     253:0    0   32G  0 disk
> ├─vda1  253:1    0    2G  0 part  /boot
> └─vda2  253:2    0   30G  0 part  /
> nvme0n1 259:0    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> nvme2n1 259:1    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> nvme3n1 259:2    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> nvme1n1 259:3    0    4G  0 disk
> └─md0     9:0    0   12G  0 raid5 /root/mnt
> [root@eth50-1 ~]# for x in {1..100} ; do fsfreeze --unfreeze /root/mnt ; fsfreeze --freeze /root/mnt ; done
>
> Did I miss something?
Well, your reproducer doesn't work. As I wrote in my initial mail,
executing `fsfreeze --freeze...` directly after boot doesn't fail for me
either. The device/array must have seen some I/O to trigger this.
To be more precise:
During my current bisect session (where I set KV to 5.16-rc1 for all
kernels), I noticed that my 'reproducer' failed:
To trigger the problem, it is not enough to create random I/O, for
example by copying some files.
I am using mysqld (MariaDB 10.6.8) to restore ~20GB of SQL dumps --
somehow this triggers the problem reliably. mysqld is using O_DIRECT
(https://mariadb.com/kb/en/innodb-system-variables/#innodb_flush_method)
-- maybe Direct I/O is the trigger.
This process usually takes ~620s on the test system where I am
experiencing the problem. After the import I called `fsfreeze --freeze ...`
against the mount point used by mysqld.
When this command did not return (= fsfreeze was hanging), I marked the
revision as bad.
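The "did the command return?" check could be automated along these lines. This is only a sketch under my own assumptions: the helper name check_returns and the 60s limit are made up, and I judged the runs by hand. It polls a marker file instead of using timeout(1), because a process stuck inside the ioctl may not react to signals.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and report whether
# it came back within LIMIT seconds.
#   check_returns LIMIT COMMAND [ARGS...]
# Returns 0 if the command returned (with any exit status), 1 if it was
# still running when the limit was reached.
check_returns() {
    limit=$1
    shift
    marker=$(mktemp) || return 2
    # The subshell writes the exit status into the marker once the
    # command is done; a hung command never gets that far.
    ( "$@"; echo "$?" > "$marker" ) &
    waited=0
    while [ ! -s "$marker" ]; do
        if [ "$waited" -ge "$limit" ]; then
            # The marker is left in place on purpose: the background
            # job may still write to it later.
            echo "no return after ${waited}s: $*"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "returned with status $(cat "$marker"): $*"
    rm -f "$marker"
    return 0
}

# Usage after the import has finished (mount point is a placeholder):
#   check_returns 60 fsfreeze --freeze /path/to/mountpoint || echo "mark as bad"
```

If the freeze succeeds, the file system stays frozen until `fsfreeze --unfreeze` is called on the same mount point.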
Since setting KV to "5.16-rc1" in all kernels, I have noticed that the
import process sometimes stalled -- mysqld was still running and responsive
(which is not the case when fsfreeze hangs, for example) and `SHOW
PROCESSLIST` showed the running imports with a still-increasing time
counter. However, no data was read or written anymore, although the
fsfreeze command still works when this happens. Anyway, I marked revisions
showing this behavior as bad, too.
I'll post my results once I have finished this bisect session.
--
Regards,
Thomas