I/O blocked after booting

From: Massimo B. @ 2024-03-21 13:13 UTC
  To: linux-btrfs

Hello everybody,

I have had this issue for years on all my desktop machines (which all run nearly
identical distributions and configurations):

Sometimes when booting, the system comes up until the window manager's login
screen appears, but no login is possible. Trying to log in via virtual
terminals or SSH, or trying to reboot, shows that all I/O to the btrfs
filesystem is blocked. Even waiting ~20 minutes doesn't help; the filesystem
stays blocked.

I thought it might happen after unclean shutdowns or the like. But it's not
reproducible, and clean shutdowns sometimes lead to the same issue as well.

At first I suspected one of the btrfsmaintenance jobs, so I eventually disabled
nearly all of them (only the monthly scrub remains):

# grep PERIOD /etc/default/btrfsmaintenance
BTRFS_DEFRAG_PERIOD="none"
BTRFS_BALANCE_PERIOD="none"
BTRFS_SCRUB_PERIOD="monthly"
BTRFS_TRIM_PERIOD="none"

No success.

What I can confirm: after forcing a reboot with the SysRq sequence R, E, I, S,
U, B, the next startup is always fine and btrfs works.
The first lines on the screen before the reboot are:

sysrq: Keyboard mode set to system default
sysrq: Terminate All Tasks
elogind-daemon[4481]: Received signal 15 [TERM]

BTRFS info (device dm-2): first mount of filesystem 1d677-.....
BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
BTRFS info (device dm-2): force zstd compression, level 15
BTRFS info (device dm-2): using free space tree
BTRFS warning (device dm-0): failed to trim 30 block group(s), last error -512
BTRFS warning (device dm-0): failed to trim 1 device(s), last error -512

I guess this dm-0 is my main btrfs on PCIe NVMe.
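
To pull these trim warnings out of a longer kernel log, something like the
following could be used. It is only a minimal sketch: the function name
trim_failures is made up for illustration, and it assumes the usual kernel
wording "BTRFS warning (device X): failed to trim ..., last error N". It reads
log lines on stdin and prints the device and the error code for each match.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only):
# read kernel log lines on stdin and print "<device> <error>" for every
# BTRFS "failed to trim" warning. Lines that do not match are dropped.
trim_failures() {
    sed -n 's/.*BTRFS warning (device \([^)]*\)): failed to trim .*last error \(-[0-9]*\).*/\1 \2/p'
}

# Example usage on a live system (assumes the kernel log is readable):
#   dmesg | trim_failures
```

As far as I know, 512 is ERESTARTSYS in the kernel's internal errno list, which
would suggest the trim was interrupted by a signal (for example by SysRq
terminating all tasks). That reading is an assumption and has not been verified
on this system.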

When successfully mounted, the mount looks like this:

/dev/mapper/luks-801... on / type btrfs (rw,noatime,nodiratime,compress-force=zstd:3,ssd,discard=async,noacl,space_cache=v2,subvolid=524,subvol=/volumes/root)

The current kernel is 6.6.13-gentoo, though I don't think that matters, as I
have had the issue for years with all previous kernels.
Besides the self-configured kernel from gentoo-sources, I also use a universal
binary kernel, 6.6.16-gentoo-dist.

I thought my btrbk run from cron might be the culprit. In the syslog, the very
last lines before the blocked I/O show btrbk being started. The line right
after that is already the next boot:

Mar 19 07:43:40 [chronyd] System clock wrong by -3.227396 seconds
Mar 19 07:43:40 [chronyd] System clock was stepped by -3.227396 seconds
Mar 19 07:44:00 [fcron] pam_unix(fcron:session): session opened for user clamav(uid=130) by (uid=0)
Mar 19 07:44:00 [fcron] Job 'fangfrisch -c /etc/fangfrisch.conf refresh' started for user clamav (pid 4977)
Mar 19 07:44:00 [fcron] Job 'ionice -c 3 schedtool -D -e btrbk -c /etc/btrbk/btrbk.conf run cron && /usr/local/bin/1update_btrbksnapshotlinks -c /etc/btrbk/btrbk.conf /mnt/archive/*/* / (truncated)
Mar 19 07:44:00 [fcron] Job 'run-parts /etc/cron.daily' started for user systab (pid 4984)
Mar 19 07:44:00 [fcron] Job 'run-parts /etc/cron.weekly' started for user systab (pid 4987)
Mar 19 07:47:32 [kernel] Linux version 6.6.13-gentoo (root@gentoo) (gcc (Gentoo 13.2.1_p20230826 p7) 13.2.1 20230826, GNU ld (Gentoo 2.40 p7) 2.40.0) #1 SMP PREEMPT_DYNAMIC Mon Jan 22 11:11:15 CET 2024

However, btrbk works fine when the system is up and running, whether started
manually or from the cron job. What could be blocking all btrfs I/O? How can
I debug that?

Best regards,
Massimo
