* btrfs-transaction stalls
@ 2023-03-02 13:42 Christian Kujau
2023-04-17 21:30 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-03-02 13:42 UTC (permalink / raw)
To: linux-btrfs
Hi,
this Fedora 37 workstation (always running the latest distribution kernel,
6.1.14-200 right now) has been installed 2 years ago but lately I noticed
some strange stalls when working: browser windows no longer react,
sometimes even the Gnome "oh, this window does not respond, do you want to
kill it?" popup comes up, and the system feels very unresponse. The system
load applet is still working and shows very high I/O wait times, and I
wait ~60 seconds and the system recovers and works again. And while all
this happened rarely in the past, I notice it more often lately.
Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
on-top a LUKS2 dm-crypt device. I'm not using snapshots.
I ran a full btrfs-balance some time ago, but I can't say it helped much.
If recommended, I can run another one of course.
Whenever these stalls happen, and I manage to fire up iotop-c, I can see
"btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
not generating _that_ much of I/O traffic (the encrypted disk can do ~250
MB/s read). With even more luck I manage to run "perf record" during these
stalls ("cd /tmp" before, so it can record to tmpfs, because of the
stall), and while I can see "btrfs-transaction" in there, I don't see it
in the top-10. Or maybe I'm to stupid to use perf report:
https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
This all can even be reproduced somewhat reliably: I was writing above how
fast the disk is, and wanted to test _write_ speed as well:
$ pv < /dev/zero > /foo
[wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
^C
$ rm -f /foo
And all that comes back instantly. But then running "/bin/sync" afterwards
took 8 minutes (!) until the command came back, and I can see the I/O wait
again in the system load applet in my window manager. This time I could
no longer start iotop-c, but "perf record" recorded something, if you want
to have a look:
https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
Some Btrfs infos below. Does anybody have an idea what's going on? Or how
to debug this? Disable compression? Run another btrfs-balance? fsck?
Thank you,
Christian.
$ mount | grep btrfs
/dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/luks-root 250G 162G 88G 66% /
$ btrfs filesystem df /
Data, single: total=164.00GiB, used=159.79GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, single: total=3.00GiB, used=1.73GiB
GlobalReserve, single: total=329.44MiB, used=0.00B
$ sudo btrfs filesystem usage -T /
Overall:
Device size: 249.98GiB
Device allocated: 167.03GiB
Device unallocated: 82.95GiB
Device missing: 0.00B
Device slack: 0.00B
Used: 161.53GiB
Free (estimated): 87.16GiB (min: 87.16GiB)
Free (statfs, df): 87.16GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 329.44MiB (used: 0.00B)
Multiple profiles: no
Data Metadata System
Id Path single single single Unallocated Total Slack
-- --------------------- --------- -------- -------- ----------- --------- ----
1 /dev/mapper/luks-root 164.00GiB 3.00GiB 32.00MiB 82.95GiB 249.98GiB -
-- --------------------- --------- -------- -------- ----------- --------- ----
Total 164.00GiB 3.00GiB 32.00MiB 82.95GiB 249.98GiB 0.00B
Used 159.79GiB 1.73GiB 48.00KiB
$ sudo compsize -x /
Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 82% 160G 193G 199G
none 100% 144G 144G 145G
zlib 33% 14K 43K 43K
zstd 32% 15G 48G 53G
prealloc 100% 66M 66M 72M
--
BOFH excuse #227:
Fatal error right in front of screen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-03-02 13:42 btrfs-transaction stalls Christian Kujau
@ 2023-04-17 21:30 ` Christian Kujau
2023-04-17 22:56 ` Qu Wenruo
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-04-17 21:30 UTC (permalink / raw)
To: linux-btrfs
On Thu, 2 Mar 2023, Christian Kujau wrote:
> this Fedora 37 workstation (always running the latest distribution kernel,
> 6.1.14-200 right now) has been installed 2 years ago but lately I noticed
> some strange stalls when working: browser windows no longer react,
> sometimes even the Gnome "oh, this window does not respond, do you want to
> kill it?" popup comes up, and the system feels very unresponsive.
No dice. Even after another full re-balance, the problem is back, and even
worse. Now with 6.2.10-200.fc37.x86_64:
$ sudo iotop-c -o
Total DISK READ: 0.00 B/s
Total DISK WRITE: 10.18 K/s
Current DISK READ: 0.00 B/s
Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO GRAPH[IO]▽ COMMAN
784 be/4 root 0.00 B/s 0.00 B/s 0.00% 96.05% btrfs-transaction
For some reason btrfs-transaction keeps the disk busy at almost 100%, with
no real disk I/O going on, system load goes up to ~140. A few (maybe 10
minutes later) the system recovers and sync(1) comes back too.
Any ideas on how to debug this?
Thanks,
Christian.
> The system load applet is still working and shows very high I/O wait
> times, and I wait ~60 seconds and the system recovers and works again.
> And while all this happened rarely in the past, I notice it more often
> lately.
>
> Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
> I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
> on-top a LUKS2 dm-crypt device. I'm not using snapshots.
>
> I ran a full btrfs-balance some time ago, but I can't say it helped much.
> If recommended, I can run another one of course.
>
> Whenever these stalls happen, and I manage to fire up iotop-c, I can see
> "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
> not generating _that_ much of I/O traffic (the encrypted disk can do ~250
> MB/s read). With even more luck I manage to run "perf record" during these
> stalls ("cd /tmp" before, so it can record to tmpfs, because of the
> stall), and while I can see "btrfs-transaction" in there, I don't see it
> in the top-10. Or maybe I'm to stupid to use perf report:
>
> https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
>
> This all can even be reproduced somewhat reliably: I was writing above how
> fast the disk is, and wanted to test _write_ speed as well:
>
> $ pv < /dev/zero > /foo
> [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
> ^C
> $ rm -f /foo
>
> And all that comes back instantly. But then running "/bin/sync" afterwards
> took 8 minutes (!) until the command came back, and I can see the I/O wait
> again in the system load applet in my window manager. This time I could
> no longer start iotop-c, but "perf record" recorded something, if you want
> to have a look:
>
> https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
>
> Some Btrfs infos below. Does anybody have an idea what's going on? Or how
> to debug this? Disable compression? Run another btrfs-balance? fsck?
>
> Thank you,
> Christian.
>
> $ mount | grep btrfs
> /dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
>
> $ df -h /
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/luks-root 250G 162G 88G 66% /
>
> $ btrfs filesystem df /
> Data, single: total=164.00GiB, used=159.79GiB
> System, single: total=32.00MiB, used=48.00KiB
> Metadata, single: total=3.00GiB, used=1.73GiB
> GlobalReserve, single: total=329.44MiB, used=0.00B
>
> $ sudo btrfs filesystem usage -T /
> Overall:
> Device size: 249.98GiB
> Device allocated: 167.03GiB
> Device unallocated: 82.95GiB
> Device missing: 0.00B
> Device slack: 0.00B
> Used: 161.53GiB
> Free (estimated): 87.16GiB (min: 87.16GiB)
> Free (statfs, df): 87.16GiB
> Data ratio: 1.00
> Metadata ratio: 1.00
> Global reserve: 329.44MiB (used: 0.00B)
> Multiple profiles: no
>
> Data Metadata System
> Id Path single single single Unallocated Total Slack
> -- --------------------- --------- -------- -------- ----------- --------- ----
> 1 /dev/mapper/luks-root 164.00GiB 3.00GiB 32.00MiB 82.95GiB 249.98GiB -
> -- --------------------- --------- -------- -------- ----------- --------- ----
> Total 164.00GiB 3.00GiB 32.00MiB 82.95GiB 249.98GiB 0.00B
> Used 159.79GiB 1.73GiB 48.00KiB
>
>
> $ sudo compsize -x /
> Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785 inline.
> Type Perc Disk Usage Uncompressed Referenced
> TOTAL 82% 160G 193G 199G
> none 100% 144G 144G 145G
> zlib 33% 14K 43K 43K
> zstd 32% 15G 48G 53G
> prealloc 100% 66M 66M 72M
>
> --
> BOFH excuse #227:
>
> Fatal error right in front of screen
>
--
BOFH excuse #279:
The static electricity routing is acting up...
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-04-17 21:30 ` Christian Kujau
@ 2023-04-17 22:56 ` Qu Wenruo
2023-04-18 10:50 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2023-04-17 22:56 UTC (permalink / raw)
To: Christian Kujau, linux-btrfs
On 2023/4/18 05:30, Christian Kujau wrote:
> On Thu, 2 Mar 2023, Christian Kujau wrote:
>> this Fedora 37 workstation (always running the latest distribution kernel,
>> 6.1.14-200 right now) has been installed 2 years ago but lately I noticed
>> some strange stalls when working: browser windows no longer react,
>> sometimes even the Gnome "oh, this window does not respond, do you want to
>> kill it?" popup comes up, and the system feels very unresponsive.
>
> No dice. Even after another full re-balance, the problem is back, and even
> worse. Now with 6.2.10-200.fc37.x86_64:
>
> $ sudo iotop-c -o
> Total DISK READ: 0.00 B/s
> Total DISK WRITE: 10.18 K/s
> Current DISK READ: 0.00 B/s
> Current DISK WRITE: 0.00 B/s
> TID PRIO USER DISK READ DISK WRITE SWAPIN IO GRAPH[IO]▽ COMMAN
> 784 be/4 root 0.00 B/s 0.00 B/s 0.00% 96.05% btrfs-transaction
>
> For some reason btrfs-transaction keeps the disk busy at almost 100%, with
> no real disk I/O going on, system load goes up to ~140. A few (maybe 10
> minutes later) the system recovers and sync(1) comes back too.
>
> Any ideas on how to debug this?
Disable qgroup, or don't delete large shared subvolumes, or use the
latest /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold file to make
qgroup to automatically skip such large snapshot drop.
Thanks,
Qu
>
> Thanks,
> Christian.
>
>
>> The system load applet is still working and shows very high I/O wait
>> times, and I wait ~60 seconds and the system recovers and works again.
>> And while all this happened rarely in the past, I notice it more often
>> lately.
>>
>> Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
>> I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
>> on-top a LUKS2 dm-crypt device. I'm not using snapshots.
>>
>> I ran a full btrfs-balance some time ago, but I can't say it helped much.
>> If recommended, I can run another one of course.
>>
>> Whenever these stalls happen, and I manage to fire up iotop-c, I can see
>> "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
>> not generating _that_ much of I/O traffic (the encrypted disk can do ~250
>> MB/s read). With even more luck I manage to run "perf record" during these
>> stalls ("cd /tmp" before, so it can record to tmpfs, because of the
>> stall), and while I can see "btrfs-transaction" in there, I don't see it
>> in the top-10. Or maybe I'm to stupid to use perf report:
>>
>> https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
>>
>> This all can even be reproduced somewhat reliably: I was writing above how
>> fast the disk is, and wanted to test _write_ speed as well:
>>
>> $ pv < /dev/zero > /foo
>> [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
>> ^C
>> $ rm -f /foo
>>
>> And all that comes back instantly. But then running "/bin/sync" afterwards
>> took 8 minutes (!) until the command came back, and I can see the I/O wait
>> again in the system load applet in my window manager. This time I could
>> no longer start iotop-c, but "perf record" recorded something, if you want
>> to have a look:
>>
>> https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
>>
>> Some Btrfs infos below. Does anybody have an idea what's going on? Or how
>> to debug this? Disable compression? Run another btrfs-balance? fsck?
>>
>> Thank you,
>> Christian.
>>
>> $ mount | grep btrfs
>> /dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
>>
>> $ df -h /
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/mapper/luks-root 250G 162G 88G 66% /
>>
>> $ btrfs filesystem df /
>> Data, single: total=164.00GiB, used=159.79GiB
>> System, single: total=32.00MiB, used=48.00KiB
>> Metadata, single: total=3.00GiB, used=1.73GiB
>> GlobalReserve, single: total=329.44MiB, used=0.00B
>>
>> $ sudo btrfs filesystem usage -T /
>> Overall:
>> Device size: 249.98GiB
>> Device allocated: 167.03GiB
>> Device unallocated: 82.95GiB
>> Device missing: 0.00B
>> Device slack: 0.00B
>> Used: 161.53GiB
>> Free (estimated): 87.16GiB (min: 87.16GiB)
>> Free (statfs, df): 87.16GiB
>> Data ratio: 1.00
>> Metadata ratio: 1.00
>> Global reserve: 329.44MiB (used: 0.00B)
>> Multiple profiles: no
>>
>> Data Metadata System
>> Id Path single single single Unallocated Total Slack
>> -- --------------------- --------- -------- -------- ----------- --------- ----
>> 1 /dev/mapper/luks-root 164.00GiB 3.00GiB 32.00MiB 82.95GiB 249.98GiB -
>> -- --------------------- --------- -------- -------- ----------- --------- ----
>> Total 164.00GiB 3.00GiB 32.00MiB 82.95GiB 249.98GiB 0.00B
>> Used 159.79GiB 1.73GiB 48.00KiB
>>
>>
>> $ sudo compsize -x /
>> Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785 inline.
>> Type Perc Disk Usage Uncompressed Referenced
>> TOTAL 82% 160G 193G 199G
>> none 100% 144G 144G 145G
>> zlib 33% 14K 43K 43K
>> zstd 32% 15G 48G 53G
>> prealloc 100% 66M 66M 72M
>>
>> --
>> BOFH excuse #227:
>>
>> Fatal error right in front of screen
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-04-17 22:56 ` Qu Wenruo
@ 2023-04-18 10:50 ` Christian Kujau
2023-05-09 12:30 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-04-18 10:50 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Tue, 18 Apr 2023, Qu Wenruo wrote:
> Disable qgroup, or don't delete large shared subvolumes, or use the latest
> /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold file to make qgroup to
> automatically skip such large snapshot drop.
I don't have Quota Groups enabled anywhere on the system, and snapshots
are disabled too, see below. There's no qgroups subdirectory on this
machine, only:
$ grep . /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/*
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/allocation: Is a directory
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bdi: Is a directory
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bg_reclaim_threshold:75
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/checksum:crc32c (crc32c-intel)
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/clone_alignment:4096
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:commits 2106
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:last_commit_ms 2805
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:max_commit_ms 371767
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:total_commit_ms 6460389
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devices: Is a directory
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devinfo: Is a directory
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/discard: Is a directory
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/exclusive_operation:none
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/features: Is a directory
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/generation:636168
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/metadata_uuid:f1093dd0-ba7e-47c7-98de-e0ac85a34972
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/nodesize:16384
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/quota_override:0
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/read_policy:[pid]
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/sectorsize:4096
$ mount | grep btrfs
/dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
$ btrfs qgroup show /
ERROR: can't list qgroups: quotas not enabled
$ btrfs subvolume list /
ID 262 gen 636159 top level 5 path var/lib/machines
ID 263 gen 634062 top level 5 path srv
$ rpm -qf /var/lib/machines /srv
systemd-container-251.14-2.fc37.x86_64
filesystem-3.18-2.fc37.x86_64
The subvolumes must have been added by the distribution, but are not
actively used (both directories are empty).
Thanks,
Christian.
>
> Thanks,
> Qu
> >
> > Thanks,
> > Christian.
> >
> >
> > > The system load applet is still working and shows very high I/O wait
> > > times, and I wait ~60 seconds and the system recovers and works again.
> > > And while all this happened rarely in the past, I notice it more often
> > > lately.
> > >
> > > Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
> > > I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
> > > on-top a LUKS2 dm-crypt device. I'm not using snapshots.
> > >
> > > I ran a full btrfs-balance some time ago, but I can't say it helped much.
> > > If recommended, I can run another one of course.
> > >
> > > Whenever these stalls happen, and I manage to fire up iotop-c, I can see
> > > "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
> > > not generating _that_ much of I/O traffic (the encrypted disk can do ~250
> > > MB/s read). With even more luck I manage to run "perf record" during these
> > > stalls ("cd /tmp" before, so it can record to tmpfs, because of the
> > > stall), and while I can see "btrfs-transaction" in there, I don't see it
> > > in the top-10. Or maybe I'm to stupid to use perf report:
> > >
> > > https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
> > >
> > > This all can even be reproduced somewhat reliably: I was writing above how
> > > fast the disk is, and wanted to test _write_ speed as well:
> > >
> > > $ pv < /dev/zero > /foo
> > > [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
> > > ^C
> > > $ rm -f /foo
> > >
> > > And all that comes back instantly. But then running "/bin/sync" afterwards
> > > took 8 minutes (!) until the command came back, and I can see the I/O wait
> > > again in the system load applet in my window manager. This time I could
> > > no longer start iotop-c, but "perf record" recorded something, if you want
> > > to have a look:
> > >
> > > https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
> > >
> > > Some Btrfs infos below. Does anybody have an idea what's going on? Or how
> > > to debug this? Disable compression? Run another btrfs-balance? fsck?
> > >
> > > Thank you,
> > > Christian.
> > >
> > > $ mount | grep btrfs
> > > /dev/mapper/luks-root on / type btrfs
> > > (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
> > >
> > > $ df -h /
> > > Filesystem Size Used Avail Use% Mounted on
> > > /dev/mapper/luks-root 250G 162G 88G 66% /
> > >
> > > $ btrfs filesystem df /
> > > Data, single: total=164.00GiB, used=159.79GiB
> > > System, single: total=32.00MiB, used=48.00KiB
> > > Metadata, single: total=3.00GiB, used=1.73GiB
> > > GlobalReserve, single: total=329.44MiB, used=0.00B
> > >
> > > $ sudo btrfs filesystem usage -T /
> > > Overall:
> > > Device size: 249.98GiB
> > > Device allocated: 167.03GiB
> > > Device unallocated: 82.95GiB
> > > Device missing: 0.00B
> > > Device slack: 0.00B
> > > Used: 161.53GiB
> > > Free (estimated): 87.16GiB (min: 87.16GiB)
> > > Free (statfs, df): 87.16GiB
> > > Data ratio: 1.00
> > > Metadata ratio: 1.00
> > > Global reserve: 329.44MiB (used: 0.00B)
> > > Multiple profiles: no
> > >
> > > Data Metadata System
> > > Id Path single single single Unallocated Total
> > > Slack
> > > -- --------------------- --------- -------- -------- -----------
> > > --------- ----
> > > 1 /dev/mapper/luks-root 164.00GiB 3.00GiB 32.00MiB 82.95GiB
> > > 249.98GiB -
> > > -- --------------------- --------- -------- -------- -----------
> > > --------- ----
> > > Total 164.00GiB 3.00GiB 32.00MiB 82.95GiB
> > > 249.98GiB 0.00B
> > > Used 159.79GiB 1.73GiB 48.00KiB
> > >
> > >
> > > $ sudo compsize -x /
> > > Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785
> > > inline.
> > > Type Perc Disk Usage Uncompressed Referenced
> > > TOTAL 82% 160G 193G 199G
> > > none 100% 144G 144G 145G
> > > zlib 33% 14K 43K 43K
> > > zstd 32% 15G 48G 53G
> > > prealloc 100% 66M 66M 72M
> > >
> > > --
> > > BOFH excuse #227:
> > >
> > > Fatal error right in front of screen
> > >
> >
>
--
BOFH excuse #99:
SIMM crosstalk.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-04-18 10:50 ` Christian Kujau
@ 2023-05-09 12:30 ` Christian Kujau
2023-05-09 19:48 ` Chris Murphy
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-05-09 12:30 UTC (permalink / raw)
To: linux-btrfs; +Cc: Qu Wenruo
After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically
go away (heh!) and I've run another full balance (and a scrub too),
although this did not help much in the past.
I've since switched the mount option from "discard" to "discard=async" and
the problem has not recurred....yet :-)
Christian.
On Tue, 18 Apr 2023, Christian Kujau wrote:
> On Tue, 18 Apr 2023, Qu Wenruo wrote:
> > Disable qgroup, or don't delete large shared subvolumes, or use the latest
> > /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold file to make qgroup to
> > automatically skip such large snapshot drop.
>
> I don't have Quota Groups enabled anywhere on the system, and snapshots
> are disabled too, see below. There's no qgroups subdirectory on this
> machine, only:
>
> $ grep . /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/*
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/allocation: Is a directory
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bdi: Is a directory
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bg_reclaim_threshold:75
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/checksum:crc32c (crc32c-intel)
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/clone_alignment:4096
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:commits 2106
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:last_commit_ms 2805
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:max_commit_ms 371767
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:total_commit_ms 6460389
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devices: Is a directory
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devinfo: Is a directory
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/discard: Is a directory
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/exclusive_operation:none
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/features: Is a directory
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/generation:636168
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/metadata_uuid:f1093dd0-ba7e-47c7-98de-e0ac85a34972
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/nodesize:16384
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/quota_override:0
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/read_policy:[pid]
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/sectorsize:4096
>
>
> $ mount | grep btrfs
> /dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
>
> $ btrfs qgroup show /
> ERROR: can't list qgroups: quotas not enabled
>
> $ btrfs subvolume list /
> ID 262 gen 636159 top level 5 path var/lib/machines
> ID 263 gen 634062 top level 5 path srv
>
> $ rpm -qf /var/lib/machines /srv
> systemd-container-251.14-2.fc37.x86_64
> filesystem-3.18-2.fc37.x86_64
>
> The subvolumes must have been added by the distribution, but are not
> actively used (both directories are empty).
>
>
> Thanks,
> Christian.
>
> >
> > Thanks,
> > Qu
> > >
> > > Thanks,
> > > Christian.
> > >
> > >
> > > > The system load applet is still working and shows very high I/O wait
> > > > times, and I wait ~60 seconds and the system recovers and works again.
> > > > And while all this happened rarely in the past, I notice it more often
> > > > lately.
> > > >
> > > > Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
> > > > I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
> > > > on-top a LUKS2 dm-crypt device. I'm not using snapshots.
> > > >
> > > > I ran a full btrfs-balance some time ago, but I can't say it helped much.
> > > > If recommended, I can run another one of course.
> > > >
> > > > Whenever these stalls happen, and I manage to fire up iotop-c, I can see
> > > > "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
> > > > not generating _that_ much of I/O traffic (the encrypted disk can do ~250
> > > > MB/s read). With even more luck I manage to run "perf record" during these
> > > > stalls ("cd /tmp" before, so it can record to tmpfs, because of the
> > > > stall), and while I can see "btrfs-transaction" in there, I don't see it
> > > > in the top-10. Or maybe I'm to stupid to use perf report:
> > > >
> > > > https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
> > > >
> > > > This all can even be reproduced somewhat reliably: I was writing above how
> > > > fast the disk is, and wanted to test _write_ speed as well:
> > > >
> > > > $ pv < /dev/zero > /foo
> > > > [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
> > > > ^C
> > > > $ rm -f /foo
> > > >
> > > > And all that comes back instantly. But then running "/bin/sync" afterwards
> > > > took 8 minutes (!) until the command came back, and I can see the I/O wait
> > > > again in the system load applet in my window manager. This time I could
> > > > no longer start iotop-c, but "perf record" recorded something, if you want
> > > > to have a look:
> > > >
> > > > https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
> > > >
> > > > Some Btrfs infos below. Does anybody have an idea what's going on? Or how
> > > > to debug this? Disable compression? Run another btrfs-balance? fsck?
> > > >
> > > > Thank you,
> > > > Christian.
> > > >
> > > > $ mount | grep btrfs
> > > > /dev/mapper/luks-root on / type btrfs
> > > > (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
> > > >
> > > > $ df -h /
> > > > Filesystem Size Used Avail Use% Mounted on
> > > > /dev/mapper/luks-root 250G 162G 88G 66% /
> > > >
> > > > $ btrfs filesystem df /
> > > > Data, single: total=164.00GiB, used=159.79GiB
> > > > System, single: total=32.00MiB, used=48.00KiB
> > > > Metadata, single: total=3.00GiB, used=1.73GiB
> > > > GlobalReserve, single: total=329.44MiB, used=0.00B
> > > >
> > > > $ sudo btrfs filesystem usage -T /
> > > > Overall:
> > > > Device size: 249.98GiB
> > > > Device allocated: 167.03GiB
> > > > Device unallocated: 82.95GiB
> > > > Device missing: 0.00B
> > > > Device slack: 0.00B
> > > > Used: 161.53GiB
> > > > Free (estimated): 87.16GiB (min: 87.16GiB)
> > > > Free (statfs, df): 87.16GiB
> > > > Data ratio: 1.00
> > > > Metadata ratio: 1.00
> > > > Global reserve: 329.44MiB (used: 0.00B)
> > > > Multiple profiles: no
> > > >
> > > > Data Metadata System
> > > > Id Path single single single Unallocated Total
> > > > Slack
> > > > -- --------------------- --------- -------- -------- -----------
> > > > --------- ----
> > > > 1 /dev/mapper/luks-root 164.00GiB 3.00GiB 32.00MiB 82.95GiB
> > > > 249.98GiB -
> > > > -- --------------------- --------- -------- -------- -----------
> > > > --------- ----
> > > > Total 164.00GiB 3.00GiB 32.00MiB 82.95GiB
> > > > 249.98GiB 0.00B
> > > > Used 159.79GiB 1.73GiB 48.00KiB
> > > >
> > > >
> > > > $ sudo compsize -x /
> > > > Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785
> > > > inline.
> > > > Type Perc Disk Usage Uncompressed Referenced
> > > > TOTAL 82% 160G 193G 199G
> > > > none 100% 144G 144G 145G
> > > > zlib 33% 14K 43K 43K
> > > > zstd 32% 15G 48G 53G
> > > > prealloc 100% 66M 66M 72M
> > > >
> > > > --
> > > > BOFH excuse #227:
> > > >
> > > > Fatal error right in front of screen
> > > >
> > >
> >
>
> --
> BOFH excuse #99:
>
> SIMM crosstalk.
>
--
BOFH excuse #391:
We already sent around a notice about that.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-05-09 12:30 ` Christian Kujau
@ 2023-05-09 19:48 ` Chris Murphy
2023-05-09 21:48 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2023-05-09 19:48 UTC (permalink / raw)
To: Christian Kujau, Btrfs BTRFS; +Cc: Qu Wenruo
On Tue, May 9, 2023, at 8:30 AM, Christian Kujau wrote:
> After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically
> go away (heh!) and I've run another full balance (and a scrub too),
> although this did not help much in the past.
>
> I've since switched the mount option from "discard" to "discard=async" and
> the problem has not recurred....yet :-)
There are some edge case performance issues with async discards that should be fixed in 6.2.13 or newer. I suggest upgrading your kernel andremoving the discard inhibition so it is used (the default). And report back if it fixes the problem or not.
--
Chris Murphy
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-05-09 19:48 ` Chris Murphy
@ 2023-05-09 21:48 ` Christian Kujau
2023-05-10 1:05 ` Chris Murphy
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-05-09 21:48 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo
On Tue, 9 May 2023, Chris Murphy wrote:
> On Tue, May 9, 2023, at 8:30 AM, Christian Kujau wrote:
> > After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically
> > go away (heh!) and I've run another full balance (and a scrub too),
> > although this did not help much in the past.
> >
> > I've since switched the mount option from "discard" to "discard=async" and
> > the problem has not recurred....yet :-)
>
> There are some edge case performance issues with async discards that
> should be fixed in 6.2.13 or newer. I suggest upgrading your kernel
> andremoving the discard inhibition so it is used (the default). And
> report back if it fixes the problem or not.
Well, Fedora 38 has 6.2.14-300.fc38.x86_64, but I had these weird stalls
right after the upgrade to F38 again. But "discard=async" (instead of
plain "discard", which defaults to "discard=sync") appears to help. Knock
on wood.
Christian.
--
BOFH excuse #299:
The data on your hard drive is out of balance.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-05-09 21:48 ` Christian Kujau
@ 2023-05-10 1:05 ` Chris Murphy
2023-05-10 6:17 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2023-05-10 1:05 UTC (permalink / raw)
To: Christian Kujau; +Cc: Btrfs BTRFS, Qu Wenruo
On Tue, May 9, 2023, at 5:48 PM, Christian Kujau wrote:
> On Tue, 9 May 2023, Chris Murphy wrote:
>> On Tue, May 9, 2023, at 8:30 AM, Christian Kujau wrote:
>> > After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically
>> > go away (heh!) and I've run another full balance (and a scrub too),
>> > although this did not help much in the past.
>> >
>> > I've since switched the mount option from "discard" to "discard=async" and
>> > the problem has not recurred....yet :-)
>>
>> There are some edge case performance issues with async discards that
>> should be fixed in 6.2.13 or newer. I suggest upgrading your kernel
>> andremoving the discard inhibition so it is used (the default). And
>> report back if it fixes the problem or not.
>
> Well, Fedora 38 has 6.2.14-300.fc38.x86_64, but I had these weird stalls
> right after the upgrade to F38 again. But "discard=async" (instead of
> plain "discard", which defaults to "discard=sync") appears to help. Knock
> on wood.
That is confusing. If you do not specify any discard mount option, 6.2 kernels default to discard=async.
If you are manually specifying only discard, you should get async discards, not sync discards. I think sync discards is a bug. Surely with 6.2 kernels and newer it should be async, but arguably it should be backported to all stable kernels still accepting changes. If not specified, async should be implied.
--
Chris Murphy
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-05-10 1:05 ` Chris Murphy
@ 2023-05-10 6:17 ` Christian Kujau
2023-05-11 10:41 ` Neal Gompa
0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-05-10 6:17 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo
On Tue, 9 May 2023, Chris Murphy wrote:
> That is confusing.
It is! :-)
> If you do not specify any discard mount option, 6.2 kernels default to
> discard=async.
So, ever since I installed this system (Lenovo T470, with an NVME disk)
the "discard" option was added (by me or the install routine) to the mount
options of the root disk. And btrfs(5) on this Fedoa 38 system (the man
page says "6.2.2" and "Mar 26, 2023") states:
discard, discard=sync, discard=async, nodiscard
(default: off, async support since: 5.6)
So, the I'd assume that no discard option means...no discard. But you are
saying that "discard=async" is now default, when the option is not
specified at all?
The man page further states:
In the synchronous mode (sync or without option value), lack of
asynchronous queued TRIM on the backing device TRIM can severely degrade
performance
...but I did not know that until a few days ago. So, I thought: "OK, I had
'discard' specified for ages here, and that defaults to 'discard=sync', so
better change that to 'discard=async' now." And ever since I did that
(i.e. since last week) these weird stalls are now gone.
> If you are manually specifying only discard, you should get async
> discards, not sync discards. I think sync discards is a bug. Surely with
> 6.2 kernels and newer it should be async, but arguably it should be
> backported to all stable kernels still accepting changes. If not
> specified, async should be implied.
So, I'm not using vanilla Linux, I'm using 6.2.14-300.fc38.x86_64 from the
Fedora distribution, but I doubt that they patched the man pages, as they
sure state something else here: "discard is off by default, and discard
without options defaults to sync". Or I'm misreading the man page, that's
entirely possible too :-)
Thanks,
Christian.
--
BOFH excuse #230:
Lusers learning curve appears to be fractal
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-05-10 6:17 ` Christian Kujau
@ 2023-05-11 10:41 ` Neal Gompa
2023-05-11 12:55 ` Christian Kujau
0 siblings, 1 reply; 11+ messages in thread
From: Neal Gompa @ 2023-05-11 10:41 UTC (permalink / raw)
To: Christian Kujau; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo
On Wed, May 10, 2023 at 2:25 AM Christian Kujau <lists@nerdbynature.de> wrote:
>
> On Tue, 9 May 2023, Chris Murphy wrote:
> > That is confusing.
>
> It is! :-)
>
> > If you do not specify any discard mount option, 6.2 kernels default to
> > discard=async.
>
> So, ever since I installed this system (Lenovo T470, with an NVME disk)
> the "discard" option was added (by me or the install routine) to the mount
> options of the root disk. And btrfs(5) on this Fedoa 38 system (the man
> page says "6.2.2" and "Mar 26, 2023") states:
>
> discard, discard=sync, discard=async, nodiscard
> (default: off, async support since: 5.6)
>
> So, the I'd assume that no discard option means...no discard. But you are
> saying that "discard=async" is now default, when the option is not
> specified at all?
>
> The man page further states:
>
> In the synchronous mode (sync or without option value), lack of
> asynchronous queued TRIM on the backing device TRIM can severely degrade
> performance
>
> ...but I did not know that until a few days ago. So, I thought: "OK, I had
> 'discard' specified for ages here, and that defaults to 'discard=sync', so
> better change that to 'discard=async' now." And ever since I did that
> (i.e. since last week) these weird stalls are now gone.
>
> > If you are manually specifying only discard, you should get async
> > discards, not sync discards. I think sync discards is a bug. Surely with
> > 6.2 kernels and newer it should be async, but arguably it should be
> > backported to all stable kernels still accepting changes. If not
> > specified, async should be implied.
>
> So, I'm not using vanilla Linux, I'm using 6.2.14-300.fc38.x86_64 from the
> Fedora distribution, but I doubt that they patched the man pages, as they
> sure state something else here: "discard is off by default, and discard
> without options defaults to sync". Or I'm misreading the man page, that's
> entirely possible too :-)
>
So what's probably going on is that "discard" triggers sync discards,
whereas having nothing or specifying "discard=async" will do async
discards.
We're not patching this in Fedora, so this is certainly an upstream Linux bug.
--
真実はいつも一つ!/ Always, there's only one truth!
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: btrfs-transaction stalls
2023-05-11 10:41 ` Neal Gompa
@ 2023-05-11 12:55 ` Christian Kujau
0 siblings, 0 replies; 11+ messages in thread
From: Christian Kujau @ 2023-05-11 12:55 UTC (permalink / raw)
To: Neal Gompa; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo
On Thu, 11 May 2023, Neal Gompa wrote:
> So what's probably going on is that "discard" triggers sync discards,
> whereas having nothing or specifying "discard=async" will do async
> discards.
So, Chris wrote to me the following:
-------------------------------------------
It's fixed in dev and just hasn't found its way into Fedora yet.
commit 93d51b286c848bc21900c105ce9fae1286ff65b4
Author: David Sterba <dsterba@suse.com>
Date: Wed Apr 26 12:42:39 2023 +0200
btrfs-progs: docs: mention discard=async in mount options
Issue: #617
Signed-off-by: David Sterba <dsterba@suse.com>
https://github.com/kdave/btrfs-progs/issues/617
-------------------------------------------
And discard=async is the default since Linux 6.2, even when no "discard"
option is specified at all. Interestingly, the current version of that
file[0] still reads:
> In the synchronous mode (sync or without option value),
...which I find somewhat counter-intuitive:
* no option means "discard=async"
* discard means "discard=sync"
...I would assume that "discard" defaults to "discard=async" (since this
is the default with 6.2 kernels anyway) and only with "discard=sync" one
can force its synchronous mode. But either way, the man page has been
corrected now and my weird stalls appear to be gone now too, hooray!
Thanks,
Christian.
[0] https://github.com/kdave/btrfs-progs/blob/master/Documentation/ch-mount-options.rst
--
BOFH excuse #42:
spaghetti cable cause packet failure
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-05-11 12:55 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-02 13:42 btrfs-transaction stalls Christian Kujau
2023-04-17 21:30 ` Christian Kujau
2023-04-17 22:56 ` Qu Wenruo
2023-04-18 10:50 ` Christian Kujau
2023-05-09 12:30 ` Christian Kujau
2023-05-09 19:48 ` Chris Murphy
2023-05-09 21:48 ` Christian Kujau
2023-05-10 1:05 ` Chris Murphy
2023-05-10 6:17 ` Christian Kujau
2023-05-11 10:41 ` Neal Gompa
2023-05-11 12:55 ` Christian Kujau
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.