All of lore.kernel.org
 help / color / mirror / Atom feed
* btrfs-transaction stalls
@ 2023-03-02 13:42 Christian Kujau
  2023-04-17 21:30 ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-03-02 13:42 UTC (permalink / raw)
  To: linux-btrfs

Hi,

this Fedora 37 workstation (always running the latest distribution kernel, 
6.1.14-200 right now) has been installed 2 years ago but lately I noticed 
some strange stalls when working: browser windows no longer react, 
sometimes even the Gnome "oh, this window does not respond, do you want to 
kill it?" popup comes up, and the system feels very unresponse. The system 
load applet is still working and shows very high I/O wait times, and I 
wait ~60 seconds and the system recovers and works again. And while all 
this happened rarely in the past, I notice it more often lately.

Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed. 
I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs 
on-top a LUKS2 dm-crypt device. I'm not using snapshots.

I ran a full btrfs-balance some time ago, but I can't say it helped much. 
If recommended, I can run another one of course. 

Whenever these stalls happen, and I manage to fire up iotop-c, I can see 
"btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but 
not generating _that_ much of I/O traffic (the encrypted disk can do ~250 
MB/s read). With even more luck I manage to run "perf record" during these 
stalls ("cd /tmp" before, so it can record to tmpfs, because of the 
stall), and while I can see "btrfs-transaction" in there, I don't see it 
in the top-10. Or maybe I'm to stupid to use perf report:

  https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz

This all can even be reproduced somewhat reliably: I was writing above how 
fast the disk is, and wanted to test _write_ speed as well:

 $ pv < /dev/zero > /foo
 [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
 ^C
 $ rm -f /foo

And all that comes back instantly. But then running "/bin/sync" afterwards 
took 8 minutes (!) until the command came back, and I can see the I/O wait 
again in the system load applet in my window manager. This time I could 
no longer start iotop-c, but "perf record" recorded something, if you want 
to have a look:

 https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz

Some Btrfs infos below. Does anybody have an idea what's going on? Or how 
to debug this? Disable compression? Run another btrfs-balance? fsck?

Thank you,
Christian.

$ mount | grep btrfs
/dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)

$ df -h /
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/luks-root  250G  162G   88G  66% /

$ btrfs filesystem df /
Data, single: total=164.00GiB, used=159.79GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, single: total=3.00GiB, used=1.73GiB
GlobalReserve, single: total=329.44MiB, used=0.00B

$ sudo btrfs filesystem usage -T /
Overall:
    Device size:                 249.98GiB
    Device allocated:            167.03GiB
    Device unallocated:           82.95GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        161.53GiB
    Free (estimated):             87.16GiB      (min: 87.16GiB)
    Free (statfs, df):            87.16GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              329.44MiB      (used: 0.00B)
    Multiple profiles:                  no

                         Data      Metadata System
Id Path                  single    single   single   Unallocated Total     Slack
-- --------------------- --------- -------- -------- ----------- --------- ----
 1 /dev/mapper/luks-root 164.00GiB  3.00GiB 32.00MiB    82.95GiB 249.98GiB -
-- --------------------- --------- -------- -------- ----------- --------- ----
   Total                 164.00GiB  3.00GiB 32.00MiB    82.95GiB 249.98GiB 0.00B
   Used                  159.79GiB  1.73GiB 48.00KiB


$ sudo compsize -x /
Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       82%      160G         193G         199G       
none       100%      144G         144G         145G       
zlib        33%       14K          43K          43K       
zstd        32%       15G          48G          53G       
prealloc   100%       66M          66M          72M       

-- 
BOFH excuse #227:

Fatal error right in front of screen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-03-02 13:42 btrfs-transaction stalls Christian Kujau
@ 2023-04-17 21:30 ` Christian Kujau
  2023-04-17 22:56   ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-04-17 21:30 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 2 Mar 2023, Christian Kujau wrote:
> this Fedora 37 workstation (always running the latest distribution kernel, 
> 6.1.14-200 right now) has been installed 2 years ago but lately I noticed 
> some strange stalls when working: browser windows no longer react, 
> sometimes even the Gnome "oh, this window does not respond, do you want to 
> kill it?" popup comes up, and the system feels very unresponsive.

No dice. Even after another full re-balance, the problem is back, and even 
worse. Now with 6.2.10-200.fc37.x86_64:

$ sudo iotop-c -o
Total DISK READ:    0.00 B/s
Total DISK WRITE:  10.18 K/s
Current DISK READ:  0.00 B/s
Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO GRAPH[IO]▽ COMMAN
 784 be/4 root 0.00 B/s 0.00 B/s 0.00% 96.05% btrfs-transaction

For some reason btrfs-transaction keeps the disk busy at almost 100%, with 
no real disk I/O going on, system load goes up to ~140. A few (maybe 10 
minutes later) the system recovers and sync(1) comes back too.

Any ideas on how to debug this?

Thanks,
Christian.


> The system load applet is still working and shows very high I/O wait 
> times, and I wait ~60 seconds and the system recovers and works again. 
> And while all this happened rarely in the past, I notice it more often 
> lately.
> 
> Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed. 
> I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs 
> on-top a LUKS2 dm-crypt device. I'm not using snapshots.
> 
> I ran a full btrfs-balance some time ago, but I can't say it helped much. 
> If recommended, I can run another one of course. 
> 
> Whenever these stalls happen, and I manage to fire up iotop-c, I can see 
> "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but 
> not generating _that_ much of I/O traffic (the encrypted disk can do ~250 
> MB/s read). With even more luck I manage to run "perf record" during these 
> stalls ("cd /tmp" before, so it can record to tmpfs, because of the 
> stall), and while I can see "btrfs-transaction" in there, I don't see it 
> in the top-10. Or maybe I'm to stupid to use perf report:
> 
>   https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
> 
> This all can even be reproduced somewhat reliably: I was writing above how 
> fast the disk is, and wanted to test _write_ speed as well:
> 
>  $ pv < /dev/zero > /foo
>  [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
>  ^C
>  $ rm -f /foo
> 
> And all that comes back instantly. But then running "/bin/sync" afterwards 
> took 8 minutes (!) until the command came back, and I can see the I/O wait 
> again in the system load applet in my window manager. This time I could 
> no longer start iotop-c, but "perf record" recorded something, if you want 
> to have a look:
> 
>  https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
> 
> Some Btrfs infos below. Does anybody have an idea what's going on? Or how 
> to debug this? Disable compression? Run another btrfs-balance? fsck?
> 
> Thank you,
> Christian.
> 
> $ mount | grep btrfs
> /dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
> 
> $ df -h /
> Filesystem             Size  Used Avail Use% Mounted on
> /dev/mapper/luks-root  250G  162G   88G  66% /
> 
> $ btrfs filesystem df /
> Data, single: total=164.00GiB, used=159.79GiB
> System, single: total=32.00MiB, used=48.00KiB
> Metadata, single: total=3.00GiB, used=1.73GiB
> GlobalReserve, single: total=329.44MiB, used=0.00B
> 
> $ sudo btrfs filesystem usage -T /
> Overall:
>     Device size:                 249.98GiB
>     Device allocated:            167.03GiB
>     Device unallocated:           82.95GiB
>     Device missing:                  0.00B
>     Device slack:                    0.00B
>     Used:                        161.53GiB
>     Free (estimated):             87.16GiB      (min: 87.16GiB)
>     Free (statfs, df):            87.16GiB
>     Data ratio:                       1.00
>     Metadata ratio:                   1.00
>     Global reserve:              329.44MiB      (used: 0.00B)
>     Multiple profiles:                  no
> 
>                          Data      Metadata System
> Id Path                  single    single   single   Unallocated Total     Slack
> -- --------------------- --------- -------- -------- ----------- --------- ----
>  1 /dev/mapper/luks-root 164.00GiB  3.00GiB 32.00MiB    82.95GiB 249.98GiB -
> -- --------------------- --------- -------- -------- ----------- --------- ----
>    Total                 164.00GiB  3.00GiB 32.00MiB    82.95GiB 249.98GiB 0.00B
>    Used                  159.79GiB  1.73GiB 48.00KiB
> 
> 
> $ sudo compsize -x /
> Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785 inline.
> Type       Perc     Disk Usage   Uncompressed Referenced  
> TOTAL       82%      160G         193G         199G       
> none       100%      144G         144G         145G       
> zlib        33%       14K          43K          43K       
> zstd        32%       15G          48G          53G       
> prealloc   100%       66M          66M          72M       
> 
> -- 
> BOFH excuse #227:
> 
> Fatal error right in front of screen
> 

-- 
BOFH excuse #279:

The static electricity routing is acting up...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-04-17 21:30 ` Christian Kujau
@ 2023-04-17 22:56   ` Qu Wenruo
  2023-04-18 10:50     ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2023-04-17 22:56 UTC (permalink / raw)
  To: Christian Kujau, linux-btrfs



On 2023/4/18 05:30, Christian Kujau wrote:
> On Thu, 2 Mar 2023, Christian Kujau wrote:
>> this Fedora 37 workstation (always running the latest distribution kernel,
>> 6.1.14-200 right now) has been installed 2 years ago but lately I noticed
>> some strange stalls when working: browser windows no longer react,
>> sometimes even the Gnome "oh, this window does not respond, do you want to
>> kill it?" popup comes up, and the system feels very unresponsive.
> 
> No dice. Even after another full re-balance, the problem is back, and even
> worse. Now with 6.2.10-200.fc37.x86_64:
> 
> $ sudo iotop-c -o
> Total DISK READ:    0.00 B/s
> Total DISK WRITE:  10.18 K/s
> Current DISK READ:  0.00 B/s
> Current DISK WRITE: 0.00 B/s
> TID PRIO USER DISK READ DISK WRITE SWAPIN IO GRAPH[IO]▽ COMMAN
>   784 be/4 root 0.00 B/s 0.00 B/s 0.00% 96.05% btrfs-transaction
> 
> For some reason btrfs-transaction keeps the disk busy at almost 100%, with
> no real disk I/O going on, system load goes up to ~140. A few (maybe 10
> minutes later) the system recovers and sync(1) comes back too.
> 
> Any ideas on how to debug this?

Disable qgroup, or don't delete large shared subvolumes, or use the 
latest /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold file to make 
qgroup to automatically skip such large snapshot drop.

Thanks,
Qu
> 
> Thanks,
> Christian.
> 
> 
>> The system load applet is still working and shows very high I/O wait
>> times, and I wait ~60 seconds and the system recovers and works again.
>> And while all this happened rarely in the past, I notice it more often
>> lately.
>>
>> Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
>> I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
>> on-top a LUKS2 dm-crypt device. I'm not using snapshots.
>>
>> I ran a full btrfs-balance some time ago, but I can't say it helped much.
>> If recommended, I can run another one of course.
>>
>> Whenever these stalls happen, and I manage to fire up iotop-c, I can see
>> "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
>> not generating _that_ much of I/O traffic (the encrypted disk can do ~250
>> MB/s read). With even more luck I manage to run "perf record" during these
>> stalls ("cd /tmp" before, so it can record to tmpfs, because of the
>> stall), and while I can see "btrfs-transaction" in there, I don't see it
>> in the top-10. Or maybe I'm to stupid to use perf report:
>>
>>    https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
>>
>> This all can even be reproduced somewhat reliably: I was writing above how
>> fast the disk is, and wanted to test _write_ speed as well:
>>
>>   $ pv < /dev/zero > /foo
>>   [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
>>   ^C
>>   $ rm -f /foo
>>
>> And all that comes back instantly. But then running "/bin/sync" afterwards
>> took 8 minutes (!) until the command came back, and I can see the I/O wait
>> again in the system load applet in my window manager. This time I could
>> no longer start iotop-c, but "perf record" recorded something, if you want
>> to have a look:
>>
>>   https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
>>
>> Some Btrfs infos below. Does anybody have an idea what's going on? Or how
>> to debug this? Disable compression? Run another btrfs-balance? fsck?
>>
>> Thank you,
>> Christian.
>>
>> $ mount | grep btrfs
>> /dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
>>
>> $ df -h /
>> Filesystem             Size  Used Avail Use% Mounted on
>> /dev/mapper/luks-root  250G  162G   88G  66% /
>>
>> $ btrfs filesystem df /
>> Data, single: total=164.00GiB, used=159.79GiB
>> System, single: total=32.00MiB, used=48.00KiB
>> Metadata, single: total=3.00GiB, used=1.73GiB
>> GlobalReserve, single: total=329.44MiB, used=0.00B
>>
>> $ sudo btrfs filesystem usage -T /
>> Overall:
>>      Device size:                 249.98GiB
>>      Device allocated:            167.03GiB
>>      Device unallocated:           82.95GiB
>>      Device missing:                  0.00B
>>      Device slack:                    0.00B
>>      Used:                        161.53GiB
>>      Free (estimated):             87.16GiB      (min: 87.16GiB)
>>      Free (statfs, df):            87.16GiB
>>      Data ratio:                       1.00
>>      Metadata ratio:                   1.00
>>      Global reserve:              329.44MiB      (used: 0.00B)
>>      Multiple profiles:                  no
>>
>>                           Data      Metadata System
>> Id Path                  single    single   single   Unallocated Total     Slack
>> -- --------------------- --------- -------- -------- ----------- --------- ----
>>   1 /dev/mapper/luks-root 164.00GiB  3.00GiB 32.00MiB    82.95GiB 249.98GiB -
>> -- --------------------- --------- -------- -------- ----------- --------- ----
>>     Total                 164.00GiB  3.00GiB 32.00MiB    82.95GiB 249.98GiB 0.00B
>>     Used                  159.79GiB  1.73GiB 48.00KiB
>>
>>
>> $ sudo compsize -x /
>> Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785 inline.
>> Type       Perc     Disk Usage   Uncompressed Referenced
>> TOTAL       82%      160G         193G         199G
>> none       100%      144G         144G         145G
>> zlib        33%       14K          43K          43K
>> zstd        32%       15G          48G          53G
>> prealloc   100%       66M          66M          72M
>>
>> -- 
>> BOFH excuse #227:
>>
>> Fatal error right in front of screen
>>
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-04-17 22:56   ` Qu Wenruo
@ 2023-04-18 10:50     ` Christian Kujau
  2023-05-09 12:30       ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-04-18 10:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, 18 Apr 2023, Qu Wenruo wrote:
> Disable qgroup, or don't delete large shared subvolumes, or use the latest
> /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold file to make qgroup to
> automatically skip such large snapshot drop.

I don't have Quota Groups enabled anywhere on the system, and snapshots 
are disabled too, see below. There's no qgroups subdirectory on this 
machine, only:

$ grep . /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/*
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/allocation: Is a directory
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bdi: Is a directory
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bg_reclaim_threshold:75
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/checksum:crc32c (crc32c-intel)
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/clone_alignment:4096
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:commits 2106
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:last_commit_ms 2805
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:max_commit_ms 371767
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:total_commit_ms 6460389
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devices: Is a directory
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devinfo: Is a directory
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/discard: Is a directory
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/exclusive_operation:none
grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/features: Is a directory
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/generation:636168
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/metadata_uuid:f1093dd0-ba7e-47c7-98de-e0ac85a34972
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/nodesize:16384
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/quota_override:0
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/read_policy:[pid]
/sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/sectorsize:4096


$ mount | grep btrfs
/dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)

$ btrfs qgroup show /
ERROR: can't list qgroups: quotas not enabled

$ btrfs subvolume list /
ID 262 gen 636159 top level 5 path var/lib/machines
ID 263 gen 634062 top level 5 path srv

$ rpm -qf /var/lib/machines /srv
systemd-container-251.14-2.fc37.x86_64
filesystem-3.18-2.fc37.x86_64

The subvolumes must have been added by the distribution, but are not 
actively used (both directories are empty).


Thanks,
Christian.

> 
> Thanks,
> Qu
> > 
> > Thanks,
> > Christian.
> > 
> > 
> > > The system load applet is still working and shows very high I/O wait
> > > times, and I wait ~60 seconds and the system recovers and works again.
> > > And while all this happened rarely in the past, I notice it more often
> > > lately.
> > > 
> > > Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
> > > I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
> > > on-top a LUKS2 dm-crypt device. I'm not using snapshots.
> > > 
> > > I ran a full btrfs-balance some time ago, but I can't say it helped much.
> > > If recommended, I can run another one of course.
> > > 
> > > Whenever these stalls happen, and I manage to fire up iotop-c, I can see
> > > "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
> > > not generating _that_ much of I/O traffic (the encrypted disk can do ~250
> > > MB/s read). With even more luck I manage to run "perf record" during these
> > > stalls ("cd /tmp" before, so it can record to tmpfs, because of the
> > > stall), and while I can see "btrfs-transaction" in there, I don't see it
> > > in the top-10. Or maybe I'm to stupid to use perf report:
> > > 
> > >    https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
> > > 
> > > This all can even be reproduced somewhat reliably: I was writing above how
> > > fast the disk is, and wanted to test _write_ speed as well:
> > > 
> > >   $ pv < /dev/zero > /foo
> > >   [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
> > >   ^C
> > >   $ rm -f /foo
> > > 
> > > And all that comes back instantly. But then running "/bin/sync" afterwards
> > > took 8 minutes (!) until the command came back, and I can see the I/O wait
> > > again in the system load applet in my window manager. This time I could
> > > no longer start iotop-c, but "perf record" recorded something, if you want
> > > to have a look:
> > > 
> > >   https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
> > > 
> > > Some Btrfs infos below. Does anybody have an idea what's going on? Or how
> > > to debug this? Disable compression? Run another btrfs-balance? fsck?
> > > 
> > > Thank you,
> > > Christian.
> > > 
> > > $ mount | grep btrfs
> > > /dev/mapper/luks-root on / type btrfs
> > > (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
> > > 
> > > $ df -h /
> > > Filesystem             Size  Used Avail Use% Mounted on
> > > /dev/mapper/luks-root  250G  162G   88G  66% /
> > > 
> > > $ btrfs filesystem df /
> > > Data, single: total=164.00GiB, used=159.79GiB
> > > System, single: total=32.00MiB, used=48.00KiB
> > > Metadata, single: total=3.00GiB, used=1.73GiB
> > > GlobalReserve, single: total=329.44MiB, used=0.00B
> > > 
> > > $ sudo btrfs filesystem usage -T /
> > > Overall:
> > >      Device size:                 249.98GiB
> > >      Device allocated:            167.03GiB
> > >      Device unallocated:           82.95GiB
> > >      Device missing:                  0.00B
> > >      Device slack:                    0.00B
> > >      Used:                        161.53GiB
> > >      Free (estimated):             87.16GiB      (min: 87.16GiB)
> > >      Free (statfs, df):            87.16GiB
> > >      Data ratio:                       1.00
> > >      Metadata ratio:                   1.00
> > >      Global reserve:              329.44MiB      (used: 0.00B)
> > >      Multiple profiles:                  no
> > > 
> > >                           Data      Metadata System
> > > Id Path                  single    single   single   Unallocated Total
> > > Slack
> > > -- --------------------- --------- -------- -------- -----------
> > > --------- ----
> > >   1 /dev/mapper/luks-root 164.00GiB  3.00GiB 32.00MiB    82.95GiB
> > > 249.98GiB -
> > > -- --------------------- --------- -------- -------- -----------
> > > --------- ----
> > >     Total                 164.00GiB  3.00GiB 32.00MiB    82.95GiB
> > > 249.98GiB 0.00B
> > >     Used                  159.79GiB  1.73GiB 48.00KiB
> > > 
> > > 
> > > $ sudo compsize -x /
> > > Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785
> > > inline.
> > > Type       Perc     Disk Usage   Uncompressed Referenced
> > > TOTAL       82%      160G         193G         199G
> > > none       100%      144G         144G         145G
> > > zlib        33%       14K          43K          43K
> > > zstd        32%       15G          48G          53G
> > > prealloc   100%       66M          66M          72M
> > > 
> > > -- 
> > > BOFH excuse #227:
> > > 
> > > Fatal error right in front of screen
> > > 
> > 
> 

-- 
BOFH excuse #99:

SIMM crosstalk.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-04-18 10:50     ` Christian Kujau
@ 2023-05-09 12:30       ` Christian Kujau
  2023-05-09 19:48         ` Chris Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-05-09 12:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically 
go away (heh!) and I've run another full balance (and a scrub too), 
although this did not help much in the past.

I've since switched the mount option from "discard" to "discard=async" and 
the problem has not recurred....yet :-)

Christian.

On Tue, 18 Apr 2023, Christian Kujau wrote:
> On Tue, 18 Apr 2023, Qu Wenruo wrote:
> > Disable qgroup, or don't delete large shared subvolumes, or use the latest
> > /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold file to make qgroup to
> > automatically skip such large snapshot drop.
> 
> I don't have Quota Groups enabled anywhere on the system, and snapshots 
> are disabled too, see below. There's no qgroups subdirectory on this 
> machine, only:
> 
> $ grep . /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/*
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/allocation: Is a directory
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bdi: Is a directory
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/bg_reclaim_threshold:75
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/checksum:crc32c (crc32c-intel)
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/clone_alignment:4096
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:commits 2106
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:last_commit_ms 2805
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:max_commit_ms 371767
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/commit_stats:total_commit_ms 6460389
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devices: Is a directory
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/devinfo: Is a directory
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/discard: Is a directory
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/exclusive_operation:none
> grep: /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/features: Is a directory
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/generation:636168
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/metadata_uuid:f1093dd0-ba7e-47c7-98de-e0ac85a34972
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/nodesize:16384
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/quota_override:0
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/read_policy:[pid]
> /sys/fs/btrfs/f1093dd0-ba7e-47c7-98de-e0ac85a34972/sectorsize:4096
> 
> 
> $ mount | grep btrfs
> /dev/mapper/luks-root on / type btrfs (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
> 
> $ btrfs qgroup show /
> ERROR: can't list qgroups: quotas not enabled
> 
> $ btrfs subvolume list /
> ID 262 gen 636159 top level 5 path var/lib/machines
> ID 263 gen 634062 top level 5 path srv
> 
> $ rpm -qf /var/lib/machines /srv
> systemd-container-251.14-2.fc37.x86_64
> filesystem-3.18-2.fc37.x86_64
> 
> The subvolumes must have been added by the distribution, but are not 
> actively used (both directories are empty).
> 
> 
> Thanks,
> Christian.
> 
> > 
> > Thanks,
> > Qu
> > > 
> > > Thanks,
> > > Christian.
> > > 
> > > 
> > > > The system load applet is still working and shows very high I/O wait
> > > > times, and I wait ~60 seconds and the system recovers and works again.
> > > > And while all this happened rarely in the past, I notice it more often
> > > > lately.
> > > > 
> > > > Some specs: Thinkpad T470, with a single Crucial NVMe 1 TB disk installed.
> > > > I opted for full-disk-encryption and so the rootfs is (compressed) Btrfs
> > > > on-top a LUKS2 dm-crypt device. I'm not using snapshots.
> > > > 
> > > > I ran a full btrfs-balance some time ago, but I can't say it helped much.
> > > > If recommended, I can run another one of course.
> > > > 
> > > > Whenever these stalls happen, and I manage to fire up iotop-c, I can see
> > > > "btrfs-transaction" at the very top, utilizing 100% of the disk's IO, but
> > > > not generating _that_ much of I/O traffic (the encrypted disk can do ~250
> > > > MB/s read). With even more luck I manage to run "perf record" during these
> > > > stalls ("cd /tmp" before, so it can record to tmpfs, because of the
> > > > stall), and while I can see "btrfs-transaction" in there, I don't see it
> > > > in the top-10. Or maybe I'm to stupid to use perf report:
> > > > 
> > > >    https://nerdbynature.de/bits/6.1.14-200/perf_1.data.xz
> > > > 
> > > > This all can even be reproduced somewhat reliably: I was writing above how
> > > > fast the disk is, and wanted to test _write_ speed as well:
> > > > 
> > > >   $ pv < /dev/zero > /foo
> > > >   [wait....1.7GB/s.., cool...ok, let's stop after 9GB or so...]
> > > >   ^C
> > > >   $ rm -f /foo
> > > > 
> > > > And all that comes back instantly. But then running "/bin/sync" afterwards
> > > > took 8 minutes (!) until the command came back, and I can see the I/O wait
> > > > again in the system load applet in my window manager. This time I could
> > > > no longer start iotop-c, but "perf record" recorded something, if you want
> > > > to have a look:
> > > > 
> > > >   https://nerdbynature.de/bits/6.1.14-200/perf_2.data.xz
> > > > 
> > > > Some Btrfs infos below. Does anybody have an idea what's going on? Or how
> > > > to debug this? Disable compression? Run another btrfs-balance? fsck?
> > > > 
> > > > Thank you,
> > > > Christian.
> > > > 
> > > > $ mount | grep btrfs
> > > > /dev/mapper/luks-root on / type btrfs
> > > > (rw,relatime,seclabel,compress=zstd:3,ssd,discard,space_cache=v2,subvolid=5,subvol=/)
> > > > 
> > > > $ df -h /
> > > > Filesystem             Size  Used Avail Use% Mounted on
> > > > /dev/mapper/luks-root  250G  162G   88G  66% /
> > > > 
> > > > $ btrfs filesystem df /
> > > > Data, single: total=164.00GiB, used=159.79GiB
> > > > System, single: total=32.00MiB, used=48.00KiB
> > > > Metadata, single: total=3.00GiB, used=1.73GiB
> > > > GlobalReserve, single: total=329.44MiB, used=0.00B
> > > > 
> > > > $ sudo btrfs filesystem usage -T /
> > > > Overall:
> > > >      Device size:                 249.98GiB
> > > >      Device allocated:            167.03GiB
> > > >      Device unallocated:           82.95GiB
> > > >      Device missing:                  0.00B
> > > >      Device slack:                    0.00B
> > > >      Used:                        161.53GiB
> > > >      Free (estimated):             87.16GiB      (min: 87.16GiB)
> > > >      Free (statfs, df):            87.16GiB
> > > >      Data ratio:                       1.00
> > > >      Metadata ratio:                   1.00
> > > >      Global reserve:              329.44MiB      (used: 0.00B)
> > > >      Multiple profiles:                  no
> > > > 
> > > >                           Data      Metadata System
> > > > Id Path                  single    single   single   Unallocated Total
> > > > Slack
> > > > -- --------------------- --------- -------- -------- -----------
> > > > --------- ----
> > > >   1 /dev/mapper/luks-root 164.00GiB  3.00GiB 32.00MiB    82.95GiB
> > > > 249.98GiB -
> > > > -- --------------------- --------- -------- -------- -----------
> > > > --------- ----
> > > >     Total                 164.00GiB  3.00GiB 32.00MiB    82.95GiB
> > > > 249.98GiB 0.00B
> > > >     Used                  159.79GiB  1.73GiB 48.00KiB
> > > > 
> > > > 
> > > > $ sudo compsize -x /
> > > > Processed 1263684 files, 1173537 regular extents (1252947 refs), 612785
> > > > inline.
> > > > Type       Perc     Disk Usage   Uncompressed Referenced
> > > > TOTAL       82%      160G         193G         199G
> > > > none       100%      144G         144G         145G
> > > > zlib        33%       14K          43K          43K
> > > > zstd        32%       15G          48G          53G
> > > > prealloc   100%       66M          66M          72M
> > > > 
> > > > -- 
> > > > BOFH excuse #227:
> > > > 
> > > > Fatal error right in front of screen
> > > > 
> > > 
> > 
> 
> -- 
> BOFH excuse #99:
> 
> SIMM crosstalk.
> 

-- 
BOFH excuse #391:

We already sent around a notice about that.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-05-09 12:30       ` Christian Kujau
@ 2023-05-09 19:48         ` Chris Murphy
  2023-05-09 21:48           ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2023-05-09 19:48 UTC (permalink / raw)
  To: Christian Kujau, Btrfs BTRFS; +Cc: Qu Wenruo



On Tue, May 9, 2023, at 8:30 AM, Christian Kujau wrote:
> After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically 
> go away (heh!) and I've run another full balance (and a scrub too), 
> although this did not help much in the past.
>
> I've since switched the mount option from "discard" to "discard=async" and 
> the problem has not recurred....yet :-)


There are some edge case performance issues with async discards that should be fixed in 6.2.13 or newer. I suggest upgrading your kernel andremoving the discard inhibition so it is used (the default). And report back if it fixes the problem or not.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-05-09 19:48         ` Chris Murphy
@ 2023-05-09 21:48           ` Christian Kujau
  2023-05-10  1:05             ` Chris Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-05-09 21:48 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

On Tue, 9 May 2023, Chris Murphy wrote:
> On Tue, May 9, 2023, at 8:30 AM, Christian Kujau wrote:
> > After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically 
> > go away (heh!) and I've run another full balance (and a scrub too), 
> > although this did not help much in the past.
> >
> > I've since switched the mount option from "discard" to "discard=async" and 
> > the problem has not recurred....yet :-)
> 
> There are some edge case performance issues with async discards that 
> should be fixed in 6.2.13 or newer. I suggest upgrading your kernel 
> andremoving the discard inhibition so it is used (the default). And 
> report back if it fixes the problem or not.

Well, Fedora 38 has 6.2.14-300.fc38.x86_64, but I had these weird stalls 
right after the upgrade to F38 again. But "discard=async" (instead of 
plain "discard", which defaults to "discard=sync") appears to help. Knock 
on wood.

Christian.
-- 
BOFH excuse #299:

The data on your hard drive is out of balance.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-05-09 21:48           ` Christian Kujau
@ 2023-05-10  1:05             ` Chris Murphy
  2023-05-10  6:17               ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2023-05-10  1:05 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Btrfs BTRFS, Qu Wenruo



On Tue, May 9, 2023, at 5:48 PM, Christian Kujau wrote:
> On Tue, 9 May 2023, Chris Murphy wrote:
>> On Tue, May 9, 2023, at 8:30 AM, Christian Kujau wrote:
>> > After upgrading from Fedora 37 to Fedora 38 the problem did NOT magically 
>> > go away (heh!) and I've run another full balance (and a scrub too), 
>> > although this did not help much in the past.
>> >
>> > I've since switched the mount option from "discard" to "discard=async" and 
>> > the problem has not recurred....yet :-)
>> 
>> There are some edge case performance issues with async discards that 
>> should be fixed in 6.2.13 or newer. I suggest upgrading your kernel 
>> andremoving the discard inhibition so it is used (the default). And 
>> report back if it fixes the problem or not.
>
> Well, Fedora 38 has 6.2.14-300.fc38.x86_64, but I had these weird stalls 
> right after the upgrade to F38 again. But "discard=async" (instead of 
> plain "discard", which defaults to "discard=sync") appears to help. Knock 
> on wood.

That is confusing. If you do not specify any discard mount option, 6.2 kernels default to discard=async.

If you are manually specifying only discard, you should get async discards, not sync discards. I think sync discards is a bug. Surely with 6.2 kernels and newer it should be async, but arguably it should be backported to all stable kernels still accepting changes. If not specified, async should be implied.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-05-10  1:05             ` Chris Murphy
@ 2023-05-10  6:17               ` Christian Kujau
  2023-05-11 10:41                 ` Neal Gompa
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Kujau @ 2023-05-10  6:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

On Tue, 9 May 2023, Chris Murphy wrote:
> That is confusing.

It is! :-)

> If you do not specify any discard mount option, 6.2 kernels default to 
> discard=async.

So, ever since I installed this system (Lenovo T470, with an NVME disk) 
the "discard" option was added (by me or the install routine) to the mount 
options of the root disk. And btrfs(5) on this Fedoa 38 system (the man 
page says "6.2.2" and "Mar 26, 2023") states:

   discard, discard=sync, discard=async, nodiscard
      (default: off, async support since: 5.6)

So, the I'd assume that no discard option means...no discard. But you are 
saying that "discard=async" is now default, when the option is not 
specified at all?

The man page further states:

  In the synchronous mode (sync or without option value), lack of 
  asynchronous queued TRIM on the backing device TRIM can severely degrade 
  performance

...but I did not know that until a few days ago. So, I thought: "OK, I had 
'discard' specified for ages here, and that defaults to 'discard=sync', so 
better change that to 'discard=async' now." And ever since I did that 
(i.e. since last week) these weird stalls are now gone.

> If you are manually specifying only discard, you should get async 
> discards, not sync discards. I think sync discards is a bug. Surely with 
> 6.2 kernels and newer it should be async, but arguably it should be 
> backported to all stable kernels still accepting changes. If not 
> specified, async should be implied.

So, I'm not using vanilla Linux, I'm using 6.2.14-300.fc38.x86_64 from the 
Fedora distribution, but I doubt that they patched the man pages, as they 
sure state something else here: "discard is off by default, and discard 
without options defaults to sync". Or I'm misreading the man page, that's 
entirely possible too :-)

Thanks,
Christian.
-- 
BOFH excuse #230:

Lusers learning curve appears to be fractal

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-05-10  6:17               ` Christian Kujau
@ 2023-05-11 10:41                 ` Neal Gompa
  2023-05-11 12:55                   ` Christian Kujau
  0 siblings, 1 reply; 11+ messages in thread
From: Neal Gompa @ 2023-05-11 10:41 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

On Wed, May 10, 2023 at 2:25 AM Christian Kujau <lists@nerdbynature.de> wrote:
>
> On Tue, 9 May 2023, Chris Murphy wrote:
> > That is confusing.
>
> It is! :-)
>
> > If you do not specify any discard mount option, 6.2 kernels default to
> > discard=async.
>
> So, ever since I installed this system (Lenovo T470, with an NVME disk)
> the "discard" option was added (by me or the install routine) to the mount
> options of the root disk. And btrfs(5) on this Fedoa 38 system (the man
> page says "6.2.2" and "Mar 26, 2023") states:
>
>    discard, discard=sync, discard=async, nodiscard
>       (default: off, async support since: 5.6)
>
> So, the I'd assume that no discard option means...no discard. But you are
> saying that "discard=async" is now default, when the option is not
> specified at all?
>
> The man page further states:
>
>   In the synchronous mode (sync or without option value), lack of
>   asynchronous queued TRIM on the backing device TRIM can severely degrade
>   performance
>
> ...but I did not know that until a few days ago. So, I thought: "OK, I had
> 'discard' specified for ages here, and that defaults to 'discard=sync', so
> better change that to 'discard=async' now." And ever since I did that
> (i.e. since last week) these weird stalls are now gone.
>
> > If you are manually specifying only discard, you should get async
> > discards, not sync discards. I think sync discards is a bug. Surely with
> > 6.2 kernels and newer it should be async, but arguably it should be
> > backported to all stable kernels still accepting changes. If not
> > specified, async should be implied.
>
> So, I'm not using vanilla Linux, I'm using 6.2.14-300.fc38.x86_64 from the
> Fedora distribution, but I doubt that they patched the man pages, as they
> sure state something else here: "discard is off by default, and discard
> without options defaults to sync". Or I'm misreading the man page, that's
> entirely possible too :-)
>

So what's probably going on is that "discard" triggers sync discards,
whereas having nothing or specifying "discard=async" will do async
discards.

We're not patching this in Fedora, so this is certainly an upstream Linux bug.




--
真実はいつも一つ!/ Always, there's only one truth!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs-transaction stalls
  2023-05-11 10:41                 ` Neal Gompa
@ 2023-05-11 12:55                   ` Christian Kujau
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Kujau @ 2023-05-11 12:55 UTC (permalink / raw)
  To: Neal Gompa; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

On Thu, 11 May 2023, Neal Gompa wrote:
> So what's probably going on is that "discard" triggers sync discards,
> whereas having nothing or specifying "discard=async" will do async
> discards.

So, Chris wrote to me the following:

-------------------------------------------
It's fixed in dev and just hasn't found its way into Fedora yet.

commit 93d51b286c848bc21900c105ce9fae1286ff65b4
Author: David Sterba <dsterba@suse.com>
Date:  Wed Apr 26 12:42:39 2023 +0200

    btrfs-progs: docs: mention discard=async in mount options

    Issue: #617
    Signed-off-by: David Sterba <dsterba@suse.com>

https://github.com/kdave/btrfs-progs/issues/617
-------------------------------------------

And discard=async is the default since Linux 6.2, even when no "discard" 
option is specified at all. Interestingly, the current version of that 
file[0] still reads:

 > In the synchronous mode (sync or without option value),

...which I find somewhat counter-intuitive:

* no option means "discard=async"
*   discard means "discard=sync"

...I would assume that "discard" defaults to "discard=async" (since this 
is the default with 6.2 kernels anyway) and only with "discard=sync" one 
can force its synchronous mode. But either way, the man page has been 
corrected now and my weird stalls appear to be gone now too, hooray!

Thanks,
Christian.

[0] https://github.com/kdave/btrfs-progs/blob/master/Documentation/ch-mount-options.rst
-- 
BOFH excuse #42:

spaghetti cable cause packet failure

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-11 12:55 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-02 13:42 btrfs-transaction stalls Christian Kujau
2023-04-17 21:30 ` Christian Kujau
2023-04-17 22:56   ` Qu Wenruo
2023-04-18 10:50     ` Christian Kujau
2023-05-09 12:30       ` Christian Kujau
2023-05-09 19:48         ` Chris Murphy
2023-05-09 21:48           ` Christian Kujau
2023-05-10  1:05             ` Chris Murphy
2023-05-10  6:17               ` Christian Kujau
2023-05-11 10:41                 ` Neal Gompa
2023-05-11 12:55                   ` Christian Kujau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.