* btrfs-transacti hangs system for several seconds every few minutes
From: Brad Templeton @ 2020-03-28 18:26 UTC
To: Btrfs BTRFS

I have a decent sized 3 disk Raid 1 that I have had on btrfs for many
years. Over time, a serious problem has emerged, in that from time to
time all I/O will pause, freezing any programs attempting to use the
btrfs filesystem. Performance has degraded over the years as well, so
that just browsing around in directories with 300 or so files often
takes many seconds just to autocomplete a filename or do an ls.

But the big problem is that during periods of active but not heavy use,
every few minutes the I/O system will hang for periods of 1 to 10
seconds. During these hangs, btrfs-transacti is doing very heavy I/O.
Programs waiting on I/O block -- the most frustrating is typing in vi
and having the echo stop. It's getting close to unusable and may be
time to leave btrfs after many years for a different FS.

During these incidents iotop will look like this:

Total DISK READ :     499.57 K/s | Total DISK WRITE :    1639.00 K/s
Actual DISK READ:     492.73 K/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER    DISK READ   DISK WRITE  SWAPIN      IO  COMMAND
  882  be/4  root   499.57 K/s  1604.78 K/s  0.00 % 98.60 %  [btrfs-transacti]
21829  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.23 %  [kworker/u32:1-btrfs-endio-meta]
14662  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.17 %  [kworker/u32:0-btrfs-endio-meta]
22184  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.11 %  [kworker/u32:3-events_freezable_power_]
13063  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.06 %  [kworker/u32:6-events_freezable_power_]
  486  be/3  root     0.00 B/s     6.84 K/s  0.00 %  0.00 %  systemd-journald
22213  be/4  brad     0.00 B/s     6.84 K/s  0.00 %  0.00 %  chrome --no-startup-window [ThreadPoolForeg]

A way to reliably generate it, I have found, is to quickly skim through
my large video collection (looking for videos). I would be hitting
"next" every second or so -- lots of read, but very little write.
After doing about 40 seconds of this, it is sure to hang.

I am running kernel 5.3.0 on Ubuntu 18.04.4, but have seen this problem
going back into much older kernels.

My array looks like this:

/dev/sda, ID: 2
   Device size:             3.64TiB
   Device slack:              0.00B
   Data,RAID1:              1.79TiB
   Metadata,RAID1:          8.00GiB
   Unallocated:             1.84TiB

/dev/sdg, ID: 1
   Device size:             9.10TiB
   Device slack:              0.00B
   Data,RAID1:              7.21TiB
   Metadata,RAID1:         14.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.87TiB

/dev/sdh, ID: 3
   Device size:             7.28TiB
   Device slack:          344.00KiB
   Data,RAID1:              5.43TiB
   Metadata,RAID1:          8.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.84TiB

/dev/sdg on /home type btrfs (rw,relatime,space_cache,subvolid=256,subvol=/home)

I have 16 GB of RAM with 16 GB of swap on a flash drive, and the swap is
in use:

KiB Mem : 16393944 total,   398800 free, 13538088 used,  2457056 buff/cache
KiB Swap: 16777212 total,  6804352 free,  9972860 used.  2045812 avail Mem

What other information would be useful in attempting to diagnose or fix
this?
I like a number of things about btrfs. One of them that I don't
want to give up is the ability to do RAID with different sized disks,
which seems like the only way it should work. Switching to ZFS or mdadm
again would involve disk upgrades and a very large amount of time
copying this much data, but I'll have to do it if I can't diagnose this.

* Re: btrfs-transacti hangs system for several seconds every few minutes
From: Zygo Blaxell @ 2020-03-28 21:20 UTC
To: Brad Templeton; +Cc: Btrfs BTRFS

On Sat, Mar 28, 2020 at 11:26:56AM -0700, Brad Templeton wrote:
> I have a decent sized 3 disk Raid 1 that I have had on btrfs for many
> years. Over time, a serious problem has emerged, in that from time to
> time all I/O will pause, freezing any programs attempting to use the
> btrfs filesystem. Performance has degraded over the years as well, so
> that just browsing around in directories with 300 or so files often
> takes many seconds just to autocomplete a filename or do an ls.
>
> But the big problem is that during periods of active but not heavy use,
> every few minutes the i/o system will hang for periods of 1 to 10
> seconds. During these hangs, btrfs-transacti is doing very heavy I/O.
> Programs waiting on I/O block -- the most frustrating is typing in vi
> and having the echo stop. It's getting close to unusable and may be
> time to leave btrfs after many years for a different FS.
>
> During these incidents iotop will look like this:
>
> Total DISK READ :     499.57 K/s | Total DISK WRITE :    1639.00 K/s
> Actual DISK READ:     492.73 K/s | Actual DISK WRITE:       0.00 B/s
>   TID  PRIO  USER    DISK READ   DISK WRITE  SWAPIN      IO  COMMAND
>   882  be/4  root   499.57 K/s  1604.78 K/s  0.00 % 98.60 %  [btrfs-transacti]
> 21829  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.23 %  [kworker/u32:1-btrfs-endio-meta]
> 14662  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.17 %  [kworker/u32:0-btrfs-endio-meta]
> 22184  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.11 %  [kworker/u32:3-events_freezable_power_]
> 13063  be/4  root     0.00 B/s     0.00 B/s  0.00 %  0.06 %  [kworker/u32:6-events_freezable_power_]
>   486  be/3  root     0.00 B/s     6.84 K/s  0.00 %  0.00 %  systemd-journald
> 22213  be/4  brad     0.00 B/s     6.84 K/s  0.00 %  0.00 %  chrome --no-startup-window [ThreadPoolForeg]
>
> A way to reliably generate it, I have found, is to quickly skim through
> my large video collection (looking for videos) I would be hitting
> "next" every second or so -- lots of read, but very little write.
> After doing about 40 seconds of this, it is sure to hang.
>
> I am running kernel 5.3.0 on Ubuntu 18.04.4, but have seen this problem
> going back into much older kernels.

PSA: Get off 5.3.0. There is a serious bug in kernels 5.1 to 5.4.13 that
can lead to metadata corruption resulting in loss of the filesystem.
Go to 5.4.14 or later, or back to 4.19.y for y > 100 or so. This advice
applies to all btrfs users; it's not related to latency.

In this case 4.19 might be a better choice than later kernels for latency.
5.0 had some latency-related regressions, and fixes for those are still
in development.

> My array looks like this:
>
> /dev/sda, ID: 2
>    Device size:             3.64TiB
>    Device slack:              0.00B
>    Data,RAID1:              1.79TiB
>    Metadata,RAID1:          8.00GiB
>    Unallocated:             1.84TiB
>
> /dev/sdg, ID: 1
>    Device size:             9.10TiB
>    Device slack:              0.00B
>    Data,RAID1:              7.21TiB
>    Metadata,RAID1:         14.00GiB
>    System,RAID1:           32.00MiB
>    Unallocated:             1.87TiB
>
> /dev/sdh, ID: 3
>    Device size:             7.28TiB
>    Device slack:          344.00KiB
>    Data,RAID1:              5.43TiB
>    Metadata,RAID1:          8.00GiB
>    System,RAID1:           32.00MiB
>    Unallocated:             1.84TiB
>
> /dev/sdg on /home type btrfs
> (rw,relatime,space_cache,subvolid=256,subvol=/home)

Two things in the mount options:

1. PSA: Upgrade to space_cache=v2. Unmount the filesystem, then mount
it with '-o clear_cache,space_cache=v2' (remount is not sufficient, you
have to completely umount). This will take some minutes, but it only
has to be done once. Transactions will be quite slow on a filesystem
with ~10000 block groups with space_cache=v1. Afterwards, use

	btrfs ins dump-tree -t 10 /dev/vgwaya/root |
	grep 'owner FREE_SPACE_TREE' | wc -l

to verify the space_cache=v2 conversion was done (it should give a
non-zero number). Although directly relevant to this case, this advice
is a PSA because it also applies to all btrfs users.

2. Use noatime instead of relatime.

In the mount man page for 'relatime':

	since Linux 2.6.30, the file's last access time is always updated
	if it is more than 1 day old

If you get this high-latency behavior about once a day, but it's fine
at other times, then this is the likely cause. Some users need atime
updates, and they're usually OK on small SSD filesystems; however, this
filesystem is neither small nor SSD, and most users don't need atime.

You didn't mention snapshots. If you don't have snapshots then disregard
the rest of this paragraph.
If you do have snapshots, then each time
you modify a snapshotted subvol (either origin or snapshot, doesn't
matter, what matters is that the metadata is shared), btrfs will be
doing extra writes to unshare shared pages and update reference counts.
Immediately after the snapshot is created, the write multiplication factor
is about 300. The factor drops rapidly to 1.0, but it can take a few
minutes to get through the first 10000 page updates after a snapshot,
and you can easily get that many by touching 500 files. Note that the
snapshot could have been made in the past; its existence will still
affect the write performance of the filesystem in the present.

All of the above effects combine: 5.0 and later do not attempt to manage
latency, atime updates throw a lot of writes into the queue at once,
space_cache=v1 makes every write slower to exit the queue, and fresh
snapshots multiply everything else by an order of magnitude. With all of
those at once, I'm surprised it's as fast as you reported. Starting with
kernel 5.0 it's not hard to make a btrfs commit take 10 hours.

> I have 16gb of ram with 16gb of swap on a flash drive, the swap is in use
>
> KiB Mem : 16393944 total,   398800 free, 13538088 used,  2457056 buff/cache
> KiB Swap: 16777212 total,  6804352 free,  9972860 used.  2045812 avail Mem

Check slabtop:

	# slabtop | grep btrfs_delayed_ref_head
	105072 105072 100%    0.33K   8756       12     35024K btrfs_delayed_ref_head

Divide the second number (count of btrfs_delayed_ref_head slabs in use)
by about 1000 (depends on how fast your disks are, range is about 500 to
10000 for consumer hardware) and the result is roughly the commit latency
in seconds. It's not the only time spent in a commit, but btrfs spends
orders of magnitude more time on delayed refs than on anything else.
On kernels before 5.0 btrfs kept the delayed ref head count below 10000,
but after 5.0 it is allowed to grow until memory is exhausted.
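
As a worked instance of the rule of thumb above, here is a minimal Python sketch that takes one slabtop line and applies the division. The function name and the `heads_per_second` parameter are illustrative assumptions, not part of any btrfs tool; the field position (second column is the in-use count) and the 500 to 10000 range come from the message itself.

```python
def estimate_commit_latency(slabtop_line, heads_per_second=1000.0):
    """Rough commit latency in seconds from a slabtop line for
    btrfs_delayed_ref_head, using the divide-by-about-1000 rule of thumb.
    heads_per_second is a hypothetical name for the divisor."""
    fields = slabtop_line.split()
    if not fields or fields[-1] != "btrfs_delayed_ref_head":
        raise ValueError("not a btrfs_delayed_ref_head slabtop line")
    in_use = int(fields[1])  # second number: count in use
    return in_use / heads_per_second


# Sample line copied from the message above.
sample = "105072 105072 100% 0.33K 8756 12 35024K btrfs_delayed_ref_head"

# Show the estimate across the quoted range of divisors.
for rate in (500, 1000, 10000):
    print(f"divisor {rate}: ~{estimate_commit_latency(sample, rate):.1f} s")
```

With the sample line this spans roughly 10 to 210 seconds depending on the divisor, so the result is best read as an order-of-magnitude indicator only.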

The latency fixes currently in development put the latency caps from
4.19 back in, and also add new ones, e.g. snapshot delete has been able
to create unlimited latency in btrfs since the beginning. 5.7 or 5.8
should be better at latency than 4.19.

> What other information would be useful in attempting to diagnose or fix
> this? I like a number of things about btrfs. One of them that I don't
> want to give up is the ability to do RAID with different sized disks,
> which seems like the only way it should work. Switching to ZFS or mdadm
> again would involve disk upgrades and a very large amount of time
> copying this much data, but I'll have to do it if I can't diagnose this.

* Re: btrfs-transacti hangs system for several seconds every few minutes
From: Zygo Blaxell @ 2020-03-29 6:42 UTC
To: Brad Templeton; +Cc: Btrfs BTRFS

On Sat, Mar 28, 2020 at 09:02:38PM -0700, Brad Templeton wrote:
> Not using qgroups. Not doing snapshots. Did a reboot with the options
> to upgrade to v2 -- it failed, in that the disk check took more than 6
> minutes, but it worked, and the second time I was able to boot, and --
> knock on wood -- so far it has not hung.
>
> I wonder why they put 5.3.0 as the standard advanced Kernel in Ubuntu LTS
> if it has a data corruption bug.

Ubuntu, like most Linux distros, chooses the kernel version to ship based
on release date. QA/defect data does not usually affect that decision.

Starting with 5.2 there are internal btrfs mitigations for metadata
corruption. 5.3, 5.4, and 5.5 contain improved mitigations which are
able to detect more corruption cases introduced by regressions in 5.1
(and also external causes like RAM failure), and block the corrupted
data before it reaches the disk. This blocking is done by forcing the
filesystem readonly, so even if you don't get on-disk corruption, you're
still going to need to at least unmount the filesystem and possibly
reboot if you hit the bug.

The risk is that the tree checker can't detect every possible corruption,
and some things might slip through undetected. The probability of this
failure is normally small (about the same failure rate as an average
hard drive), but the risk increases significantly while metadata-intensive
operations like device delete or shrinking resize are running. On a 10TB
filesystem it is unlikely you would be able to complete a device delete
operation without some kind of error, and then you're rolling dice to
see if the filesystem survives intact.

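The affected range named earlier in the thread (5.1 through 5.4.13, fixed in 5.4.14) can be checked mechanically against a kernel release string. The sketch below is only that, a sketch: the function names are illustrative, the release string `5.3.0-42-generic` is a made-up example of distro-style naming, and the check assumes upstream version numbering. A distro kernel may carry a backported fix, so the version string alone is not conclusive.

```python
import re

# Range as stated in the thread: bug present in 5.1 through 5.4.13.
AFFECTED_FIRST = (5, 1, 0)
AFFECTED_LAST = (5, 4, 13)


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a string such as `uname -r` prints.
    A missing patch level is treated as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def in_affected_range(release):
    """True if the upstream version number falls inside the stated range."""
    return AFFECTED_FIRST <= parse_kernel_version(release) <= AFFECTED_LAST


# "5.3.0-42-generic" is a hypothetical distro-style release string.
for release in ("4.19.110", "5.3.0-42-generic", "5.4.13", "5.4.14"):
    print(release, in_affected_range(release))
```

Tuple comparison keeps the range test readable and avoids string comparison pitfalls such as "5.10" sorting before "5.4".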
Ubuntu could backport the fix from 5.4.14 (the patch applies to 5.1 and
later). I don't know if Ubuntu has done this. When in doubt, assume the
fix is not present.

> I don't know if I've seen any release
> of 5.4.14 in a PPA yet -- manual kernel install is such a pain the few
> times I have done it. I could revert, but the reason I switched to 5.3,
> not long ago, was another problem with sound drivers.

Yeah, non-overlapping bug lifetimes suck.

> BTW, even though it now works, it still takes 90 seconds every boot doing
> a disk check, even after what I think is a clean shutdown. I presume
> that is not normal, any clues on what may cause that?

It shouldn't be doing a disk check at all, even on an unclean shutdown.
btrfs writes data in order, so no checking is required unless your disks
break (and with raid1 you need two broken disks at the same time).
fsck.btrfs is a no-op stub.

90 seconds sounds about right for the block group scan when mounting on
a 10TB filesystem. There's a feature called block group tree in kernel
5.5 that helps with that: it lays out block group items on disk closer
together so they can be read in milliseconds. This is an on-disk format
change, so once you enable that feature, you wouldn't be able to mount
the filesystem on an older kernel. This can be a problem if your
sound drivers have regressions. You might want to wait a few kernel
releases to be sure you don't need to downgrade.

* Re: btrfs-transacti hangs system for several seconds every few minutes
From: Chris Murphy @ 2020-03-30 22:14 UTC
To: Zygo Blaxell; +Cc: Brad Templeton, Btrfs BTRFS

On Sun, Mar 29, 2020 at 12:42 AM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> 90 seconds sounds about right for the block group scan when mounting on
> a 10TB filesystem. There's a feature called block group tree in kernel
> 5.5 that helps with that: it lays out block group items on disk closer
> together so they can be read in milliseconds. This is an on-disk format
> change, so once you enable that feature, you wouldn't be able to mount
> the filesystem on an older kernel. This can be a problem if your
> sound drivers have regressions. You might want to wait a few kernel
> releases to be sure you don't need to downgrade.

I'm not seeing anything about block group tree in btrfs/super.c.

There is block_group_cache_tree but I'm not seeing anything about it
in 'man 5 btrfs' using btrfs-progs 5.4, or in the devel branch.

So I'm not sure what mount option or btrfstune option this would be,
seems to be automatic?
https://github.com/kdave/btrfs-progs/commit/2eaf862f46b3ccb6b7248a0417ebf7096bc93b80

--
Chris Murphy

* Re: btrfs-transacti hangs system for several seconds every few minutes
From: Zygo Blaxell @ 2020-03-31 4:04 UTC
To: Chris Murphy; +Cc: Brad Templeton, Btrfs BTRFS

On Mon, Mar 30, 2020 at 04:14:46PM -0600, Chris Murphy wrote:
> On Sun, Mar 29, 2020 at 12:42 AM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > 90 seconds sounds about right for the block group scan when mounting on
> > a 10TB filesystem. There's a feature called block group tree in kernel
> > 5.5 that helps with that: it lays out block group items on disk closer
> > together so they can be read in milliseconds. This is an on-disk format
> > change, so once you enable that feature, you wouldn't be able to mount
> > the filesystem on an older kernel. This can be a problem if your
> > sound drivers have regressions. You might want to wait a few kernel
> > releases to be sure you don't need to downgrade.
>
> I'm not seeing anything about block group tree in btrfs/super.c.
>
> There is block_group_cache_tree but I'm not seeing anything about it
> in 'man 5 btrfs' using btrfs-progs 5.4, or in the devel branch.
>
> So I'm not sure what mount option or btrfstune option this would be,
> seems to be automatic?
> https://github.com/kdave/btrfs-progs/commit/2eaf862f46b3ccb6b7248a0417ebf7096bc93b80

Sorry, my mistake...it was in one of the misc-next branches, but seems
to have been dropped. Maybe not finished yet?

* Re: btrfs-transacti hangs system for several seconds every few minutes
From: Qu Wenruo @ 2020-03-29 0:58 UTC
To: Brad Templeton, Btrfs BTRFS

On 2020/3/29 2:26 AM, Brad Templeton wrote:
> I have a decent sized 3 disk Raid 1 that I have had on btrfs for many
> years. Over time, a serious problem has emerged, in that from time to
> time all I/O will pause, freezing any programs attempting to use the
> btrfs filesystem. Performance has degraded over the years as well, so
> that just browsing around in directories with 300 or so files often
> takes many seconds just to autocomplete a filename or do an ls.
>
> But the big problem is that during periods of active but not heavy use,
> every few minutes the i/o system will hang for periods of 1 to 10
> seconds. During these hangs, btrfs-transacti is doing very heavy I/O.
> Programs waiting on I/O block -- the most frustrating is typing in vi
> and having the echo stop. It's getting close to unusable and may be
> time to leave btrfs after many years for a different FS.

Are you using qgroups and routinely doing balance or snapshot drop?

Qgroups are known to have a large performance impact, especially for
snapshot drop and balance.

The balance part has improved in recent releases, and on v5.3 it
shouldn't cause too much overhead unless there is a lot of background
IO during balance.

Anyway, if you're using qgroups and they're not critical to your use
case, disabling them would help a lot.

Thanks,
Qu

* Re: btrfs-transacti hangs system for several seconds every few minutes
From: Brad Templeton @ 2020-03-29 4:03 UTC
To: Btrfs BTRFS

Not using qgroups. Not doing snapshots. Did a reboot with the
options to upgrade to v2 -- it failed, in that the disk check took more
than 6 minutes, but it worked, and the second time I was able to boot,
and -- knock on wood -- so far it has not hung.

I wonder why they put 5.3.0 as the standard advanced Kernel in Ubuntu
LTS if it has a data corruption bug. I don't know if I've seen any
release of 5.4.14 in a PPA yet -- manual kernel install is such a pain
the few times I have done it. I could revert, but the reason I switched
to 5.3, not long ago, was another problem with sound drivers.

BTW, even though it now works, it still takes 90 seconds every boot
doing a disk check, even after what I think is a clean shutdown. I
presume that is not normal, any clues on what may cause that?

* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-29 4:03 Brad Templeton
@ 2020-03-29 13:14 ` Qu Wenruo
2020-03-29 17:58 ` Brad Templeton
0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2020-03-29 13:14 UTC (permalink / raw)
To: Brad Templeton, Btrfs BTRFS

On 2020/3/29 12:03 PM, Brad Templeton wrote:
> Not using qgroups. Not doing snapshots. Did a reboot with the
> options to upgrade to v2 -- it failed,

What did you mean by "it failed"?

Did it fail to mount, or did something else show up?

If it failed to mount, would you share the dmesg of that mount
failure?

> in that the disk check took more
> than 6 minutes,

Please be aware that btrfs check, unlike e2fsck, will always check all
metadata of the fs, no matter whether the fs was cleanly unmounted or not.

In fact, btrfs, unlike other journal-based filesystems, has no clear way to
determine whether an fs was unmounted cleanly.
(The log tree is one method, but not a reliable one.)

6 min looks completely valid to me.

> but it worked, and the second time I was able to boot,
> and -- knock on wood -- so far it has not hung.

If you hit the hang, you could use the 'perf' command to probe
the runtime of btrfs_commit_transaction() and its major components.

It would be super helpful if we could determine which is the major cause.

>
> I wonder why they put 5.3.0 as the standard advanced Kernel in Ubuntu
> LTS if it has a data corruption bug. I don't know if I've seen any
> release of 5.4.14 in a PPA yet -- manual kernel install is such a pain
> the few times I have done it. I could revert, but the reason I switched
> to 5.3, not long ago, was another problem with sound drivers.
>
> BTW, even though it now works, it still takes 90 seconds every boot
> doing a disk check, even after what I think is a clean shutdown. I
> presume that is not normal, any clues on what may cause that?
>

Another thing I found is that, in your initial report, your swap is heavily used.

I guess the hangs may be related to memory pressure, where every metadata
write needs to do a lot of metadata reads before it can do anything.

If that's the case, it would be good to keep an eye on the memory
pressure to make sure the fs can still have enough metadata cache without
triggering too much IO in its critical section.

Thanks,
Qu
^ permalink raw reply [flat|nested] 15+ messages in thread
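A rough way to collect the timing Qu asks for is sketched below. It uses the kernel's function_graph tracer (ftrace) instead of perf; this is a swapped-in technique, picked only because it needs no matching linux-tools package. Assumptions: tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing), the kernel was built with the function graph tracer, and it is run as root. The helper name `trace_commit` and its two arguments are made up for illustration.

```shell
# trace_commit [TRACEFS_DIR] [SECONDS]
# Hypothetical helper: records how long btrfs_commit_transaction() and its
# direct callees take, then prints the trace to stdout.
trace_commit() {
    tracefs=${1:-/sys/kernel/tracing}
    seconds=${2:-300}

    # Only graph calls made from inside btrfs_commit_transaction().
    echo btrfs_commit_transaction > "$tracefs/set_graph_function"
    # Depth 2 = the commit function itself plus the functions it calls directly.
    echo 2 > "$tracefs/max_graph_depth"
    echo function_graph > "$tracefs/current_tracer"
    echo 1 > "$tracefs/tracing_on"

    # Use the system normally during this window so that a hang is captured.
    sleep "$seconds"

    echo 0 > "$tracefs/tracing_on"
    cat "$tracefs/trace"

    # Restore the defaults (this also clears the trace buffer).
    echo nop > "$tracefs/current_tracer"
    echo 0 > "$tracefs/max_graph_depth"
    echo > "$tracefs/set_graph_function"
}
```

Something like `trace_commit /sys/kernel/tracing 600 > /tmp/commit-trace.txt` would cover ten minutes; the duration column in the output should show which callee dominates a slow commit.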
* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-29 13:14 ` Qu Wenruo
@ 2020-03-29 17:58 ` Brad Templeton
2020-03-29 18:09 ` Zygo Blaxell
0 siblings, 1 reply; 15+ messages in thread
From: Brad Templeton @ 2020-03-29 17:58 UTC (permalink / raw)
To: Qu Wenruo, Btrfs BTRFS

It started doing what it claims is a "disk check", where it says it is
doing 4 tasks and gives it 90 seconds to do it. (It does this every
time I boot.) However, this time it timed out, and went to 3 minutes,
then 4 minutes 30 seconds and then the boot aborted on the timeout.

However, the filesystems mounted. I was unable to do the test
"btrfs ins dump-tree -t 10 /dev/vgwaya/root | grep 'owner FREE_SPACE_TREE' | wc -l"
because I have no device named anything like vgwaya.

My guess was this was the cache clearance taking a very long time, and I
should not have done it at reboot? My perhaps wrong instinct was that
this would be cleanest but perhaps not.

Normally it takes exactly 90 seconds every time. It is annoying. Most
systems have gotten their boot times down very low these days. While I
often go many months without booting, whenever I am trying to debug
things I end up rebooting frequently, and with everything but the RAID on
flash (as I expect most people do these days) every other aspect of the
boot is quick, except the btrfs filesystem mount. It reminds me of the
days when you had to do fsck too much.

Yes, I need more ram. Tools have bloated so much that 16gb is no
longer nearly enough for an ordinary desktop with lots of web browser
tabs. Sadly, to get more ram I will need to get a new
motherboard/cpu/etc., which is frankly a lot of work.

The v2 cache has done something. In the past, there were not only the
hangs, but just a lot more activity on the disk (you can hear it.) The
computer is a lot quieter now. Rebooting, as we know, can fix many
things, and right now I'm not needing any swap. As I use the system more
memory sucks will arrive and I'll start using the swap and I will track it.
^ permalink raw reply [flat|nested] 15+ messages in thread
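For tracking swap use over time, as discussed above, a minimal sketch is to read SwapTotal and SwapFree from /proc/meminfo. The function name `swap_used_kib` is invented here, and the optional file argument exists only so the parsing can be tried on a saved copy of meminfo.

```shell
# swap_used_kib [MEMINFO_FILE]
# Hypothetical helper: prints swap in use, in KiB (SwapTotal - SwapFree).
swap_used_kib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Example: log swap use once a minute next to a timestamp.
# while sleep 60; do
#     printf '%s %s\n' "$(date +%T)" "$(swap_used_kib)"
# done
```

Comparing such a log against the times of the hangs would show whether they line up with heavy swap use. Watching the `si`/`so` columns of `vmstat 5` gives a similar picture of swap traffic.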
* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-29 17:58 ` Brad Templeton
@ 2020-03-29 18:09 ` Zygo Blaxell
0 siblings, 0 replies; 15+ messages in thread
From: Zygo Blaxell @ 2020-03-29 18:09 UTC (permalink / raw)
To: Brad Templeton; +Cc: Qu Wenruo, Btrfs BTRFS

On Sun, Mar 29, 2020 at 10:58:10AM -0700, Brad Templeton wrote:
> It started doing what it claims is a "disk check", where it says it is
> doing 4 tasks and gives it 90 seconds to do it. (It does this every
> time I boot.) However, this time it timed out, and went to 3 minutes,
> then 4 minutes 30 seconds and then the boot aborted on the timeout.
>
> However, the filesystems mounted. I was unable to do the test " btrfs
> ins dump-tree -t 10 /dev/vgwaya/root |
> grep 'owner FREE_SPACE_TREE' | wc -l"
> Because I have no device named anything like vgwaya.

Sorry, that should be /dev/sda. Forgot to edit the device name!
^ permalink raw reply [flat|nested] 15+ messages in thread
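With the device name corrected, the test from this exchange can be sketched as below. `ins` is an abbreviation of `inspect-internal`, and tree 10 is the free space tree. The wrapper name `count_free_space_tree_nodes` is made up for illustration, and reading a nonzero count as "the v2 free space tree is present" is an assumption, since the message that originally explained the test is not part of this page.

```shell
# Hypothetical helper: counts dump-tree header lines that belong to the
# free space tree. Reads dump-tree output on stdin.
count_free_space_tree_nodes() {
    # grep -c gives the same number as "grep ... | wc -l".
    grep -c 'owner FREE_SPACE_TREE'
}

# Usage on the array from this thread (needs root):
# btrfs inspect-internal dump-tree -t 10 /dev/sda | count_free_space_tree_nodes
```

Note that `grep -c` exits with a nonzero status when the count is 0, which matters only if this is used in a script running under `set -e`.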
* Re: btrfs-transacti hangs system for several seconds every few minutes
@ 2020-03-30 2:29 Tomasz Chmielewski
2020-03-30 5:56 ` Andrei Borzenkov
0 siblings, 1 reply; 15+ messages in thread
From: Tomasz Chmielewski @ 2020-03-30 2:29 UTC (permalink / raw)
To: Btrfs BTRFS; +Cc: 4brad
> I wonder why they put 5.3.0 as the standard advanced Kernel in Ubuntu
> LTS if it has a data corruption bug. I don't know if I've seen any
> release of 5.4.14 in a PPA yet -- manual kernel install is such a pain
> the few times I have done it.
You have all kernels compiled as packages here (for Ubuntu):
https://kernel.ubuntu.com/~kernel-ppa/mainline/
So just download two deb packages, dpkg -i, and done.
btrfs can still be not quite as stable as one would wish, but the
following works well for me on quite a few servers:
- use a recent kernel - late 5.5.x, now perhaps 5.6 - will typically
work better for btrfs than a default distribution kernel
- use "noatime" mount option
- use "space_cache=v2" mount option
- absolutely do not use qgroups (make sure this command returns an error
saying that quotas are not enabled): btrfs qgroup show /mount/point
- if using RAID-5, make sure to use RAID-1 for metadata (and raid1c3
metadata for RAID-6 data)
- if you use any software automation, make sure that it doesn't
accidentally re-enable quotas (in btrfs, there is no mount flag for
quotas, unlike in other filesystems, so it's not obvious whether
quotas are enabled or not)
Tomasz Chmielewski
https://lxadm.com
^ permalink raw reply [flat|nested] 15+ messages in thread
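The mount-option items in the list above can be checked with a small script. This is only a sketch: `check_btrfs_opts` is a made-up name, and it takes the option string as an argument so it can be fed either live output from `findmnt` or a line copied from a report.

```shell
# check_btrfs_opts OPTIONS
# Hypothetical helper: reports which of the recommended mount options
# (noatime, space_cache=v2) are absent from a comma-separated option string.
check_btrfs_opts() {
    opts=$1
    missing=""
    for want in noatime space_cache=v2; do
        case ",$opts," in
            *",$want,"*) ;;                      # option present
            *) missing="$missing $want" ;;       # option absent
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Usage against a live mount:
# check_btrfs_opts "$(findmnt -no OPTIONS /home)"
#
# Quota check from the list above; an error here is the desired result:
# btrfs qgroup show /home
```

Fed the mount line from the original report (`rw,relatime,space_cache,subvolid=256,subvol=/home`), it flags both options as missing.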
* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-30 2:29 Tomasz Chmielewski
@ 2020-03-30 5:56 ` Andrei Borzenkov
2020-03-30 8:11 ` Brad Templeton
0 siblings, 1 reply; 15+ messages in thread
From: Andrei Borzenkov @ 2020-03-30 5:56 UTC (permalink / raw)
To: Tomasz Chmielewski, Btrfs BTRFS; +Cc: 4brad

30.03.2020 05:29, Tomasz Chmielewski wrote:
>> I wonder why they put 5.3.0 as the standard advanced Kernel in Ubuntu
>> LTS if it has a data corruption bug. I don't know if I've seen any
>> release of 5.4.14 in a PPA yet -- manual kernel install is such a pain
>> the few times I have done it.
>
> You have all kernels compiled as packages here (for Ubuntu):
>
> https://kernel.ubuntu.com/~kernel-ppa/mainline/
>
> So just download two deb packages, dpkg -i, and done.
>

Beware that it is not exactly the same as the distribution kernel (both in
terms of included patches and enabled configuration options). Also, a
matching linux-tools package is not provided, which means perf, cpupower,
turbostat and some other tools stop working.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-30 5:56 ` Andrei Borzenkov
@ 2020-03-30 8:11 ` Brad Templeton
2020-03-30 8:35 ` Tomasz Chmielewski
2020-03-31 4:20 ` Zygo Blaxell
0 siblings, 2 replies; 15+ messages in thread
From: Brad Templeton @ 2020-03-30 8:11 UTC (permalink / raw)
To: Andrei Borzenkov, Tomasz Chmielewski, Btrfs BTRFS

Also, isn't it 4 debs -- image, modules, headers and architecture
independent headers?

Still, I am surprised that the Ubuntu team, with a data corruption
issue, would not make it a priority to install a fixed kernel or at least
backport btrfs modules into the current kernel.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-30 8:11 ` Brad Templeton
@ 2020-03-30 8:35 ` Tomasz Chmielewski
2020-03-31 4:20 ` Zygo Blaxell
1 sibling, 0 replies; 15+ messages in thread
From: Tomasz Chmielewski @ 2020-03-30 8:35 UTC (permalink / raw)
To: Brad Templeton; +Cc: Andrei Borzenkov, Btrfs BTRFS

On 2020-03-30 17:11, Brad Templeton wrote:
> Also, isn't it 4 debs -- image, modules, headers and architecture
> independent headers?

You don't have to install header debs (unless compiling the modules
yourself etc.).

Tomasz Chmielewski
https://lxadm.com
^ permalink raw reply [flat|nested] 15+ messages in thread
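The "two debs" install described above might look like the sketch below. It assumes the packages for one kernel version have already been downloaded into a single directory and that their file names start with `linux-image-` and `linux-modules-`; both the directory and the helper name `select_kernel_debs` are hypothetical.

```shell
# select_kernel_debs [DIR]
# Hypothetical helper: lists only the image and modules packages in DIR,
# skipping the header packages, which are optional unless you build modules.
select_kernel_debs() {
    dir=${1:-.}
    for f in "$dir"/linux-image-*.deb "$dir"/linux-modules-*.deb; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage (needs root; the unquoted expansion assumes no spaces in the path):
# dpkg -i $(select_kernel_debs "$HOME/mainline")
```

As noted earlier in the thread, a kernel installed this way comes without a matching linux-tools package, so perf and similar tools will not work with it.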
* Re: btrfs-transacti hangs system for several seconds every few minutes
2020-03-30 8:11 ` Brad Templeton
2020-03-30 8:35 ` Tomasz Chmielewski
@ 2020-03-31 4:20 ` Zygo Blaxell
1 sibling, 0 replies; 15+ messages in thread
From: Zygo Blaxell @ 2020-03-31 4:20 UTC (permalink / raw)
To: Brad Templeton; +Cc: Andrei Borzenkov, Tomasz Chmielewski, Btrfs BTRFS

On Mon, Mar 30, 2020 at 01:11:41AM -0700, Brad Templeton wrote:
> Also, isn't it 4 debs -- image, modules, headers and architecture
> independent headers?
>
> Still, I am surprised that the ubuntu team, with a data corruption
> issue, would not make a priority to install a fixed kernel or at least
> backport btrfs modules into the current kernel.

It probably hasn't been reported to them. Distro vendors are kind of on
their own for long-term kernel bug fixes, especially for non-LTS kernels
like 5.3. kernel.org support ended last year, just before the bug was
identified.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2020-03-31 4:20 UTC | newest]
Thread overview: 15+ messages
2020-03-28 18:26 btrfs-transacti hangs system for several seconds every few minutes Brad Templeton
2020-03-28 21:20 ` Zygo Blaxell
[not found] ` <7778ece0-67d4-8d1c-b773-35f07d81dcbe@templetons.com>
2020-03-29 6:42 ` Zygo Blaxell
2020-03-30 22:14 ` Chris Murphy
2020-03-31 4:04 ` Zygo Blaxell
2020-03-29 0:58 ` Qu Wenruo
2020-03-29 4:03 Brad Templeton
2020-03-29 13:14 ` Qu Wenruo
2020-03-29 17:58 ` Brad Templeton
2020-03-29 18:09 ` Zygo Blaxell
2020-03-30 2:29 Tomasz Chmielewski
2020-03-30 5:56 ` Andrei Borzenkov
2020-03-30 8:11 ` Brad Templeton
2020-03-30 8:35 ` Tomasz Chmielewski
2020-03-31 4:20 ` Zygo Blaxell