Hello, I think I've encountered a deadlock between btrfs-transacti and postgres process(es). Here is the system information (the btrfs fi usage output was obtained after power-off and boot):

# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

# uname -a
Linux prod-dbsnap-01 5.2.1-1.el7.elrepo.x86_64 #1 SMP Sun Jul 14 08:15:04 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v5.2

# btrfs filesystem usage /data/pg_data
Overall:
    Device size:                   2.00TiB
    Device allocated:            345.03GiB
    Device unallocated:            1.66TiB
    Device missing:                  0.00B
    Used:                        338.07GiB
    Free (estimated):              1.67TiB      (min: 854.27GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID0: Size:332.00GiB, Used:329.22GiB
   /dev/sdb       83.00GiB
   /dev/sdc       83.00GiB
   /dev/sdd       83.00GiB
   /dev/sde       83.00GiB

Metadata,RAID10: Size:6.50GiB, Used:4.42GiB
   /dev/sdb        3.25GiB
   /dev/sdc        3.25GiB
   /dev/sdd        3.25GiB
   /dev/sde        3.25GiB

System,RAID10: Size:16.00MiB, Used:48.00KiB
   /dev/sdb        8.00MiB
   /dev/sdc        8.00MiB
   /dev/sdd        8.00MiB
   /dev/sde        8.00MiB

Unallocated:
   /dev/sdb      425.74GiB
   /dev/sdc      425.74GiB
   /dev/sdd      425.74GiB
   /dev/sde      425.74GiB

There were three btrfs subvolumes, and on each one a Postgres database slave was doing recovery (single-threaded writing). But there was a lot of writing. Prior to starting the Postgres slaves I had been restoring a base backup from the backup server, which was done by a number of parallel rsync processes (12 at most, I think).

The filesystem is mounted with:

# grep btrfs /proc/mounts
/dev/sdd /data/pg_data btrfs rw,noatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0

After several hours I got this in /var/log/messages:

# grep 'INFO: task .*blocked for more than' messages
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds.
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds.
Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds.

The full log is in the attachment.

It seems to me that there is a deadlock between btrfs-transacti and any process that is trying to write something (I'm not sure about reading). While this was going on, CPU usage (according to top) was non-existent. There are 4 CPUs (it's a virtual machine on VMware); 3 were 100% idle and the 4th was 100% in I/O wait. (Unfortunately, I didn't find out which process was on that CPU.)
I powered off the machine (yesterday), booted this morning, and things are working without errors. I have stopped the Postgres clusters, though. I have a few questions:

1. After something like this happens and the machine is rebooted, is there a procedure that would lower the probability of hitting the deadlock again (maybe btrfs scrub or btrfs defragment or something like that)? This happened after heavy write activity, so maybe fragmentation had something to do with it.

2. Should I run btrfsck (or something else) to verify on-disk integrity after a problem like this? Or is it purely an in-memory problem, so that I can assume nothing bad happened to the data on disk?

3. I'd like to write a watchdog program to catch deadlocks and reboot (probably power-cycle) the VM, but I'm not sure what the appropriate check should be (see the rough sketch at the end of this mail). Does it have to write something to disk, or would reading be sufficient? And how do I bypass the OS buffer cache (fsync() or O_DIRECT should do it for writing, but I'm not sure about reading)? What would an appropriate timeout be? (If the operation doesn't complete within xx seconds a reboot should be triggered, but I don't know how many seconds to wait so that heavy load, where things are merely slow but not deadlocked, doesn't trigger it.) Should I put the watchdog process in the real-time scheduling class (or whatever the Linux equivalent is called)? Since this is a mainline kernel, I'm not sure I can assume that the real-time support won't have bugs of its own.

And last, but not least, is there additional data that could help with debugging issues like this? (If possible, something that could be programmed into the watchdog service.)
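To make question 3 more concrete, this is roughly what I have in mind. It is only a sketch: the probe path, the 120-second timeout, the 30-second interval, and the use of SysRq to dump blocked-task stacks and force a reboot are placeholders I'm not sure about, and they are exactly the parts I'm asking for advice on.

#!/usr/bin/env python3
# Watchdog sketch: periodically write a small file on the btrfs mount and
# fsync() it so the data actually has to reach the disk. If the probe does
# not finish within TIMEOUT seconds, dump blocked-task stacks to dmesg
# (SysRq "w") and force an immediate reboot (SysRq "b"). Must run as root
# and kernel.sysrq must permit these functions.

import os
import threading
import time

CHECK_FILE = "/data/pg_data/.watchdog_probe"   # placeholder probe path
TIMEOUT = 120                                   # placeholder timeout, seconds
INTERVAL = 30                                   # placeholder interval, seconds

def sysrq(key):
    with open("/proc/sysrq-trigger", "w") as f:
        f.write(key)

def probe():
    fd = os.open(CHECK_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, str(time.time()).encode())
        os.fsync(fd)          # force the write (and a btrfs transaction) to disk
    finally:
        os.close(fd)

def main():
    while True:
        t = threading.Thread(target=probe, daemon=True)
        t.start()
        t.join(TIMEOUT)
        if t.is_alive():
            # Probe is stuck (presumably in D state): collect evidence, reboot.
            sysrq("w")        # dump stacks of uninterruptible tasks to dmesg
            time.sleep(5)
            sysrq("b")        # immediate reboot, no clean shutdown
        time.sleep(INTERVAL)

if __name__ == "__main__":
    main()

The SysRq "w" dump before rebooting is also the kind of "additional data" I had in mind for the last question, but I don't know whether anything else would be useful to collect.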