* very slow "btrfs dev delete" 3x6Tb, 7Tb of data @ 2019-12-25 22:35 Leszek Dubiel 2019-12-26 5:08 ` Qu Wenruo 2019-12-26 22:15 ` Chris Murphy 0 siblings, 2 replies; 32+ messages in thread From: Leszek Dubiel @ 2019-12-25 22:35 UTC (permalink / raw) To: linux-btrfs Hello! I have a server: 3 disks, 6TB each, total 17TB space, occupied with data 6TB. One of the disks got smart errors: 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 16 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 2 And didn't pass tests: Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed: read failure 90% 3575 - # 2 Short offline Completed without error 00% 3574 - # 3 Extended offline Completed: read failure 90% 3574 - # 4 Extended offline Completed: read failure 90% 3560 - # 5 Extended offline Completed: read failure 50% 3559 - I decided to remove that drive from BTRFS system: btrfs dev delete /dev/sda2 / At the beginning iostat showed that system is reading from /dev/sda 100Mb/sec, then writing to /dev/sdb + /dev/sdc 100Mb/sek. It looked correct, and during a few hours it moved aobut 1.5 TB of data out of the failing disk. But then it stuck... Iostat no longer showed reading or writing, "btrfs dev usage" didn't change. Couldn't stop "btrfs dev delete" command -- nor Ctrl+C nor "kill -9". I have rebooted system, started "btrfs dev delete" again. But this is extremely slow. Disks can read/write 100-150Mb/sek. But "process btrfs dev delete" moved 53Gb during 6 hours, that is only about 2Mb/sek. Why this is so slow? Below are some logs. Some other people also have had this problem... What is my problem? I am testing btrfs for many years, I have had some issues few years ago, recently I considered btrfs to be stable... but I cannot put btrfs into production when I have such problems when disk fails... to move 3Tb with 2mb/sec speed it would take many, many days. Server does almost nothing... it just deletes one drive. 
The CPU is not busy, and iostat looks lazy...

#################################################

root@wawel:~# uname -a
Linux wawel 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux

root@wawel:~# btrfs --version
btrfs-progs v4.20.1

root@wawel:~# dpkg -l | egrep btrfs
ii  btrfs-progs  4.20.1-2  amd64  Checksumming Copy on Write Filesystem utilities

#################################################

root@wawel:~# btrfs sub list / | wc -l
21

#################################################

root@wawel:~# btrfs dev usage /
/dev/sda2, ID: 1
   Device size:      5.45TiB
   Device slack:     5.45TiB
   Data,single:      1.08TiB
   Metadata,RAID1:  11.00GiB
   System,RAID1:    32.00MiB
   Unallocated:     -1.09TiB

/dev/sdb2, ID: 2
   Device size:      5.45TiB
   Device slack:       0.00B
   Data,single:      3.22TiB
   Metadata,RAID1:  87.00GiB
   Unallocated:      2.15TiB

/dev/sdc2, ID: 3
   Device size:      5.45TiB
   Device slack:       0.00B
   Data,single:      3.21TiB
   Metadata,RAID1:  96.00GiB
   System,RAID1:    32.00MiB
   Unallocated:      2.15TiB

#################################################

root@wawel:~# iostat -m -d 30
Linux 4.19.0-6-amd64 (wawel)  25.12.2019  _x86_64_  (4 CPU)

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        29,52    2,77       0,90       158467   51287
sdb        532,93   0,77       16,80      44046    962107
sdc        532,03   0,59       16,79      34017    961406

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        0,70     0,00       0,01       0        0
sdb        624,20   0,05       14,17      1        424
sdc        628,47   0,06       14,79      1        443

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        17,90    0,23       0,59       7        17
sdb        760,63   0,15       15,39      4        461
sdc        761,13   0,09       16,75      2        502

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        65,80    1,00       0,26       30       7
sdb        1085,93  0,18       26,14      5        784
sdc        1081,03  0,09       27,03      2        810

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        101,53   1,57       0,10       47       3
sdb        609,63   0,34       19,29      10       578
sdc        612,67   0,33       19,73      9        591

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        114,30   1,76       0,28       52       8
sdb        355,90   0,39       19,60      11       587
sdc        368,50   0,35       20,17      10       605

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        63,70    0,55       0,54       16       16
sdb        438,40   0,13       7,83       3        234
sdc        463,97   0,09       8,93       2        267

root@wawel:~# iostat -m -d 30
Linux 4.19.0-6-amd64 (wawel)  25.12.2019  _x86_64_  (4 CPU)

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        30,18    2,51       1,01       226710   91462
sdb        531,30   0,69       15,36      62022    1384919
sdc        535,25   0,59       15,53      53092    1400122

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        2,80     0,01       0,49       0        14
sdb        154,77   0,53       3,05       15       91
sdc        161,40   0,64       3,02       19       90

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        124,87   0,01       3,02       0        90
sdb        799,33   0,13       19,40      3        581
sdc        930,63   0,23       22,21      6        666

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        5,97     0,00       0,69       0        20
sdb        730,10   0,05       14,62      1        438
sdc        739,23   0,13       14,86      3        445

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        6,53     0,01       1,39       0        41
sdb        800,63   0,13       15,63      4        468
sdc        801,83   0,02       16,12      0        483

Device     tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda        4,83     0,00       1,13       0        33
sdb        756,30   0,14       15,08      4        452
sdc        759,03   0,15       15,33      4        459

#################################################

root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage

root@wawel:~# cat btrfsdevusage | egrep sda -A5 | egrep single
2019-12-25 17:03:04     Data,single: 1188632199168
2019-12-25 18:31:27     Data,single: 1184337231872
2019-12-25 19:00:56     Data,single: 1177894780928
2019-12-25 20:25:46     Data,single: 1158567428096
2019-12-25 21:06:00     Data,single: 1147830009856
2019-12-25 23:01:34     Data,single: 1132797624320
2019-12-25 23:16:27     Data,single: 1130650140672

root@wawel:~# echo "speed is $(( ( 1188632199168 - 1130650140672 ) / 1024 / 1024 / ((23 - 17) * 3600) )) MB/sec"
speed is 2 MB/sec

#################################################

root@wawel:~# iotop -d12

Total DISK READ:   0.00 B/s | Total DISK WRITE:     0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 183.48 M/s
  TID  PRIO  USER   DISK READ   DISK WRITE  SWAPIN     IO>    COMMAND
 8138  be/4  root     0.00 B/s    0.00 B/s  0.00 %  99.99 %  btrfs dev delete /dev/sda2 /
    1  be/4  root     0.00 B/s    0.00 B/s  0.00 %   0.00 %  init
    2  be/4  root     0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kthreadd]
    3  be/0  root     0.00 B/s    0.00 B/s  0.00 %   0.00 %  [rcu_gp]
    4  be/0  root     0.00 B/s    0.00 B/s  0.00 %   0.00 %  [rcu_par_gp]
    6  be/0  root     0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kworker/0:0H-kblockd]
    8  be/0  root     0.00 B/s    0.00 B/s  0.00 %   0.00 %  [mm_percpu_wq]

Total DISK READ:   11.23 M/s | Total DISK WRITE:   28.77 M/s
Current DISK READ: 11.23 M/s | Current DISK WRITE: 56.51 M/s
  TID  PRIO  USER   DISK READ   DISK WRITE   SWAPIN     IO>    COMMAND
10349  be/4  root    29.26 K/s   168.89 K/s  0.00 %  75.29 %  [kworker/u8:3-btrfs-endio-meta]
 8138  be/4  root    11.03 M/s    23.61 M/s  0.00 %  74.65 %  btrfs dev delete /dev/sda2 /
10351  be/4  root    45.21 K/s  1365.76 K/s  0.00 %  11.50 %  [kworker/u8:8-btrfs-endio-write]
10350  be/4  root    63.83 K/s  1542.63 K/s  0.00 %  10.22 %  [kworker/u8:7-btrfs-freespace-write]
 7879  be/4  root    42.56 K/s  1186.23 K/s  0.00 %   7.07 %  [kworker/u8:1-btrfs-freespace-write]
23032  be/4  root    25.27 K/s   972.12 K/s  0.00 %   5.04 %  [kworker/u8:12-btrfs-extent-refs]
 8035  be/4  root     0.00 B/s    25.27 K/s  0.00 %   0.00 %  [kworker/u8:4-btrfs-endio-meta]
 8036  be/4  root     0.00 B/s    11.97 K/s  0.00 %   0.00 %  [kworker/u8:5-btrfs-extent-refs]
31253  be/4  root     0.00 B/s     5.32 K/s  0.00 %   0.00 %  [kworker/u8:9-btrfs-extent-refs]

Total DISK READ:   674.25 K/s | Total DISK WRITE:  13.57 M/s
Current DISK READ: 674.25 K/s | Current DISK WRITE: 39.97 M/s
  TID  PRIO  USER   DISK READ    DISK WRITE   SWAPIN     IO>    COMMAND
 8138  be/4  root   147.62 K/s     7.72 M/s   0.00 %  79.89 %  btrfs dev delete /dev/sda2 /
10274  be/4  root   484.08 K/s     4.91 M/s   0.00 %  56.93 %  [kworker/u8:2+btrfs-extent-refs]
10351  be/4  root    27.93 K/s   492.06 K/s   0.00 %   3.03 %  [kworker/u8:8-btrfs-endio-meta]
10350  be/4  root     3.99 K/s   115.70 K/s   0.00 %   0.59 %  [kworker/u8:7-btrfs-endio-write]
 8037  be/4  root     5.32 K/s    79.79 K/s   0.00 %   0.40 %  [kworker/u8:6-btrfs-endio-write]
23032  be/4  root     3.99 K/s    79.79 K/s   0.00 %   0.20 %  [kworker/u8:12-btrfs-endio-write]
 7879  be/4  root  1361.80 B/s   105.06 K/s   0.00 %   0.10 %  [kworker/u8:1-btrfs-endio-write]
  538  be/4  root     0.00 B/s  1021.35 B/s   0.00 %   0.00 %  rsyslogd -n -iNONE [rs:main Q:Reg]
10349  be/4  root     0.00 B/s    86.44 K/s   0.00 %   0.00 %  [kworker/u8:3-btrfs-endio-meta]
    1  be/4  root     0.00 B/s     0.00 B/s   0.00 %   0.00 %  init

#################################################

root@wawel:~# top -d12
top - 23:23:01 up 1 day, 56 min,  1 user,  load average: 3,05, 3,28, 3,47
Tasks: 137 total,   1 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us,  4,5 sy,  0,0 ni, 92,1 id,  3,3 wa,  0,0 hi,  0,1 si,  0,0 st
MiB Mem :   7841,1 total,    797,0 free,    259,7 used,   6784,4 buff/cache
MiB Swap:   8053,0 total,   7962,2 free,     90,8 used.   7265,2 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 8138 root      20   0    8540    104    104 D  11,6   0,0  23:25.88 btrfs
23032 root      20   0       0      0      0 I   4,8   0,0   0:51.20 kworker/u8:12-btrfs-extent-refs
10350 root      20   0       0      0      0 I   0,9   0,0   0:01.79 kworker/u8:7-btrfs-endio-meta
10349 root      20   0       0      0      0 I   0,8   0,0   0:02.12 kworker/u8:3-btrfs-freespace-write
  242 root       0 -20       0      0      0 I   0,2   0,0   0:59.66 kworker/2:1H-kblockd

#################################################

root@wawel:~# tail /var/log/kern.log -n50
Dec 25 21:50:35 wawel kernel: [84258.634360] BTRFS info (device sda2): found 1752 extents
Dec 25 21:50:41 wawel kernel: [84264.771079] BTRFS info (device sda2): relocating block group 3434964058112 flags data
Dec 25 21:51:09 wawel kernel: [84291.864553] BTRFS info (device sda2): found 584 extents
Dec 25 21:55:51 wawel kernel: [84574.383303] BTRFS info (device sda2): found 584 extents
Dec 25 21:55:56 wawel kernel: [84579.359349] BTRFS info (device sda2): relocating block group 3431742832640 flags data
Dec 25 21:56:17 wawel kernel: [84600.668237] BTRFS info (device sda2): found 635 extents
Dec 25 22:00:36 wawel kernel: [84859.766366] BTRFS info (device sda2): found 635 extents
Dec 25 22:00:47 wawel kernel: [84870.367399] BTRFS info (device sda2): relocating block group 3428521607168 flags data
Dec 25 22:01:14 wawel kernel: [84897.313282] BTRFS info (device sda2): found 1170 extents
Dec 25 22:05:38 wawel kernel: [85161.264397] BTRFS info (device sda2): found 1170 extents
Dec 25 22:05:46 wawel kernel: [85169.216425] BTRFS info (device sda2): relocating block group 3425300381696 flags data
Dec 25 22:06:20 wawel kernel: [85202.963222] BTRFS info (device sda2): found 1336 extents
Dec 25 22:13:00 wawel kernel: [85603.648327] BTRFS info (device sda2): found 1336 extents
Dec 25 22:13:02 wawel kernel: [85604.878785] BTRFS info (device sda2): relocating block group 3422079156224 flags data
Dec 25 22:14:57 wawel kernel: [85720.394945] BTRFS info (device sda2): found 1771 extents
Dec 25 22:23:03 wawel kernel: [86206.582245] BTRFS info (device sda2): found 1771 extents
Dec 25 22:23:04 wawel kernel: [86207.671814] BTRFS info (device sda2): relocating block group 3418857930752 flags data
Dec 25 22:23:19 wawel kernel: [86222.138761] BTRFS info (device sda2): found 199 extents
Dec 25 22:26:26 wawel kernel: [86408.991487] BTRFS info (device sda2): found 199 extents
Dec 25 22:26:29 wawel kernel: [86412.181922] BTRFS info (device sda2): relocating block group 3415636705280 flags data
Dec 25 22:26:48 wawel kernel: [86431.726268] BTRFS info (device sda2): found 50 extents
Dec 25 22:27:32 wawel kernel: [86475.800606] BTRFS info (device sda2): found 50 extents
Dec 25 22:27:39 wawel kernel: [86482.222298] BTRFS info (device sda2): found 20 extents
Dec 25 22:27:44 wawel kernel: [86487.168278] BTRFS info (device sda2): relocating block group 3412415479808 flags data
Dec 25 22:28:27 wawel kernel: [86530.525657] BTRFS info (device sda2): found 6748 extents
Dec 25 22:34:07 wawel kernel: [86870.618140] BTRFS info (device sda2): found 6748 extents
Dec 25 22:34:35 wawel kernel: [86898.498949] BTRFS info (device sda2): relocating block group 3409194254336 flags data
Dec 25 22:36:06 wawel kernel: [86989.589309] BTRFS info (device sda2): found 1175 extents
Dec 25 22:40:12 wawel kernel: [87234.881523] perf: interrupt took too long (3943 > 3930), lowering kernel.perf_event_max_sample_rate to 50500
Dec 25 22:44:44 wawel kernel: [87507.870507] BTRFS info (device sda2): found 1175 extents
Dec 25 22:44:52 wawel kernel: [87515.563744] BTRFS info (device sda2): relocating block group 3405973028864 flags data
Dec 25 22:45:14 wawel kernel: [87537.128352] BTRFS info (device sda2): found 372 extents
Dec 25 22:46:11 wawel kernel: [87594.569709] BTRFS info (device sda2): found 372 extents
Dec 25 22:46:17 wawel kernel: [87600.772681] BTRFS info (device sda2): relocating block group 3402751803392 flags data
Dec 25 22:47:27 wawel kernel: [87670.663043] BTRFS info (device sda2): found 3483 extents
Dec 25 23:04:57 wawel kernel: [88720.068885] BTRFS info (device sda2): found 3479 extents
Dec 25 23:05:03 wawel kernel: [88726.864461] BTRFS info (device sda2): relocating block group 3399530577920 flags data
Dec 25 23:06:36 wawel kernel: [88819.464525] BTRFS info (device sda2): found 1871 extents
Dec 25 23:16:11 wawel kernel: [89393.997406] BTRFS info (device sda2): found 1868 extents
Dec 25 23:16:13 wawel kernel: [89396.825305] BTRFS info (device sda2): relocating block group 3396309352448 flags data
Dec 25 23:16:41 wawel kernel: [89424.310916] BTRFS info (device sda2): found 1946 extents
Dec 25 23:22:12 wawel kernel: [89755.068457] BTRFS info (device sda2): found 1946 extents
Dec 25 23:22:15 wawel kernel: [89758.601122] BTRFS info (device sda2): relocating block group 3393088126976 flags data
Dec 25 23:22:35 wawel kernel: [89778.172631] BTRFS info (device sda2): found 647 extents
Dec 25 23:23:06 wawel kernel: [89809.153016] BTRFS info (device sda2): found 647 extents
Dec 25 23:23:07 wawel kernel: [89810.017006] BTRFS info (device sda2): relocating block group 3389866901504 flags data
Dec 25 23:23:24 wawel kernel: [89827.351203] BTRFS info (device sda2): found 745 extents
Dec 25 23:24:38 wawel kernel: [89901.176493] BTRFS info (device sda2): found 745 extents
Dec 25 23:24:51 wawel kernel: [89914.840850] BTRFS info (device sda2): relocating block group 3386645676032 flags data
Dec 25 23:25:40 wawel kernel: [89963.641062] BTRFS info (device sda2): found 3114 extents

################################################

More on this topic:

https://strugglers.net/~andy/blog/2018/06/15/another-disappointing-btrfs-experience/
https://www.spinics.net/lists/linux-btrfs/msg50771.html
https://bugzilla.redhat.com/show_bug.cgi?id=1294531

^ permalink raw reply	[flat|nested] 32+ messages in thread
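[Editorial note: the 2 MB/sec figure above is computed inline with shell arithmetic. The same computation as a small reusable sketch; the function name `reloc_speed` is mine, and the byte counts are the first and last "Data,single" samples from the log above, with the exact 22403-second interval (17:03:04 to 23:16:27) instead of the rounded (23 - 17) * 3600.]

```shell
#!/bin/sh
# reloc_speed: integer MB/s moved off a device between two timestamped
# "btrfs dev usage -b" samples (start bytes, end bytes, elapsed seconds).
reloc_speed() {
    start_bytes=$1; end_bytes=$2; secs=$3
    # Guard against a zero-length interval before dividing.
    [ "$secs" -gt 0 ] || { echo "interval must be positive" >&2; return 1; }
    echo $(( (start_bytes - end_bytes) / 1024 / 1024 / secs ))
}

# First and last samples above; 17:03:04 -> 23:16:27 is 22403 seconds.
reloc_speed 1188632199168 1130650140672 22403   # prints 2
```

The exact interval gives the same integer result, 2 MB/s, confirming the rounded estimate in the message.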
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel @ 2019-12-26 5:08 ` Qu Wenruo 2019-12-26 13:17 ` Leszek Dubiel 2019-12-26 22:15 ` Chris Murphy 1 sibling, 1 reply; 32+ messages in thread From: Qu Wenruo @ 2019-12-26 5:08 UTC (permalink / raw) To: Leszek Dubiel, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 21626 bytes --] On 2019/12/26 上午6:35, Leszek Dubiel wrote: > > Hello! > > I have a server: 3 disks, 6TB each, total 17TB space, occupied with data > 6TB. > > > One of the disks got smart errors: > > 197 Current_Pending_Sector 0x0022 100 100 000 Old_age > Always - 16 > 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age > Offline - 2 > > And didn't pass tests: > > Num Test_Description Status Remaining > LifeTime(hours) LBA_of_first_error > # 1 Extended offline Completed: read failure 90% > 3575 - > # 2 Short offline Completed without error 00% > 3574 - > # 3 Extended offline Completed: read failure 90% > 3574 - > # 4 Extended offline Completed: read failure 90% > 3560 - > # 5 Extended offline Completed: read failure 50% > 3559 - > > I decided to remove that drive from BTRFS system: > > btrfs dev delete /dev/sda2 / > > > At the beginning iostat showed that system is reading from /dev/sda > 100Mb/sec, then writing to /dev/sdb + /dev/sdc 100Mb/sek. It looked > correct, and during a few hours it moved aobut 1.5 TB of data out of the > failing disk. But then it stuck... Iostat no longer showed reading or > writing, "btrfs dev usage" didn't change. Couldn't stop "btrfs dev > delete" command -- nor Ctrl+C nor "kill -9". I have rebooted system, > started "btrfs dev delete" again. > > But this is extremely slow. Disks can read/write 100-150Mb/sek. But > "process btrfs dev delete" moved 53Gb during 6 hours, that is only about > 2Mb/sek. > > Why this is so slow? Below are some logs. Some other people also have > had this problem... Are you using qgroups? 
You can verify it by "btrfs qgroup show -pcre <mount>" to verify. Qgroup can hugely impact performance for balance, if using kernel older than v5.2. You can either disable qgroup, losing the ability to trace how many bytes are exclusively used by each subvolume, or upgrade to v5.2 kernel to solve it. Thanks, Qu > > What is my problem? I am testing btrfs for many years, I have had some > issues few years ago, recently I considered btrfs to be stable... but I > cannot put btrfs into production when I have such problems when disk > fails... to move 3Tb with 2mb/sec speed it would take many, many days. > Server does almost nothing... it just deletes one drive. Cpu is not > intensive, iostat looks lazy... > > > > > ################################################# > > root@wawel:~# uname -a > Linux wawel 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) > x86_64 GNU/Linux > > root@wawel:~# btrfs --version > btrfs-progs v4.20.1 > > root@wawel:~# dpkg -l | egrep btrfs > ii btrfs-progs 4.20.1-2 amd64 Checksumming > Copy on Write Filesystem utilities > > ################################################# > > root@wawel:~# btrfs sub list / | wc -l > 21 > > ################################################# > > root@wawel:~# btrfs dev usage / > /dev/sda2, ID: 1 > Device size: 5.45TiB > Device slack: 5.45TiB > Data,single: 1.08TiB > Metadata,RAID1: 11.00GiB > System,RAID1: 32.00MiB > Unallocated: -1.09TiB > > /dev/sdb2, ID: 2 > Device size: 5.45TiB > Device slack: 0.00B > Data,single: 3.22TiB > Metadata,RAID1: 87.00GiB > Unallocated: 2.15TiB > > /dev/sdc2, ID: 3 > Device size: 5.45TiB > Device slack: 0.00B > Data,single: 3.21TiB > Metadata,RAID1: 96.00GiB > System,RAID1: 32.00MiB > Unallocated: 2.15TiB > > ################################################# > > root@wawel:~# iostat -m -d 30 > Linux 4.19.0-6-amd64 (wawel) 25.12.2019 _x86_64_ (4 CPU) > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 29,52 2,77 0,90 158467 51287 > sdb 532,93 0,77 16,80 44046 962107 > 
sdc 532,03 0,59 16,79 34017 961406 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 0,70 0,00 0,01 0 0 > sdb 624,20 0,05 14,17 1 424 > sdc 628,47 0,06 14,79 1 443 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 17,90 0,23 0,59 7 17 > sdb 760,63 0,15 15,39 4 461 > sdc 761,13 0,09 16,75 2 502 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 65,80 1,00 0,26 30 7 > sdb 1085,93 0,18 26,14 5 784 > sdc 1081,03 0,09 27,03 2 810 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 101,53 1,57 0,10 47 3 > sdb 609,63 0,34 19,29 10 578 > sdc 612,67 0,33 19,73 9 591 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 114,30 1,76 0,28 52 8 > sdb 355,90 0,39 19,60 11 587 > sdc 368,50 0,35 20,17 10 605 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 63,70 0,55 0,54 16 16 > sdb 438,40 0,13 7,83 3 234 > sdc 463,97 0,09 8,93 2 267 > > root@wawel:~# iostat -m -d 30 > Linux 4.19.0-6-amd64 (wawel) 25.12.2019 _x86_64_ (4 CPU) > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 30,18 2,51 1,01 226710 91462 > sdb 531,30 0,69 15,36 62022 1384919 > sdc 535,25 0,59 15,53 53092 1400122 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 2,80 0,01 0,49 0 14 > sdb 154,77 0,53 3,05 15 91 > sdc 161,40 0,64 3,02 19 90 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 124,87 0,01 3,02 0 90 > sdb 799,33 0,13 19,40 3 581 > sdc 930,63 0,23 22,21 6 666 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 5,97 0,00 0,69 0 20 > sdb 730,10 0,05 14,62 1 438 > sdc 739,23 0,13 14,86 3 445 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 6,53 0,01 1,39 0 41 > sdb 800,63 0,13 15,63 4 468 > sdc 801,83 0,02 16,12 0 483 > > Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn > sda 4,83 0,00 1,13 0 33 > sdb 756,30 0,14 15,08 4 452 > sdc 759,03 0,15 15,33 4 459 > > > > ################################################# > > root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> > btrfsdevusage > root@wawel:~# btrfs dev usage -b / | 
sed -r "s/^/$(date +'%F %T')\t/" >> > btrfsdevusage > root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> > btrfsdevusage > root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> > btrfsdevusage > > root@wawel:~# cat btrfsdevusage | egrep sda -A5 | egrep single > 2019-12-25 17:03:04 Data,single: 1188632199168 > 2019-12-25 18:31:27 Data,single: 1184337231872 > 2019-12-25 19:00:56 Data,single: 1177894780928 > 2019-12-25 20:25:46 Data,single: 1158567428096 > 2019-12-25 21:06:00 Data,single: 1147830009856 > 2019-12-25 23:01:34 Data,single: 1132797624320 > 2019-12-25 23:16:27 Data,single: 1130650140672 > > root@wawel:~# echo "speed is $(( ( 1188632199168 - 1130650140672 ) / > 1024 / 1024 / ((23 - 17) * 3600) )) MB/sec" > speed is 2 MB/sec > > ################################################# > > root@wawel:~# iotop -d12 > > Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s > Current DISK READ: 0.00 B/s | Current DISK WRITE: 183.48 M/s > TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > 8138 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % btrfs dev > delete /dev/sda2 / > 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init > 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] > 3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp] > 4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_par_gp] > 6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % > [kworker/0:0H-kblockd] > 8 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [mm_percpu_wq] > > Total DISK READ: 11.23 M/s | Total DISK WRITE: 28.77 M/s > Current DISK READ: 11.23 M/s | Current DISK WRITE: 56.51 M/s > TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > 10349 be/4 root 29.26 K/s 168.89 K/s 0.00 % 75.29 % > [kworker/u8:3-btrfs-endio-meta] > 8138 be/4 root 11.03 M/s 23.61 M/s 0.00 % 74.65 % btrfs dev > delete /dev/sda2 / > 10351 be/4 root 45.21 K/s 1365.76 K/s 0.00 % 11.50 % > [kworker/u8:8-btrfs-endio-write] > 10350 be/4 root 63.83 K/s 1542.63 K/s 0.00 % 10.22 % > 
[kworker/u8:7-btrfs-freespace-write] > 7879 be/4 root 42.56 K/s 1186.23 K/s 0.00 % 7.07 % > [kworker/u8:1-btrfs-freespace-write] > 23032 be/4 root 25.27 K/s 972.12 K/s 0.00 % 5.04 % > [kworker/u8:12-btrfs-extent-refs] > 8035 be/4 root 0.00 B/s 25.27 K/s 0.00 % 0.00 % > [kworker/u8:4-btrfs-endio-meta] > 8036 be/4 root 0.00 B/s 11.97 K/s 0.00 % 0.00 % > [kworker/u8:5-btrfs-extent-refs] > 31253 be/4 root 0.00 B/s 5.32 K/s 0.00 % 0.00 % > [kworker/u8:9-btrfs-extent-refs] > > Total DISK READ: 674.25 K/s | Total DISK WRITE: 13.57 M/s > Current DISK READ: 674.25 K/s | Current DISK WRITE: 39.97 M/s > TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > 8138 be/4 root 147.62 K/s 7.72 M/s 0.00 % 79.89 % btrfs dev > delete /dev/sda2 / > 10274 be/4 root 484.08 K/s 4.91 M/s 0.00 % 56.93 % > [kworker/u8:2+btrfs-extent-refs] > 10351 be/4 root 27.93 K/s 492.06 K/s 0.00 % 3.03 % > [kworker/u8:8-btrfs-endio-meta] > 10350 be/4 root 3.99 K/s 115.70 K/s 0.00 % 0.59 % > [kworker/u8:7-btrfs-endio-write] > 8037 be/4 root 5.32 K/s 79.79 K/s 0.00 % 0.40 % > [kworker/u8:6-btrfs-endio-write] > 23032 be/4 root 3.99 K/s 79.79 K/s 0.00 % 0.20 % > [kworker/u8:12-btrfs-endio-write] > 7879 be/4 root 1361.80 B/s 105.06 K/s 0.00 % 0.10 % > [kworker/u8:1-btrfs-endio-write] > 538 be/4 root 0.00 B/s 1021.35 B/s 0.00 % 0.00 % rsyslogd -n > -iNONE [rs:main Q:Reg] > 10349 be/4 root 0.00 B/s 86.44 K/s 0.00 % 0.00 % > [kworker/u8:3-btrfs-endio-meta] > 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init > > > > ################################################# > > root@wawel:~# top -d12 > top - 23:23:01 up 1 day, 56 min, 1 user, load average: 3,05, 3,28, 3,47 > Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie > %Cpu(s): 0,0 us, 4,5 sy, 0,0 ni, 92,1 id, 3,3 wa, 0,0 hi, 0,1 si, > 0,0 st > MiB Mem : 7841,1 total, 797,0 free, 259,7 used, 6784,4 buff/cache > MiB Swap: 8053,0 total, 7962,2 free, 90,8 used. 
7265,2 avail Mem > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 8138 root 20 0 8540 104 104 D 11,6 0,0 23:25.88 btrfs > 23032 root 20 0 0 0 0 I 4,8 0,0 0:51.20 > kworker/u8:12-btrfs-extent-refs > 10350 root 20 0 0 0 0 I 0,9 0,0 0:01.79 > kworker/u8:7-btrfs-endio-meta > 10349 root 20 0 0 0 0 I 0,8 0,0 0:02.12 > kworker/u8:3-btrfs-freespace-write > 242 root 0 -20 0 0 0 I 0,2 0,0 0:59.66 > kworker/2:1H-kblockd > > > ################################################# > > root@wawel:~# tail /var/log/kern.log -n50 > Dec 25 21:50:35 wawel kernel: [84258.634360] BTRFS info (device sda2): > found 1752 extents > Dec 25 21:50:41 wawel kernel: [84264.771079] BTRFS info (device sda2): > relocating block group 3434964058112 flags data > Dec 25 21:51:09 wawel kernel: [84291.864553] BTRFS info (device sda2): > found 584 extents > Dec 25 21:55:51 wawel kernel: [84574.383303] BTRFS info (device sda2): > found 584 extents > Dec 25 21:55:56 wawel kernel: [84579.359349] BTRFS info (device sda2): > relocating block group 3431742832640 flags data > Dec 25 21:56:17 wawel kernel: [84600.668237] BTRFS info (device sda2): > found 635 extents > Dec 25 22:00:36 wawel kernel: [84859.766366] BTRFS info (device sda2): > found 635 extents > Dec 25 22:00:47 wawel kernel: [84870.367399] BTRFS info (device sda2): > relocating block group 3428521607168 flags data > Dec 25 22:01:14 wawel kernel: [84897.313282] BTRFS info (device sda2): > found 1170 extents > Dec 25 22:05:38 wawel kernel: [85161.264397] BTRFS info (device sda2): > found 1170 extents > Dec 25 22:05:46 wawel kernel: [85169.216425] BTRFS info (device sda2): > relocating block group 3425300381696 flags data > Dec 25 22:06:20 wawel kernel: [85202.963222] BTRFS info (device sda2): > found 1336 extents > Dec 25 22:13:00 wawel kernel: [85603.648327] BTRFS info (device sda2): > found 1336 extents > Dec 25 22:13:02 wawel kernel: [85604.878785] BTRFS info (device sda2): > relocating block group 3422079156224 flags data > Dec 25 22:14:57 
wawel kernel: [85720.394945] BTRFS info (device sda2): > found 1771 extents > Dec 25 22:23:03 wawel kernel: [86206.582245] BTRFS info (device sda2): > found 1771 extents > Dec 25 22:23:04 wawel kernel: [86207.671814] BTRFS info (device sda2): > relocating block group 3418857930752 flags data > Dec 25 22:23:19 wawel kernel: [86222.138761] BTRFS info (device sda2): > found 199 extents > Dec 25 22:26:26 wawel kernel: [86408.991487] BTRFS info (device sda2): > found 199 extents > Dec 25 22:26:29 wawel kernel: [86412.181922] BTRFS info (device sda2): > relocating block group 3415636705280 flags data > Dec 25 22:26:48 wawel kernel: [86431.726268] BTRFS info (device sda2): > found 50 extents > Dec 25 22:27:32 wawel kernel: [86475.800606] BTRFS info (device sda2): > found 50 extents > Dec 25 22:27:39 wawel kernel: [86482.222298] BTRFS info (device sda2): > found 20 extents > Dec 25 22:27:44 wawel kernel: [86487.168278] BTRFS info (device sda2): > relocating block group 3412415479808 flags data > Dec 25 22:28:27 wawel kernel: [86530.525657] BTRFS info (device sda2): > found 6748 extents > Dec 25 22:34:07 wawel kernel: [86870.618140] BTRFS info (device sda2): > found 6748 extents > Dec 25 22:34:35 wawel kernel: [86898.498949] BTRFS info (device sda2): > relocating block group 3409194254336 flags data > Dec 25 22:36:06 wawel kernel: [86989.589309] BTRFS info (device sda2): > found 1175 extents > Dec 25 22:40:12 wawel kernel: [87234.881523] perf: interrupt took too > long (3943 > 3930), lowering kernel.perf_event_max_sample_rate to 50500 > Dec 25 22:44:44 wawel kernel: [87507.870507] BTRFS info (device sda2): > found 1175 extents > Dec 25 22:44:52 wawel kernel: [87515.563744] BTRFS info (device sda2): > relocating block group 3405973028864 flags data > Dec 25 22:45:14 wawel kernel: [87537.128352] BTRFS info (device sda2): > found 372 extents > Dec 25 22:46:11 wawel kernel: [87594.569709] BTRFS info (device sda2): > found 372 extents > Dec 25 22:46:17 wawel kernel: 
[87600.772681] BTRFS info (device sda2): > relocating block group 3402751803392 flags data > Dec 25 22:47:27 wawel kernel: [87670.663043] BTRFS info (device sda2): > found 3483 extents > Dec 25 23:04:57 wawel kernel: [88720.068885] BTRFS info (device sda2): > found 3479 extents > Dec 25 23:05:03 wawel kernel: [88726.864461] BTRFS info (device sda2): > relocating block group 3399530577920 flags data > Dec 25 23:06:36 wawel kernel: [88819.464525] BTRFS info (device sda2): > found 1871 extents > Dec 25 23:16:11 wawel kernel: [89393.997406] BTRFS info (device sda2): > found 1868 extents > Dec 25 23:16:13 wawel kernel: [89396.825305] BTRFS info (device sda2): > relocating block group 3396309352448 flags data > Dec 25 23:16:41 wawel kernel: [89424.310916] BTRFS info (device sda2): > found 1946 extents > Dec 25 23:22:12 wawel kernel: [89755.068457] BTRFS info (device sda2): > found 1946 extents > Dec 25 23:22:15 wawel kernel: [89758.601122] BTRFS info (device sda2): > relocating block group 3393088126976 flags data > Dec 25 23:22:35 wawel kernel: [89778.172631] BTRFS info (device sda2): > found 647 extents > Dec 25 23:23:06 wawel kernel: [89809.153016] BTRFS info (device sda2): > found 647 extents > Dec 25 23:23:07 wawel kernel: [89810.017006] BTRFS info (device sda2): > relocating block group 3389866901504 flags data > Dec 25 23:23:24 wawel kernel: [89827.351203] BTRFS info (device sda2): > found 745 extents > Dec 25 23:24:38 wawel kernel: [89901.176493] BTRFS info (device sda2): > found 745 extents > Dec 25 23:24:51 wawel kernel: [89914.840850] BTRFS info (device sda2): > relocating block group 3386645676032 flags data > Dec 25 23:25:40 wawel kernel: [89963.641062] BTRFS info (device sda2): > found 3114 extents > > > > > ################################################ > > More on this topic: > > https://strugglers.net/~andy/blog/2018/06/15/another-disappointing-btrfs-experience/ > > > https://www.spinics.net/lists/linux-btrfs/msg50771.html > > 
https://bugzilla.redhat.com/show_bug.cgi?id=1294531 > > > > > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 520 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
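[Editorial note: Qu's check-then-decide advice can be sketched as a small shell helper. This is not from the thread; the function name and messages are mine, and it assumes a btrfs filesystem mounted at the path given as an argument. "btrfs qgroup show" exits non-zero when quotas are not enabled, so its exit status is the whole probe.]

```shell
#!/bin/sh
# check_qgroups: report whether qgroups are enabled on a btrfs mount,
# and suggest the remediation Qu describes for kernels older than 5.2.
check_qgroups() {
    mnt=$1
    # The command fails (quotas not enabled) on a filesystem without
    # qgroups, so no output parsing is needed.
    if btrfs qgroup show -pcre "$mnt" >/dev/null 2>&1; then
        echo "qgroups enabled on $mnt; on kernels < 5.2 consider: btrfs quota disable $mnt"
    else
        echo "qgroups not enabled on $mnt"
    fi
}

check_qgroups /
```

Note that "btrfs quota disable" trades away per-subvolume exclusive-byte accounting, which is exactly the trade-off Qu states.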
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2019-12-26 5:08 ` Qu Wenruo @ 2019-12-26 13:17 ` Leszek Dubiel 2019-12-26 13:44 ` Remi Gauvin 0 siblings, 1 reply; 32+ messages in thread From: Leszek Dubiel @ 2019-12-26 13:17 UTC (permalink / raw) To: linux-btrfs; +Cc: Qu Wenruo > Are you using qgroups? > > You can verify it by "btrfs qgroup show -pcre <mount>" to verify. > Qgroup can hugely impact performance for balance, if using kernel older > than v5.2. > > You can either disable qgroup, losing the ability to trace how many > bytes are exclusively used by each subvolume, or upgrade to v5.2 kernel > to solve it. > Not using qgroups. root@wawel:~# btrfs qgroup show -pcre / ERROR: can't list qgroups: quotas not enabled Current data transfer speed is 2Mb/sek. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2019-12-26 13:17 ` Leszek Dubiel @ 2019-12-26 13:44 ` Remi Gauvin 2019-12-26 14:05 ` Leszek Dubiel 0 siblings, 1 reply; 32+ messages in thread From: Remi Gauvin @ 2019-12-26 13:44 UTC (permalink / raw) To: Leszek Dubiel, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 579 bytes --] On 2019-12-26 8:17 a.m., Leszek Dubiel wrote: > Not using qgroups. > > root@wawel:~# btrfs qgroup show -pcre / > ERROR: can't list qgroups: quotas not enabled > > > Current data transfer speed is 2Mb/sek. > > Did you check dmesg to see if you're getting a whole pile of I/O errors on the failing disk? It might be necessary to remove it (physically) and finish replacing it with a degraded array. (I would not jump to trying that, however, as there are additional risks if your metadata is Raid6 as well, and you have any write hole corruption.) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 13:44 ` Remi Gauvin
@ 2019-12-26 14:05   ` Leszek Dubiel
  2019-12-26 14:21     ` Remi Gauvin
  2019-12-26 22:40     ` Chris Murphy
  0 siblings, 2 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 14:05 UTC (permalink / raw)
  To: Remi Gauvin, linux-btrfs

> Did you check dmesg to see whether you are getting a whole pile of I/O
> errors on the failing disk? It might be necessary to remove it
> physically and finish the replacement with a degraded array. (I would
> not jump to trying that, however, as there are additional risks if your
> metadata is RAID6 as well and you have any write-hole corruption.)

I didn't use RAID6, but RAID0.

Yes, I checked dmesg in the first place (see original mail). Btrfs has
logged no errors:

root@wawel:~# btrfs dev stat /
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  0
[/dev/sda2].generation_errs  0
[/dev/sdb2].write_io_errs    0
[/dev/sdb2].read_io_errs     0
[/dev/sdb2].flush_io_errs    0
[/dev/sdb2].corruption_errs  0
[/dev/sdb2].generation_errs  0
[/dev/sdc2].write_io_errs    0
[/dev/sdc2].read_io_errs     0
[/dev/sdc2].flush_io_errs    0
[/dev/sdc2].corruption_errs  0
[/dev/sdc2].generation_errs  0

But currently it looks like this:

Dec 22 17:44:37 wawel kernel: [  179.075180] BTRFS warning (device sda2): block group 5414977536000 has wrong amount of free space
Dec 22 17:44:37 wawel kernel: [  179.075185] BTRFS warning (device sda2): failed to load free space cache for block group 5414977536000, rebuilding it now
Dec 22 18:59:17 wawel kernel: [ 4659.089590] BTRFS info (device sda2): relocating block group 8562114822144 flags metadata|raid1
Dec 22 19:01:44 wawel kernel: [ 4805.180779] BTRFS info (device sda2): relocating block group 8562114822144 flags metadata|raid1
Dec 22 19:02:14 wawel kernel: [ 4835.167239] BTRFS info (device sda2): found 12211 extents
Dec 22 19:02:14 wawel kernel: [ 4835.641938] BTRFS info (device sda2): relocating block
group 8555672371200 flags data Dec 22 19:02:24 wawel kernel: [ 4845.378907] BTRFS info (device sda2): found 10 extents Dec 22 19:02:26 wawel kernel: [ 4847.909681] BTRFS info (device sda2): found 10 extents Dec 22 19:02:27 wawel kernel: [ 4848.442186] BTRFS info (device sda2): relocating block group 8552451145728 flags data Dec 22 19:02:27 wawel kernel: [ 4848.639837] BTRFS warning (device sda2): block group 6255717384192 has wrong amount of free space Dec 22 19:02:27 wawel kernel: [ 4848.639841] BTRFS warning (device sda2): failed to load free space cache for block group 6255717384192, rebuilding it now Dec 22 19:02:37 wawel kernel: [ 4858.473525] BTRFS info (device sda2): found 8 extents Dec 22 19:03:03 wawel kernel: [ 4884.425697] BTRFS info (device sda2): found 8 extents Dec 22 19:03:16 wawel kernel: [ 4898.034082] BTRFS info (device sda2): relocating block group 8549229920256 flags data Dec 22 19:03:40 wawel kernel: [ 4921.855827] BTRFS info (device sda2): found 12 extents Dec 22 19:03:49 wawel kernel: [ 4931.023551] BTRFS info (device sda2): found 12 extents Dec 22 19:03:50 wawel kernel: [ 4931.603340] BTRFS info (device sda2): relocating block group 8546008694784 flags data Dec 22 19:04:05 wawel kernel: [ 4946.520618] BTRFS info (device sda2): found 8 extents Dec 22 19:04:14 wawel kernel: [ 4955.158390] BTRFS info (device sda2): found 8 extents Dec 22 19:04:15 wawel kernel: [ 4956.492239] BTRFS info (device sda2): found 8 extents Dec 22 19:04:16 wawel kernel: [ 4957.213730] BTRFS info (device sda2): relocating block group 8542787469312 flags data Dec 22 19:04:32 wawel kernel: [ 4973.410098] BTRFS info (device sda2): found 12 extents Dec 22 19:04:35 wawel kernel: [ 4976.628967] BTRFS info (device sda2): found 12 extents Dec 22 19:04:35 wawel kernel: [ 4977.062360] BTRFS info (device sda2): relocating block group 8539566243840 flags data Dec 22 19:04:46 wawel kernel: [ 4987.604348] BTRFS info (device sda2): found 8 extents Dec 22 19:04:49 wawel kernel: [ 
4991.071047] BTRFS info (device sda2): found 8 extents Dec 22 19:04:50 wawel kernel: [ 4991.867354] BTRFS info (device sda2): relocating block group 8536345018368 flags data Dec 22 19:05:02 wawel kernel: [ 5003.539073] BTRFS info (device sda2): found 10 extents Dec 22 19:05:06 wawel kernel: [ 5007.309308] BTRFS info (device sda2): found 10 extents Dec 22 19:05:06 wawel kernel: [ 5007.938822] BTRFS info (device sda2): relocating block group 8533123792896 flags data Dec 22 19:05:19 wawel kernel: [ 5020.576346] BTRFS info (device sda2): found 11 extents Dec 22 19:05:32 wawel kernel: [ 5033.787681] BTRFS info (device sda2): found 11 extents Dec 22 19:05:45 wawel kernel: [ 5046.894372] BTRFS info (device sda2): relocating block group 8529902567424 flags data Dec 22 19:06:08 wawel kernel: [ 5069.249286] BTRFS info (device sda2): found 8 extents Dec 22 19:06:18 wawel kernel: [ 5079.974289] BTRFS info (device sda2): found 8 extents Dec 22 19:06:19 wawel kernel: [ 5080.609198] BTRFS info (device sda2): relocating block group 8526681341952 flags data Dec 22 19:06:30 wawel kernel: [ 5091.408195] BTRFS info (device sda2): found 8 extents Dec 22 19:06:34 wawel kernel: [ 5095.494893] BTRFS info (device sda2): found 8 extents Dec 22 19:06:34 wawel kernel: [ 5095.974846] BTRFS info (device sda2): relocating block group 8523460116480 flags data Dec 22 19:06:48 wawel kernel: [ 5109.948486] BTRFS info (device sda2): found 8 extents Dec 22 19:06:50 wawel kernel: [ 5111.872561] BTRFS info (device sda2): found 8 extents Dec 22 19:06:51 wawel kernel: [ 5112.262733] BTRFS info (device sda2): relocating block group 8520238891008 flags data Dec 22 19:07:01 wawel kernel: [ 5122.295835] BTRFS info (device sda2): found 11 extents Dec 22 19:07:03 wawel kernel: [ 5124.152445] BTRFS info (device sda2): found 11 extents Dec 22 19:07:03 wawel kernel: [ 5124.603091] BTRFS info (device sda2): relocating block group 8517017665536 flags data Dec 22 19:07:14 wawel kernel: [ 5135.149621] BTRFS info 
(device sda2): found 8 extents Dec 22 19:07:15 wawel kernel: [ 5136.612473] BTRFS info (device sda2): found 8 extents Dec 22 19:07:15 wawel kernel: [ 5137.042461] BTRFS info (device sda2): relocating block group 8513796440064 flags data Dec 22 19:07:26 wawel kernel: [ 5147.805243] BTRFS info (device sda2): found 11 extents Dec 22 19:07:30 wawel kernel: [ 5151.553394] BTRFS info (device sda2): found 11 extents Dec 22 19:07:30 wawel kernel: [ 5152.056795] BTRFS info (device sda2): relocating block group 8510575214592 flags data Dec 22 19:07:41 wawel kernel: [ 5162.747546] BTRFS info (device sda2): found 8 extents Dec 22 19:07:44 wawel kernel: [ 5165.848208] BTRFS info (device sda2): found 8 extents Dec 22 19:07:45 wawel kernel: [ 5166.412946] BTRFS info (device sda2): relocating block group 8507353989120 flags data Dec 22 19:07:55 wawel kernel: [ 5176.370942] BTRFS info (device sda2): found 12 extents Dec 22 19:07:56 wawel kernel: [ 5177.729094] BTRFS info (device sda2): found 12 extents Dec 22 19:07:57 wawel kernel: [ 5178.312743] BTRFS info (device sda2): relocating block group 8504132763648 flags data Dec 22 19:08:03 wawel kernel: [ 5184.392945] BTRFS info (device sda2): found 4 extents Dec 22 19:08:04 wawel kernel: [ 5185.610488] BTRFS info (device sda2): found 4 extents Dec 22 19:08:04 wawel kernel: [ 5186.076337] BTRFS info (device sda2): relocating block group 8500911538176 flags data Dec 22 19:08:15 wawel kernel: [ 5196.915114] BTRFS info (device sda2): found 14 extents Dec 22 19:08:38 wawel kernel: [ 5219.384163] BTRFS info (device sda2): found 14 extents Dec 22 19:08:53 wawel kernel: [ 5234.746250] BTRFS info (device sda2): relocating block group 8497690312704 flags data Dec 22 19:09:10 wawel kernel: [ 5252.065078] BTRFS info (device sda2): found 8 extents Dec 22 19:09:20 wawel kernel: [ 5261.174092] BTRFS info (device sda2): found 8 extents Dec 22 19:09:20 wawel kernel: [ 5262.064000] BTRFS info (device sda2): relocating block group 8494469087232 flags 
data Dec 22 19:09:31 wawel kernel: [ 5272.940366] BTRFS info (device sda2): found 9 extents Dec 22 19:09:35 wawel kernel: [ 5276.735998] BTRFS info (device sda2): found 9 extents Dec 22 19:09:36 wawel kernel: [ 5277.343659] BTRFS info (device sda2): relocating block group 8491247861760 flags data Dec 22 19:09:47 wawel kernel: [ 5288.587703] BTRFS info (device sda2): found 8 extents Dec 22 19:09:51 wawel kernel: [ 5292.187071] BTRFS info (device sda2): found 8 extents Dec 22 19:09:51 wawel kernel: [ 5292.617360] BTRFS info (device sda2): relocating block group 8489100378112 flags data Dec 22 19:09:55 wawel kernel: [ 5296.165794] BTRFS info (device sda2): found 2 extents Dec 22 19:09:57 wawel kernel: [ 5298.717969] BTRFS info (device sda2): found 2 extents Dec 22 19:09:58 wawel kernel: [ 5300.067976] BTRFS info (device sda2): relocating block group 8485879152640 flags data Dec 22 19:10:03 wawel kernel: [ 5304.928496] BTRFS info (device sda2): found 3 extents Dec 22 19:20:11 wawel kernel: [ 5912.116874] ata1.00: exception Emask 0x0 SAct 0x1f80 SErr 0x0 action 0x0 Dec 22 19:20:11 wawel kernel: [ 5912.116878] ata1.00: irq_stat 0x40000008 Dec 22 19:20:11 wawel kernel: [ 5912.116880] ata1.00: failed command: READ FPDMA QUEUED Dec 22 19:20:11 wawel kernel: [ 5912.116882] ata1.00: cmd 60/00:38:00:00:98/0a:00:45:01:00/40 tag 7 ncq dma 1310720 in Dec 22 19:20:11 wawel kernel: [ 5912.116882] res 43/40:18:e8:05:98/00:04:45:01:00/40 Emask 0x409 (media error) <F> Dec 22 19:20:11 wawel kernel: [ 5912.116885] ata1.00: status: { DRDY SENSE ERR } Dec 22 19:20:11 wawel kernel: [ 5912.116886] ata1.00: error: { UNC } Dec 22 19:20:11 wawel kernel: [ 5912.153695] ata1.00: configured for UDMA/133 Dec 22 19:20:11 wawel kernel: [ 5912.153707] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Dec 22 19:20:11 wawel kernel: [ 5912.153709] sd 0:0:0:0: [sda] tag#7 Sense Key : Medium Error [current] Dec 22 19:20:11 wawel kernel: [ 5912.153710] sd 0:0:0:0: [sda] tag#7 
Add. Sense: Unrecovered read error - auto reallocate failed Dec 22 19:20:11 wawel kernel: [ 5912.153711] sd 0:0:0:0: [sda] tag#7 CDB: Read(16) 88 00 00 00 00 01 45 98 00 00 00 00 0a 00 00 00 Dec 22 19:20:11 wawel kernel: [ 5912.153712] print_req_error: I/O error, dev sda, sector 5462556672 Dec 22 19:20:11 wawel kernel: [ 5912.153724] ata1: EH complete Dec 22 19:21:28 wawel kernel: [ 5989.527853] BTRFS info (device sda2): found 8 extents Dec 22 19:21:29 wawel kernel: [ 5990.861317] BTRFS info (device sda2): found 8 extents Dec 22 19:21:30 wawel kernel: [ 5991.492017] BTRFS info (device sda2): relocating block group 8392463613952 flags data Dec 22 19:21:40 wawel kernel: [ 6001.782573] BTRFS info (device sda2): found 10 extents Dec 22 19:22:01 wawel kernel: [ 6022.162622] BTRFS info (device sda2): found 10 extents Dec 22 19:22:17 wawel kernel: [ 6039.036941] BTRFS info (device sda2): relocating block group 8389242388480 flags data Dec 22 19:22:33 wawel kernel: [ 6054.947905] BTRFS info (device sda2): found 9 extents Dec 22 19:22:40 wawel kernel: [ 6061.630503] BTRFS info (device sda2): found 9 extents Dec 22 19:22:41 wawel kernel: [ 6062.758008] BTRFS info (device sda2): relocating block group 8386021163008 flags data Dec 22 19:22:53 wawel kernel: [ 6074.648119] BTRFS info (device sda2): found 8 extents Dec 22 19:22:55 wawel kernel: [ 6076.264156] BTRFS info (device sda2): found 8 extents Dec 22 23:31:24 wawel kernel: [20985.918518] BTRFS info (device sda2): found 17 extents Dec 22 23:31:25 wawel kernel: [20986.705143] BTRFS info (device sda2): relocating block group 7150144323584 flags data Dec 22 23:31:36 wawel kernel: [20997.997520] BTRFS info (device sda2): found 11 extents Dec 22 23:31:56 wawel kernel: [21017.914604] BTRFS info (device sda2): found 11 extents Dec 22 23:32:05 wawel kernel: [21027.095442] BTRFS info (device sda2): relocating block group 7146923098112 flags data Dec 22 23:32:42 wawel kernel: [21064.078043] BTRFS info (device sda2): found 11 extents 
Dec 22 23:32:52 wawel kernel: [21073.534844] BTRFS info (device sda2): found 11 extents Dec 22 23:32:53 wawel kernel: [21074.527201] BTRFS info (device sda2): relocating block group 7141554388992 flags data Dec 22 23:33:05 wawel kernel: [21086.947393] BTRFS info (device sda2): found 14 extents Dec 22 23:33:14 wawel kernel: [21095.872547] BTRFS info (device sda2): found 14 extents Dec 22 23:33:16 wawel kernel: [21097.161827] BTRFS info (device sda2): relocating block group 7138333163520 flags data Dec 22 23:33:27 wawel kernel: [21108.868346] BTRFS info (device sda2): found 14 extents Dec 22 23:33:35 wawel kernel: [21116.664273] BTRFS info (device sda2): found 14 extents Dec 22 23:33:36 wawel kernel: [21117.233358] BTRFS info (device sda2): relocating block group 8559967338496 flags metadata|raid1 Dec 22 23:47:19 wawel kernel: [21940.204817] BTRFS info (device sda2): found 59958 extents Dec 22 23:47:19 wawel kernel: [21940.935869] BTRFS info (device sda2): relocating block group 8558893596672 flags metadata|raid1 Dec 22 23:57:18 wawel kernel: [22539.150585] BTRFS info (device sda2): found 63167 extents Dec 22 23:57:18 wawel kernel: [22540.041179] BTRFS info (device sda2): relocating block group 7128669487104 flags data Dec 22 23:57:31 wawel kernel: [22552.964707] BTRFS info (device sda2): found 13 extents Dec 22 23:57:49 wawel kernel: [22570.932560] BTRFS info (device sda2): found 13 extents Dec 22 23:57:50 wawel kernel: [22571.760150] BTRFS info (device sda2): relocating block group 7125448261632 flags data Dec 22 23:58:02 wawel kernel: [22583.650959] BTRFS info (device sda2): found 11 extents Dec 22 23:58:10 wawel kernel: [22592.036311] BTRFS info (device sda2): found 11 extents Dec 22 23:58:11 wawel kernel: [22592.520003] BTRFS info (device sda2): relocating block group 7122227036160 flags data Dec 22 23:58:22 wawel kernel: [22603.590833] BTRFS info (device sda2): found 16 extents Dec 22 23:58:35 wawel kernel: [22616.179240] BTRFS info (device sda2): found 16 
extents Dec 22 23:58:35 wawel kernel: [22616.810120] BTRFS info (device sda2): relocating block group 7119005810688 flags data Dec 22 23:58:54 wawel kernel: [22635.938434] BTRFS info (device sda2): found 17 extents Dec 22 23:59:30 wawel kernel: [22671.668073] BTRFS info (device sda2): found 17 extents Dec 22 23:59:33 wawel kernel: [22675.053253] BTRFS info (device sda2): found 6 extents Dec 22 23:59:34 wawel kernel: [22675.806876] BTRFS info (device sda2): relocating block group 7115784585216 flags data Dec 22 23:59:47 wawel kernel: [22689.015207] BTRFS info (device sda2): found 15 extents Dec 23 00:00:04 wawel kernel: [22705.897111] BTRFS info (device sda2): found 15 extents Dec 23 00:00:11 wawel kernel: [22712.320353] BTRFS info (device sda2): relocating block group 7112563359744 flags data Dec 23 00:01:19 wawel kernel: [22780.810851] BTRFS info (device sda2): found 25 extents Dec 23 00:02:23 wawel kernel: [22844.836838] BTRFS info (device sda2): found 25 extents Dec 23 00:02:24 wawel kernel: [22845.996013] BTRFS info (device sda2): relocating block group 7109342134272 flags data Dec 23 00:02:36 wawel kernel: [22857.961127] BTRFS info (device sda2): found 14 extents Dec 23 00:02:43 wawel kernel: [22864.547374] BTRFS info (device sda2): found 14 extents Dec 23 00:02:44 wawel kernel: [22865.258107] BTRFS info (device sda2): relocating block group 7106120908800 flags data Dec 23 00:02:55 wawel kernel: [22876.860762] BTRFS info (device sda2): found 14 extents Dec 23 00:03:02 wawel kernel: [22883.804570] BTRFS info (device sda2): found 14 extents Dec 23 00:03:03 wawel kernel: [22884.363060] BTRFS info (device sda2): relocating block group 7102899683328 flags data Dec 23 00:03:14 wawel kernel: [22895.383964] BTRFS info (device sda2): found 8 extents Dec 23 00:03:38 wawel kernel: [22919.988908] BTRFS info (device sda2): found 8 extents Dec 23 00:03:53 wawel kernel: [22935.003150] BTRFS info (device sda2): relocating block group 7081424846848 flags data Dec 23 00:04:37 
wawel kernel: [22978.751114] BTRFS info (device sda2): found 9 extents Dec 23 00:04:56 wawel kernel: [22997.228499] BTRFS info (device sda2): found 9 extents Dec 23 00:04:57 wawel kernel: [22998.219232] BTRFS info (device sda2): relocating block group 7078203621376 flags data Dec 23 00:08:20 wawel kernel: [23201.188424] INFO: task btrfs:2546 blocked for more than 120 seconds. Dec 23 00:08:20 wawel kernel: [23201.188437] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 Dec 23 00:08:20 wawel kernel: [23201.188442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 23 00:08:20 wawel kernel: [23201.188449] btrfs D 0 2546 2539 0x00000000 Dec 23 00:08:20 wawel kernel: [23201.188454] Call Trace: Dec 23 00:08:20 wawel kernel: [23201.188468] ? __schedule+0x2a2/0x870 Dec 23 00:08:20 wawel kernel: [23201.188476] schedule+0x28/0x80 Dec 23 00:08:20 wawel kernel: [23201.188480] schedule_preempt_disabled+0xa/0x10 Dec 23 00:08:20 wawel kernel: [23201.188486] __mutex_lock.isra.8+0x2b5/0x4a0 Dec 23 00:08:20 wawel kernel: [23201.188550] btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs] Dec 23 00:08:20 wawel kernel: [23201.188609] btrfs_relocate_chunk+0x31/0xa0 [btrfs] Dec 23 00:08:20 wawel kernel: [23201.188666] btrfs_shrink_device+0x1e1/0x550 [btrfs] Dec 23 00:08:20 wawel kernel: [23201.188722] ? btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs] Dec 23 00:08:20 wawel kernel: [23201.188775] btrfs_rm_device+0x165/0x590 [btrfs] Dec 23 00:08:20 wawel kernel: [23201.188782] ? __kmalloc_track_caller+0x16c/0x210 Dec 23 00:08:20 wawel kernel: [23201.188788] ? __check_object_size+0x15d/0x189 Dec 23 00:08:20 wawel kernel: [23201.188841] btrfs_ioctl+0x2a8e/0x2d90 [btrfs] Dec 23 00:08:20 wawel kernel: [23201.188848] ? __check_object_size+0x15d/0x189 Dec 23 00:08:20 wawel kernel: [23201.188855] ? _copy_to_user+0x26/0x30 Dec 23 00:08:20 wawel kernel: [23201.188861] ? cp_new_stat+0x150/0x180 Dec 23 00:08:20 wawel kernel: [23201.188868] ? 
do_vfs_ioctl+0xa4/0x630 Dec 23 00:08:20 wawel kernel: [23201.188873] do_vfs_ioctl+0xa4/0x630 Dec 23 00:08:20 wawel kernel: [23201.188879] ? __do_sys_newstat+0x48/0x70 Dec 23 00:08:20 wawel kernel: [23201.188885] ksys_ioctl+0x60/0x90 Dec 23 00:08:20 wawel kernel: [23201.188891] __x64_sys_ioctl+0x16/0x20 Dec 23 00:08:20 wawel kernel: [23201.188898] do_syscall_64+0x53/0x110 Dec 23 00:08:20 wawel kernel: [23201.188905] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Dec 23 00:08:20 wawel kernel: [23201.188910] RIP: 0033:0x7f3d307fd427 Dec 23 00:08:20 wawel kernel: [23201.188919] Code: Bad RIP value. Dec 23 00:08:20 wawel kernel: [23201.188922] RSP: 002b:00007ffc69d36b48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Dec 23 00:08:20 wawel kernel: [23201.188927] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3d307fd427 Dec 23 00:08:20 wawel kernel: [23201.188929] RDX: 00007ffc69d37b68 RSI: 000000005000943a RDI: 0000000000000003 Dec 23 00:08:20 wawel kernel: [23201.188931] RBP: 0000000000000001 R08: 0000000000000008 R09: 0000000000000078 Dec 23 00:08:20 wawel kernel: [23201.188933] R10: fffffffffffffa4a R11: 0000000000000206 R12: 00007ffc69d38ce8 Dec 23 00:08:20 wawel kernel: [23201.188935] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000003 Dec 23 00:10:20 wawel kernel: [23322.019744] INFO: task btrfs:2546 blocked for more than 120 seconds. Dec 23 00:10:20 wawel kernel: [23322.019757] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 Dec 23 00:10:20 wawel kernel: [23322.019762] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 23 00:10:20 wawel kernel: [23322.019768] btrfs D 0 2546 2539 0x00000000 Dec 23 00:10:20 wawel kernel: [23322.019774] Call Trace: Dec 23 00:10:20 wawel kernel: [23322.019788] ? 
__schedule+0x2a2/0x870 Dec 23 00:10:20 wawel kernel: [23322.019795] schedule+0x28/0x80 Dec 23 00:10:20 wawel kernel: [23322.019800] schedule_preempt_disabled+0xa/0x10 Dec 23 00:10:20 wawel kernel: [23322.019805] __mutex_lock.isra.8+0x2b5/0x4a0 Dec 23 00:10:20 wawel kernel: [23322.019864] btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs] Dec 23 00:10:20 wawel kernel: [23322.019923] btrfs_relocate_chunk+0x31/0xa0 [btrfs] Dec 23 00:10:20 wawel kernel: [23322.019979] btrfs_shrink_device+0x1e1/0x550 [btrfs] Dec 23 00:10:20 wawel kernel: [23322.020035] ? btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs] Dec 23 00:10:20 wawel kernel: [23322.020088] btrfs_rm_device+0x165/0x590 [btrfs] Dec 23 00:10:20 wawel kernel: [23322.020095] ? __kmalloc_track_caller+0x16c/0x210 Dec 23 00:10:20 wawel kernel: [23322.020101] ? __check_object_size+0x15d/0x189 Dec 23 00:10:20 wawel kernel: [23322.020152] btrfs_ioctl+0x2a8e/0x2d90 [btrfs] Dec 23 00:10:20 wawel kernel: [23322.020159] ? __check_object_size+0x15d/0x189 Dec 23 00:10:20 wawel kernel: [23322.020166] ? _copy_to_user+0x26/0x30 Dec 23 00:10:20 wawel kernel: [23322.020171] ? cp_new_stat+0x150/0x180 Dec 23 00:10:20 wawel kernel: [23322.020179] ? do_vfs_ioctl+0xa4/0x630 Dec 23 00:10:20 wawel kernel: [23322.020183] do_vfs_ioctl+0xa4/0x630 Dec 23 00:10:20 wawel kernel: [23322.020189] ? __do_sys_newstat+0x48/0x70 Dec 23 00:10:20 wawel kernel: [23322.020195] ksys_ioctl+0x60/0x90 Dec 23 00:10:20 wawel kernel: [23322.020201] __x64_sys_ioctl+0x16/0x20 Dec 23 00:10:20 wawel kernel: [23322.020207] do_syscall_64+0x53/0x110 Dec 23 00:10:20 wawel kernel: [23322.020214] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Dec 23 00:10:20 wawel kernel: [23322.020219] RIP: 0033:0x7f3d307fd427 Dec 23 00:10:20 wawel kernel: [23322.020228] Code: Bad RIP value. 
Dec 23 00:10:20 wawel kernel: [23322.020231] RSP: 002b:00007ffc69d36b48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Dec 23 00:10:20 wawel kernel: [23322.020235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3d307fd427 Dec 23 00:10:20 wawel kernel: [23322.020238] RDX: 00007ffc69d37b68 RSI: 000000005000943a RDI: 0000000000000003 Dec 23 00:10:20 wawel kernel: [23322.020240] RBP: 0000000000000001 R08: 0000000000000008 R09: 0000000000000078 Dec 23 00:10:20 wawel kernel: [23322.020242] R10: fffffffffffffa4a R11: 0000000000000206 R12: 00007ffc69d38ce8 Dec 23 00:10:20 wawel kernel: [23322.020244] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000003 Dec 23 00:12:21 wawel kernel: [23442.851059] INFO: task btrfs:2546 blocked for more than 120 seconds. Dec 23 00:12:21 wawel kernel: [23442.851076] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 Dec 23 00:12:21 wawel kernel: [23442.851086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 23 00:12:21 wawel kernel: [23442.851097] btrfs D 0 2546 2539 0x00000000 Dec 23 00:12:21 wawel kernel: [23442.851105] Call Trace: Dec 23 00:12:21 wawel kernel: [23442.851122] ? __schedule+0x2a2/0x870 Dec 23 00:12:21 wawel kernel: [23442.851131] schedule+0x28/0x80 Dec 23 00:12:21 wawel kernel: [23442.851138] schedule_preempt_disabled+0xa/0x10 Dec 23 00:12:21 wawel kernel: [23442.851145] __mutex_lock.isra.8+0x2b5/0x4a0 Dec 23 00:12:21 wawel kernel: [23442.851228] btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs] Dec 23 00:12:21 wawel kernel: [23442.851310] btrfs_relocate_chunk+0x31/0xa0 [btrfs] Dec 23 00:12:21 wawel kernel: [23442.851391] btrfs_shrink_device+0x1e1/0x550 [btrfs] Dec 23 00:12:21 wawel kernel: [23442.851469] ? btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs] Dec 23 00:12:21 wawel kernel: [23442.851544] btrfs_rm_device+0x165/0x590 [btrfs] Dec 23 00:12:21 wawel kernel: [23442.851554] ? __kmalloc_track_caller+0x16c/0x210 Dec 23 00:12:21 wawel kernel: [23442.851562] ? 
__check_object_size+0x15d/0x189 Dec 23 00:12:21 wawel kernel: [23442.851637] btrfs_ioctl+0x2a8e/0x2d90 [btrfs] Dec 23 00:12:21 wawel kernel: [23442.851646] ? __check_object_size+0x15d/0x189 Dec 23 00:12:21 wawel kernel: [23442.851655] ? _copy_to_user+0x26/0x30 Dec 23 00:12:21 wawel kernel: [23442.851662] ? cp_new_stat+0x150/0x180 Dec 23 00:12:21 wawel kernel: [23442.851672] ? do_vfs_ioctl+0xa4/0x630 Dec 23 00:12:21 wawel kernel: [23442.851678] do_vfs_ioctl+0xa4/0x630 Dec 23 00:12:21 wawel kernel: [23442.851685] ? __do_sys_newstat+0x48/0x70 Dec 23 00:12:21 wawel kernel: [23442.851694] ksys_ioctl+0x60/0x90 Dec 23 00:12:21 wawel kernel: [23442.851702] __x64_sys_ioctl+0x16/0x20 Dec 23 00:12:21 wawel kernel: [23442.851710] do_syscall_64+0x53/0x110 Dec 23 00:12:21 wawel kernel: [23442.851719] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Dec 23 00:12:21 wawel kernel: [23442.851725] RIP: 0033:0x7f3d307fd427 Dec 23 00:12:21 wawel kernel: [23442.851736] Code: Bad RIP value. Dec 23 00:12:21 wawel kernel: [23442.851739] RSP: 002b:00007ffc69d36b48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Dec 23 00:12:21 wawel kernel: [23442.851746] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3d307fd427 Dec 23 00:12:21 wawel kernel: [23442.851749] RDX: 00007ffc69d37b68 RSI: 000000005000943a RDI: 0000000000000003 Dec 23 00:12:21 wawel kernel: [23442.851752] RBP: 0000000000000001 R08: 0000000000000008 R09: 0000000000000078 Dec 23 00:12:21 wawel kernel: [23442.851756] R10: fffffffffffffa4a R11: 0000000000000206 R12: 00007ffc69d38ce8 Dec 23 00:12:21 wawel kernel: [23442.851759] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000003 Dec 23 00:14:22 wawel kernel: [23563.682406] INFO: task btrfs:2546 blocked for more than 120 seconds. Dec 23 00:14:22 wawel kernel: [23563.682424] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2 Dec 23 00:14:22 wawel kernel: [23563.682434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Dec 23 00:14:22 wawel kernel: [23563.682446] btrfs D 0 2546 2539 0x00000000 Dec 23 00:14:22 wawel kernel: [23563.682453] Call Trace: Dec 23 00:14:22 wawel kernel: [23563.682470] ? __schedule+0x2a2/0x870 Dec 23 00:14:22 wawel kernel: [23563.682480] schedule+0x28/0x80 Dec 23 00:14:22 wawel kernel: [23563.682486] schedule_preempt_disabled+0xa/0x10 Dec 23 00:14:22 wawel kernel: [23563.682493] __mutex_lock.isra.8+0x2b5/0x4a0 Dec 23 00:14:22 wawel kernel: [23563.682577] btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs] Dec 23 00:14:22 wawel kernel: [23563.682658] btrfs_relocate_chunk+0x31/0xa0 [btrfs] Dec 23 00:14:22 wawel kernel: [23563.682737] btrfs_shrink_device+0x1e1/0x550 [btrfs] Dec 23 00:14:22 wawel kernel: [23563.682817] ? btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs] Dec 23 00:14:22 wawel kernel: [23563.682892] btrfs_rm_device+0x165/0x590 [btrfs] Dec 23 00:14:22 wawel kernel: [23563.682902] ? __kmalloc_track_caller+0x16c/0x210 Dec 23 00:14:22 wawel kernel: [23563.682910] ? __check_object_size+0x15d/0x189 Dec 23 00:14:22 wawel kernel: [23563.682984] btrfs_ioctl+0x2a8e/0x2d90 [btrfs] Dec 23 00:14:22 wawel kernel: [23563.682993] ? __check_object_size+0x15d/0x189 Dec 23 00:14:22 wawel kernel: [23563.683002] ? _copy_to_user+0x26/0x30 Dec 23 00:14:22 wawel kernel: [23563.683009] ? cp_new_stat+0x150/0x180 Dec 23 00:14:22 wawel kernel: [23563.683018] ? do_vfs_ioctl+0xa4/0x630 Dec 23 00:14:22 wawel kernel: [23563.683024] do_vfs_ioctl+0xa4/0x630 Dec 23 00:14:22 wawel kernel: [23563.683032] ? __do_sys_newstat+0x48/0x70 Dec 23 00:14:22 wawel kernel: [23563.683040] ksys_ioctl+0x60/0x90 Dec 23 00:14:22 wawel kernel: [23563.683048] __x64_sys_ioctl+0x16/0x20 Dec 23 00:14:22 wawel kernel: [23563.683056] do_syscall_64+0x53/0x110 Dec 23 00:14:22 wawel kernel: [23563.683065] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Dec 23 00:14:22 wawel kernel: [23563.683071] RIP: 0033:0x7f3d307fd427 Dec 23 00:14:22 wawel kernel: [23563.683081] Code: Bad RIP value. 
Dec 23 00:14:22 wawel kernel: [23563.683085] RSP: 002b:00007ffc69d36b48 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Dec 23 00:14:22 wawel kernel: [23563.683092] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3d307fd427 Dec 23 00:14:22 wawel kernel: [23563.683096] RDX: 00007ffc69d37b68 RSI: 000000005000943a RDI: 0000000000000003 Dec 23 00:14:22 wawel kernel: [23563.683099] RBP: 0000000000000001 R08: 0000000000000008 R09: 0000000000000078 Dec 23 00:14:22 wawel kernel: [23563.683102] R10: fffffffffffffa4a R11: 0000000000000206 R12: 00007ffc69d38ce8 Dec 23 00:14:22 wawel kernel: [23563.683104] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000003 Dec 23 00:14:55 wawel kernel: [23596.566285] BTRFS info (device sda2): found 16 extents Dec 23 00:15:21 wawel kernel: [23622.371794] BTRFS info (device sda2): found 16 extents Dec 23 00:15:21 wawel kernel: [23622.805017] BTRFS info (device sda2): relocating block group 7073908654080 flags data Dec 23 00:15:32 wawel kernel: [23633.535788] BTRFS info (device sda2): found 18 extents Dec 23 00:15:56 wawel kernel: [23657.552946] BTRFS info (device sda2): found 18 extents Dec 23 00:15:57 wawel kernel: [23658.309332] BTRFS info (device sda2): relocating block group 7070687428608 flags data Dec 23 00:16:11 wawel kernel: [23672.185586] BTRFS info (device sda2): found 16 extents Dec 23 00:16:18 wawel kernel: [23679.466177] BTRFS info (device sda2): found 16 extents Dec 23 00:16:18 wawel kernel: [23679.780450] BTRFS info (device sda2): relocating block group 7067466203136 flags data Dec 23 00:16:29 wawel kernel: [23691.026282] BTRFS info (device sda2): found 8 extents Dec 23 00:16:32 wawel kernel: [23693.545940] BTRFS info (device sda2): found 8 extents Dec 23 00:16:32 wawel kernel: [23693.977786] BTRFS info (device sda2): relocating block group 7064244977664 flags data Dec 23 00:16:43 wawel kernel: [23704.182394] BTRFS info (device sda2): found 13 extents Dec 23 00:16:50 wawel kernel: [23711.975492] BTRFS info 
(device sda2): found 13 extents Dec 23 00:16:51 wawel kernel: [23712.408224] BTRFS info (device sda2): relocating block group 7061023752192 flags data Dec 23 00:17:01 wawel kernel: [23722.754585] BTRFS info (device sda2): found 11 extents Dec 23 00:17:09 wawel kernel: [23730.262877] BTRFS info (device sda2): found 11 extents Dec 23 00:17:11 wawel kernel: [23732.302305] BTRFS info (device sda2): found 11 extents Dec 23 00:17:11 wawel kernel: [23732.722001] BTRFS info (device sda2): relocating block group 7057802526720 flags data Dec 23 00:17:26 wawel kernel: [23747.679932] BTRFS info (device sda2): found 8 extents Dec 23 00:17:30 wawel kernel: [23751.451465] BTRFS info (device sda2): found 8 extents Dec 23 00:17:30 wawel kernel: [23751.876404] BTRFS info (device sda2): relocating block group 7054581301248 flags data Dec 23 00:17:41 wawel kernel: [23762.977551] BTRFS info (device sda2): found 8 extents Dec 23 00:17:42 wawel kernel: [23763.883422] BTRFS info (device sda2): found 8 extents Dec 23 00:17:43 wawel kernel: [23764.275202] BTRFS info (device sda2): relocating block group 7051360075776 flags data Dec 23 00:17:52 wawel kernel: [23773.479034] BTRFS info (device sda2): found 8 extents Dec 23 00:17:53 wawel kernel: [23774.431145] BTRFS info (device sda2): found 8 extents Dec 23 00:17:53 wawel kernel: [23774.905979] BTRFS info (device sda2): relocating block group 7048138850304 flags data Dec 23 00:18:21 wawel kernel: [23802.511276] BTRFS info (device sda2): found 18 extents Dec 23 00:18:47 wawel kernel: [23828.664441] BTRFS info (device sda2): found 18 extents Dec 23 00:18:51 wawel kernel: [23832.165407] BTRFS info (device sda2): found 9 extents Dec 23 00:18:51 wawel kernel: [23832.596833] BTRFS info (device sda2): relocating block group 7044917624832 flags data Dec 23 00:19:07 wawel kernel: [23848.329302] BTRFS info (device sda2): found 9 extents Dec 23 00:19:18 wawel kernel: [23859.557383] BTRFS info (device sda2): found 9 extents Dec 23 00:19:18 wawel kernel: 
[23859.916559] BTRFS info (device sda2): relocating block group 7041696399360 flags data
Dec 23 00:19:31 wawel kernel: [23872.555941] BTRFS info (device sda2): found 18 extents
Dec 23 00:19:46 wawel kernel: [23888.035472] BTRFS info (device sda2): found 18 extents
Dec 23 00:19:47 wawel kernel: [23888.620939] BTRFS info (device sda2): relocating block group 7038475173888 flags data
Dec 23 00:19:59 wawel kernel: [23900.878661] BTRFS info (device sda2): found 9 extents
Dec 23 00:20:14 wawel kernel: [23915.924062] BTRFS info (device sda2): found 9 extents
Dec 23 00:20:15 wawel kernel: [23916.373951] BTRFS info (device sda2): relocating block group 7035253948416 flags data
Dec 23 00:20:29 wawel kernel: [23930.526982] BTRFS info (device sda2): found 13 extents
Dec 23 00:20:40 wawel kernel: [23941.166196] BTRFS info (device sda2): found 13 extents
Dec 23 00:20:40 wawel kernel: [23941.603215] BTRFS info (device sda2): relocating block group 7032032722944 flags data
Dec 23 00:20:52 wawel kernel: [23953.361622] BTRFS info (device sda2): found 8 extents
Dec 23 00:20:54 wawel kernel: [23955.734989] BTRFS info (device sda2): found 8 extents
Dec 23 00:20:55 wawel kernel: [23956.150691] BTRFS info (device sda2): relocating block group 7028811497472 flags data
Dec 23 00:21:07 wawel kernel: [23968.438551] BTRFS info (device sda2): found 11 extents
Dec 23 00:21:11 wawel kernel: [23972.732940] BTRFS info (device sda2): found 11 extents
Dec 23 00:21:12 wawel kernel: [23973.122960] BTRFS info (device sda2): relocating block group 7025590272000 flags data
Dec 23 00:21:25 wawel kernel: [23986.098275] BTRFS info (device sda2): found 10 extents
Dec 23 00:21:43 wawel kernel: [24004.479645] BTRFS info (device sda2): found 10 extents
Dec 23 00:21:58 wawel kernel: [24019.409741] BTRFS info (device sda2): relocating block group 7022369046528 flags data
Dec 23 00:22:16 wawel kernel: [24037.905347] BTRFS info (device sda2): found 10 extents
Dec 23 00:22:36 wawel kernel: [24057.193719] BTRFS info (device sda2): found 10 extents
Dec 23 00:22:42 wawel kernel: [24063.236104] BTRFS info (device sda2): relocating block group 7019147821056 flags data
Dec 23 00:22:57 wawel kernel: [24078.235035] BTRFS info (device sda2): found 10 extents
Dec 23 00:23:17 wawel kernel: [24098.884138] BTRFS info (device sda2): found 10 extents
Dec 23 00:23:19 wawel kernel: [24100.237497] BTRFS info (device sda2): relocating block group 7015926595584 flags data
Dec 23 00:23:32 wawel kernel: [24113.252248] BTRFS info (device sda2): found 11 extents
Dec 26 09:06:30 wawel kernel: [124813.206299] BTRFS info (device sda2): found 538 extents
Dec 26 09:06:41 wawel kernel: [124824.121993] BTRFS info (device sda2): relocating block group 2862659665920 flags data
Dec 26 09:07:01 wawel kernel: [124844.420159] BTRFS info (device sda2): found 76 extents
Dec 26 09:07:26 wawel kernel: [124869.796349] BTRFS info (device sda2): found 76 extents
Dec 26 09:07:33 wawel kernel: [124876.379783] BTRFS info (device sda2): relocating block group 2859438440448 flags data
Dec 26 09:07:55 wawel kernel: [124898.518810] BTRFS info (device sda2): found 314 extents
Dec 26 09:08:20 wawel kernel: [124922.923430] BTRFS info (device sda2): found 314 extents
Dec 26 09:08:24 wawel kernel: [124927.088396] BTRFS info (device sda2): relocating block group 2856217214976 flags data
Dec 26 09:09:01 wawel kernel: [124964.879333] BTRFS info (device sda2): found 3257 extents
Dec 26 09:10:07 wawel kernel: [125030.251298] BTRFS info (device sda2): found 3257 extents
Dec 26 09:10:27 wawel kernel: [125050.474861] BTRFS info (device sda2): relocating block group 2852995989504 flags data
Dec 26 09:11:17 wawel kernel: [125100.646368] BTRFS info (device sda2): found 720 extents
Dec 26 09:11:57 wawel kernel: [125140.038078] BTRFS info (device sda2): found 720 extents
Dec 26 09:12:00 wawel kernel: [125143.489371] BTRFS info (device sda2): relocating block group 2849774764032 flags data
Dec 26 09:12:30 wawel kernel: [125173.882124] BTRFS info (device sda2): found 781 extents
Dec 26 09:13:13 wawel kernel: [125216.215971] BTRFS info (device sda2): found 781 extents
Dec 26 09:13:30 wawel kernel: [125233.369639] BTRFS info (device sda2): relocating block group 2846553538560 flags data
Dec 26 09:14:10 wawel kernel: [125273.480275] BTRFS info (device sda2): found 863 extents
Dec 26 09:14:40 wawel kernel: [125303.044820] BTRFS info (device sda2): found 863 extents
Dec 26 09:14:50 wawel kernel: [125313.825051] BTRFS info (device sda2): relocating block group 2843332313088 flags data
Dec 26 09:15:11 wawel kernel: [125334.898553] BTRFS info (device sda2): found 605 extents
Dec 26 09:15:34 wawel kernel: [125357.032946] BTRFS info (device sda2): found 605 extents
Dec 26 09:15:39 wawel kernel: [125362.305375] BTRFS info (device sda2): relocating block group 2840111087616 flags data
Dec 26 09:15:57 wawel kernel: [125380.738443] BTRFS info (device sda2): found 822 extents
Dec 26 09:16:27 wawel kernel: [125410.554652] BTRFS info (device sda2): found 822 extents
Dec 26 09:16:33 wawel kernel: [125416.795941] BTRFS info (device sda2): found 1 extents
Dec 26 09:16:40 wawel kernel: [125423.287556] BTRFS info (device sda2): found 1 extents
Dec 26 09:16:41 wawel kernel: [125424.662419] BTRFS info (device sda2): relocating block group 2836889862144 flags data
Dec 26 09:17:58 wawel kernel: [125500.923465] BTRFS info (device sda2): found 6105 extents
Dec 26 09:20:20 wawel kernel: [125643.445337] BTRFS info (device sda2): found 6105 extents
Dec 26 09:20:37 wawel kernel: [125660.315318] BTRFS info (device sda2): found 9 extents
Dec 26 09:21:19 wawel kernel: [125702.836577] BTRFS info (device sda2): found 9 extents
Dec 26 09:21:36 wawel kernel: [125719.739584] BTRFS info (device sda2): relocating block group 2833668636672 flags data
Dec 26 09:22:15 wawel kernel: [125758.510136] BTRFS info (device sda2): found 1732 extents
Dec 26 09:22:39 wawel kernel: [125782.335053] BTRFS info (device sda2): found 1732 extents
Dec 26 09:22:43 wawel kernel: [125786.733606] BTRFS info (device sda2): relocating block group 2830447411200 flags data
Dec 26 09:23:14 wawel kernel: [125817.174210] BTRFS info (device sda2): found 1540 extents
Dec 26 09:23:32 wawel kernel: [125835.049452] BTRFS info (device sda2): found 1540 extents
Dec 26 09:23:37 wawel kernel: [125840.764399] BTRFS info (device sda2): relocating block group 2827226185728 flags data
Dec 26 09:23:59 wawel kernel: [125862.499798] BTRFS info (device sda2): found 1730 extents
Dec 26 09:24:20 wawel kernel: [125883.528931] BTRFS info (device sda2): found 1730 extents
Dec 26 09:24:25 wawel kernel: [125888.791587] BTRFS info (device sda2): relocating block group 2824004960256 flags data
Dec 26 09:25:00 wawel kernel: [125923.748361] BTRFS info (device sda2): found 1453 extents
Dec 26 09:25:30 wawel kernel: [125953.500151] BTRFS info (device sda2): found 1453 extents
Dec 26 09:25:47 wawel kernel: [125970.178853] BTRFS info (device sda2): relocating block group 2820783734784 flags data
Dec 26 09:26:19 wawel kernel: [126001.953383] BTRFS info (device sda2): found 1321 extents
Dec 26 09:26:47 wawel kernel: [126030.506902] BTRFS info (device sda2): found 1321 extents
Dec 26 09:26:59 wawel kernel: [126042.292928] BTRFS info (device sda2): relocating block group 2817562509312 flags data
Dec 26 09:27:26 wawel kernel: [126069.399351] BTRFS info (device sda2): found 1321 extents
Dec 26 09:28:54 wawel kernel: [126157.696317] BTRFS info (device sda2): found 1321 extents
Dec 26 09:29:05 wawel kernel: [126167.934754] BTRFS info (device sda2): relocating block group 2814341283840 flags data
Dec 26 09:29:29 wawel kernel: [126192.741151] BTRFS info (device sda2): found 1308 extents
Dec 26 09:29:48 wawel kernel: [126210.929302] BTRFS info (device sda2): found 1308 extents
Dec 26 09:29:55 wawel kernel: [126218.316856] BTRFS info (device sda2): relocating block group 2811120058368 flags data
Dec 26 09:30:24 wawel kernel: [126247.373223] BTRFS info (device sda2): found 1302 extents
Dec 26 09:30:50 wawel kernel: [126273.241357] BTRFS info (device sda2): found 1302 extents
Dec 26 09:31:06 wawel kernel: [126289.751136] BTRFS info (device sda2): found 1301 extents
Dec 26 09:31:11 wawel kernel: [126293.918325] BTRFS info (device sda2): relocating block group 2807898832896 flags data
Dec 26 09:31:29 wawel kernel: [126312.570431] BTRFS info (device sda2): found 1239 extents
Dec 26 09:31:57 wawel kernel: [126340.525339] BTRFS info (device sda2): found 1239 extents
Dec 26 09:32:07 wawel kernel: [126350.580726] BTRFS info (device sda2): relocating block group 2804677607424 flags data
Dec 26 09:32:38 wawel kernel: [126381.535312] BTRFS info (device sda2): found 1318 extents
Dec 26 09:33:04 wawel kernel: [126407.888226] BTRFS info (device sda2): found 1318 extents
Dec 26 09:33:17 wawel kernel: [126420.834830] BTRFS info (device sda2): relocating block group 2801456381952 flags data
Dec 26 09:34:57 wawel kernel: [126520.878652] BTRFS info (device sda2): found 1314 extents
Dec 26 09:35:30 wawel kernel: [126553.629757] BTRFS info (device sda2): found 1314 extents
Dec 26 09:35:40 wawel kernel: [126563.198254] BTRFS info (device sda2): relocating block group 2798235156480 flags data
Dec 26 09:36:05 wawel kernel: [126588.438641] BTRFS info (device sda2): found 1259 extents
Dec 26 09:36:23 wawel kernel: [126605.928090] BTRFS info (device sda2): found 1259 extents
Dec 26 09:36:31 wawel kernel: [126614.349082] BTRFS info (device sda2): relocating block group 2795013931008 flags data
Dec 26 09:36:55 wawel kernel: [126638.306903] BTRFS info (device sda2): found 1254 extents
Dec 26 09:37:13 wawel kernel: [126656.883642] BTRFS info (device sda2): found 1254 extents
Dec 26 09:37:18 wawel kernel: [126661.319326] BTRFS info (device sda2): relocating block group 2791792705536 flags data
Dec 26 09:37:39 wawel kernel: [126682.613613] BTRFS info (device sda2): found 1034 extents
Dec 26 09:37:53 wawel kernel: [126695.934991] BTRFS info (device sda2): found 1034 extents
Dec 26 09:37:58 wawel kernel: [126701.052766] BTRFS info (device sda2): relocating block group 2788571480064 flags data
Dec 26 09:38:23 wawel kernel: [126726.414875] BTRFS info (device sda2): found 633 extents
Dec 26 09:38:54 wawel kernel: [126757.375947] BTRFS info (device sda2): found 633 extents
Dec 26 09:39:05 wawel kernel: [126768.679478] BTRFS info (device sda2): relocating block group 2785350254592 flags data
Dec 26 09:39:36 wawel kernel: [126799.425144] BTRFS info (device sda2): found 693 extents
Dec 26 09:40:06 wawel kernel: [126829.604815] BTRFS info (device sda2): found 693 extents
Dec 26 09:40:25 wawel kernel: [126848.363498] BTRFS info (device sda2): relocating block group 2782129029120 flags data
Dec 26 09:41:05 wawel kernel: [126888.549973] BTRFS info (device sda2): found 514 extents
Dec 26 09:41:36 wawel kernel: [126919.490217] BTRFS info (device sda2): found 514 extents
Dec 26 09:42:03 wawel kernel: [126946.123272] BTRFS info (device sda2): relocating block group 2778907803648 flags data
Dec 26 09:42:36 wawel kernel: [126979.670180] BTRFS info (device sda2): found 519 extents
Dec 26 09:42:56 wawel kernel: [126999.459002] BTRFS info (device sda2): found 519 extents
Dec 26 09:43:05 wawel kernel: [127008.768350] BTRFS info (device sda2): relocating block group 2775686578176 flags data
Dec 26 09:43:28 wawel kernel: [127031.510719] BTRFS info (device sda2): found 653 extents
Dec 26 09:43:44 wawel kernel: [127047.653185] BTRFS info (device sda2): found 653 extents
Dec 26 09:43:51 wawel kernel: [127054.633127] BTRFS info (device sda2): relocating block group 2772465352704 flags data
Dec 26 09:44:16 wawel kernel: [127079.730172] BTRFS info (device sda2): found 609 extents
Dec 26 09:44:51 wawel kernel: [127114.373682] BTRFS info (device sda2): found 609 extents
Dec 26 09:45:09 wawel kernel: [127132.516306] BTRFS info (device sda2): relocating block group 2770317869056 flags data
Dec 26 09:46:06 wawel kernel: [127189.531463] BTRFS info (device sda2): found 2162 extents
Dec 26 09:47:13 wawel kernel: [127256.870794] BTRFS info (device sda2): found 2162 extents
Dec 26 09:47:23 wawel kernel: [127266.335336] BTRFS info (device sda2): relocating block group 2768170385408 flags metadata|raid1
Dec 26 10:02:40 wawel kernel: [128183.204056] BTRFS info (device sda2): found 64344 extents
Dec 26 10:02:51 wawel kernel: [128194.752371] BTRFS info (device sda2): relocating block group 2766022901760 flags data
Dec 26 10:05:51 wawel kernel: [128374.372685] BTRFS info (device sda2): found 14657 extents
Dec 26 10:11:59 wawel kernel: [128742.106841] BTRFS info (device sda2): found 14657 extents
Dec 26 10:12:04 wawel kernel: [128747.164928] BTRFS info (device sda2): relocating block group 2762801676288 flags data
Dec 26 10:12:46 wawel kernel: [128789.843093] BTRFS info (device sda2): found 687 extents
Dec 26 10:13:37 wawel kernel: [128840.608651] BTRFS info (device sda2): found 687 extents
Dec 26 10:13:57 wawel kernel: [128860.515578] BTRFS info (device sda2): found 4 extents
Dec 26 10:14:04 wawel kernel: [128867.761205] BTRFS info (device sda2): relocating block group 2759580450816 flags data
Dec 26 10:14:48 wawel kernel: [128911.046501] BTRFS info (device sda2): found 2078 extents
Dec 26 10:23:54 wawel kernel: [129457.365421] BTRFS info (device sda2): found 2078 extents
Dec 26 10:24:17 wawel kernel: [129480.141652] BTRFS info (device sda2): relocating block group 2756359225344 flags data
Dec 26 10:25:06 wawel kernel: [129529.130468] BTRFS info (device sda2): found 1948 extents
Dec 26 10:37:54 wawel kernel: [130297.130706] BTRFS info (device sda2): found 1948 extents
Dec 26 10:38:07 wawel kernel: [130310.753097] BTRFS info (device sda2): relocating block group 2753137999872 flags data
Dec 26 10:39:52 wawel kernel: [130415.674175] BTRFS info (device sda2): found 658 extents
Dec 26 10:43:24 wawel kernel: [130627.255410] BTRFS info (device sda2): found 658 extents
Dec 26 10:43:42 wawel kernel: [130645.434399] BTRFS info (device sda2): found 42 extents
Dec 26 10:43:53 wawel kernel: [130656.815802] BTRFS info (device sda2): found 42 extents
Dec 26 10:44:05 wawel kernel: [130668.284526] BTRFS info (device sda2): relocating block group 2749916774400 flags data
Dec 26 10:45:37 wawel kernel: [130760.519557] BTRFS info (device sda2): found 7795 extents
Dec 26 11:24:02 wawel kernel: [133065.590359] BTRFS info (device sda2): found 7795 extents
Dec 26 11:24:49 wawel kernel: [133112.551574] BTRFS info (device sda2): relocating block group 2746695548928 flags data
Dec 26 11:25:21 wawel kernel: [133144.523206] BTRFS info (device sda2): found 67 extents
Dec 26 11:26:23 wawel kernel: [133206.718874] BTRFS info (device sda2): found 67 extents
Dec 26 11:26:34 wawel kernel: [133217.364169] BTRFS info (device sda2): relocating block group 2743474323456 flags data
Dec 26 11:27:09 wawel kernel: [133252.030493] BTRFS info (device sda2): found 154 extents
Dec 26 11:28:37 wawel kernel: [133340.658370] BTRFS info (device sda2): found 154 extents
Dec 26 11:28:48 wawel kernel: [133350.928232] BTRFS info (device sda2): relocating block group 2740253097984 flags data
Dec 26 11:29:31 wawel kernel: [133394.349923] BTRFS info (device sda2): found 3377 extents
Dec 26 11:50:07 wawel kernel: [134630.385264] BTRFS info (device sda2): found 3377 extents
Dec 26 11:50:40 wawel kernel: [134663.264144] BTRFS info (device sda2): relocating block group 2737031872512 flags data
Dec 26 11:51:31 wawel kernel: [134714.239962] BTRFS info (device sda2): found 244 extents
Dec 26 11:54:06 wawel kernel: [134869.320437] BTRFS info (device sda2): found 244 extents
Dec 26 11:54:28 wawel kernel: [134891.019946] BTRFS info (device sda2): relocating block group 2733810647040 flags data
Dec 26 11:54:57 wawel kernel: [134920.370189] BTRFS info (device sda2): found 560 extents
Dec 26 11:57:30 wawel kernel: [135073.351351] BTRFS info (device sda2): found 560 extents
Dec 26 11:57:53 wawel kernel: [135096.801125] BTRFS info (device sda2): relocating block group 2730589421568 flags data
Dec 26 11:58:26 wawel kernel: [135129.186591] BTRFS info (device sda2): found 243 extents
Dec 26 12:00:28 wawel kernel: [135251.901624] BTRFS info (device sda2): found 243 extents
Dec 26 12:01:18 wawel kernel: [135301.341113] BTRFS info (device sda2): relocating block group 2727368196096 flags data
Dec 26 12:02:58 wawel kernel: [135401.616231] BTRFS info (device sda2): found 7966 extents
Dec 26 12:35:08 wawel kernel: [137331.224676] BTRFS info (device sda2): found 7966 extents
Dec 26 12:35:25 wawel kernel: [137348.767051] BTRFS info (device sda2): relocating block group 2724146970624 flags data
Dec 26 12:36:18 wawel kernel: [137401.253589] BTRFS info (device sda2): found 6267 extents
Dec 26 13:00:55 wawel kernel: [138878.357733] BTRFS info (device sda2): found 6267 extents
Dec 26 13:01:11 wawel kernel: [138893.979362] BTRFS info (device sda2): relocating block group 2720925745152 flags data
Dec 26 13:02:56 wawel kernel: [138999.875074] BTRFS info (device sda2): found 9154 extents
Dec 26 13:29:38 wawel kernel: [140601.538362] BTRFS info (device sda2): found 9154 extents
Dec 26 13:30:08 wawel kernel: [140631.396198] BTRFS info (device sda2): relocating block group 2717704519680 flags data
Dec 26 13:31:04 wawel kernel: [140687.543046] BTRFS info (device sda2): found 7404 extents
Dec 26 13:54:06 wawel kernel: [142069.542451] BTRFS info (device sda2): found 7404 extents
Dec 26 13:54:36 wawel kernel: [142099.763738] BTRFS info (device sda2): relocating block group 2714483294208 flags data
Dec 26 13:56:01 wawel kernel: [142184.270785] BTRFS info (device sda2): found 5820 extents
Dec 26 14:23:00 wawel kernel: [143803.389716] BTRFS info (device sda2): found 5820 extents
Dec 26 14:23:05 wawel kernel: [143808.503120] BTRFS info (device sda2): relocating block group 2711262068736 flags data
Dec 26 14:25:19 wawel kernel: [143942.663324] BTRFS info (device sda2): found 8754 extents

^ permalink raw reply	[flat|nested] 32+ messages in thread
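[Editor's note: the dmesg excerpts above can be turned into a rough progress estimate. The grep pattern matches the "relocating block group" lines shown above; the `rate_mbps` helper is an illustrative assumption, not an existing tool, and the byte count you feed it must come from comparing two `btrfs dev usage` readings.]

```shell
# Show the most recent relocation messages (same format as the logs above):
dmesg 2>/dev/null | grep 'relocating block group' | tail -n 5

# Illustrative helper: given two kernel timestamps (seconds since boot, the
# bracketed numbers in the log lines) and the bytes moved between them,
# print the throughput in MiB/s.
rate_mbps() {   # usage: rate_mbps t0 t1 bytes
    awk -v t0="$1" -v t1="$2" -v b="$3" \
        'BEGIN { printf "%.1f\n", b / (t1 - t0) / 1048576 }'
}

# Example: 1 GiB moved between the first two timestamps of the log above.
rate_mbps 23859.9 23888.6 1073741824
```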
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 14:05 ` Leszek Dubiel
@ 2019-12-26 14:21 ` Remi Gauvin
  2019-12-26 15:42   ` Leszek Dubiel
  2019-12-26 22:40 ` Chris Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Remi Gauvin @ 2019-12-26 14:21 UTC (permalink / raw)
To: Leszek Dubiel, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 2086 bytes --]

On 2019-12-26 9:05 a.m., Leszek Dubiel wrote:
>
> I didn't use Raid6, but Raid0.
>
> Yes, I checked dmesg in the first place (see original mail).
> Btrfs logged no errors.
>

I'm sorry, that was a reading comprehension failure on my part. But regardless, your dmesg clearly shows drive read failures (to be expected with a drive that is clearly failing).

Recovery of Raid0 with a faulty drive is a job for restore from backup (or an attempt to scrape what you can with the array mounted read-only). That's what Raid0 *means*.

Dec 22 19:20:11 wawel kernel: [ 5912.116874] ata1.00: exception Emask 0x0 SAct 0x1f80 SErr 0x0 action 0x0
Dec 22 19:20:11 wawel kernel: [ 5912.116878] ata1.00: irq_stat 0x40000008
Dec 22 19:20:11 wawel kernel: [ 5912.116880] ata1.00: failed command: READ FPDMA QUEUED
Dec 22 19:20:11 wawel kernel: [ 5912.116882] ata1.00: cmd 60/00:38:00:00:98/0a:00:45:01:00/40 tag 7 ncq dma 1310720 in
Dec 22 19:20:11 wawel kernel: [ 5912.116882]          res 43/40:18:e8:05:98/00:04:45:01:00/40 Emask 0x409 (media error) <F>
Dec 22 19:20:11 wawel kernel: [ 5912.116885] ata1.00: status: { DRDY SENSE ERR }
Dec 22 19:20:11 wawel kernel: [ 5912.116886] ata1.00: error: { UNC }
Dec 22 19:20:11 wawel kernel: [ 5912.153695] ata1.00: configured for UDMA/133
Dec 22 19:20:11 wawel kernel: [ 5912.153707] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 22 19:20:11 wawel kernel: [ 5912.153709] sd 0:0:0:0: [sda] tag#7 Sense Key : Medium Error [current]
Dec 22 19:20:11 wawel kernel: [ 5912.153710] sd 0:0:0:0: [sda] tag#7 Add. Sense: Unrecovered read error - auto reallocate failed
Dec 22 19:20:11 wawel kernel: [ 5912.153711] sd 0:0:0:0: [sda] tag#7 CDB: Read(16) 88 00 00 00 00 01 45 98 00 00 00 00 0a 00 00 00
Dec 22 19:20:11 wawel kernel: [ 5912.153712] print_req_error: I/O error, dev sda, sector 5462556672
Dec 22 19:20:11 wawel kernel: [ 5912.153724] ata1: EH complete

I'm assuming that /dev/sda is the drive you already identified as failing; if not, you're totally hosed.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread
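[Editor's note: the `print_req_error` line quoted above gives the failing sector number. A minimal sketch for mapping it to a byte offset (assuming the 512-byte logical sectors that smartctl reports for this drive; the sample log line is copied from the message above):]

```shell
# Extract the failing sector from the kernel error line and convert it
# to a byte offset on /dev/sda.
line='print_req_error: I/O error, dev sda, sector 5462556672'
sector=$(printf '%s\n' "$line" | sed -n 's/.*sector \([0-9]*\).*/\1/p')
echo "$sector"
awk -v s="$sector" 'BEGIN { printf "%.0f\n", s * 512 }'   # byte offset
```

With the byte offset in hand, one can check (e.g. with `btrfs inspect-internal logical-resolve`, after translating to a btrfs logical address) which file, if any, sits on the bad sector.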
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 14:21 ` Remi Gauvin
@ 2019-12-26 15:42 ` Leszek Dubiel
  0 siblings, 0 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 15:42 UTC (permalink / raw)
To: Remi Gauvin, linux-btrfs

> Recovery of Raid0 with a faulty drive is a job for restore from backup
> (or an attempt to scrape what you can with the array mounted ro).
> That's what Raid0 *means*.

> I'm assuming that /dev/sda is the drive you already identified as
> failing; if not, you're totally hosed.

Yes -- /dev/sda is getting errors; it's failing. It didn't pass the SMART self-tests.

Thank you for inspecting my logs. I will try to be patient until btrfs removes /dev/sda2 from the array.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 14:05 ` Leszek Dubiel
  2019-12-26 14:21 ` Remi Gauvin
@ 2019-12-26 22:40 ` Chris Murphy
  2019-12-26 22:58   ` Leszek Dubiel
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Murphy @ 2019-12-26 22:40 UTC (permalink / raw)
To: Leszek Dubiel; +Cc: Remi Gauvin, Btrfs BTRFS

On Thu, Dec 26, 2019 at 7:05 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
> Dec 22 19:20:11 wawel kernel: [ 5912.116874] ata1.00: exception Emask 0x0 SAct 0x1f80 SErr 0x0 action 0x0
> Dec 22 19:20:11 wawel kernel: [ 5912.116878] ata1.00: irq_stat 0x40000008
> Dec 22 19:20:11 wawel kernel: [ 5912.116880] ata1.00: failed command: READ FPDMA QUEUED
> Dec 22 19:20:11 wawel kernel: [ 5912.116882] ata1.00: cmd 60/00:38:00:00:98/0a:00:45:01:00/40 tag 7 ncq dma 1310720 in
> Dec 22 19:20:11 wawel kernel: [ 5912.116882]          res 43/40:18:e8:05:98/00:04:45:01:00/40 Emask 0x409 (media error) <F>
> Dec 22 19:20:11 wawel kernel: [ 5912.116885] ata1.00: status: { DRDY SENSE ERR }
> Dec 22 19:20:11 wawel kernel: [ 5912.116886] ata1.00: error: { UNC }
> Dec 22 19:20:11 wawel kernel: [ 5912.153695] ata1.00: configured for UDMA/133
> Dec 22 19:20:11 wawel kernel: [ 5912.153707] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Dec 22 19:20:11 wawel kernel: [ 5912.153709] sd 0:0:0:0: [sda] tag#7 Sense Key : Medium Error [current]
> Dec 22 19:20:11 wawel kernel: [ 5912.153710] sd 0:0:0:0: [sda] tag#7 Add. Sense: Unrecovered read error - auto reallocate failed
> Dec 22 19:20:11 wawel kernel: [ 5912.153711] sd 0:0:0:0: [sda] tag#7 CDB: Read(16) 88 00 00 00 00 01 45 98 00 00 00 00 0a 00 00 00
> Dec 22 19:20:11 wawel kernel: [ 5912.153712] print_req_error: I/O error, dev sda, sector 5462556672
> Dec 22 19:20:11 wawel kernel: [ 5912.153724] ata1: EH complete
> Dec 22 19:21:28 wawel kernel: [ 5989.527853] BTRFS info (device sda2): found 8 extents

Weird. This is not expected.
I see a discrete read error with an LBA reported for the device, and yet Btrfs shows no attempt to correct it (using raid1 metadata), nor does it report the path to the file affected by this lost sector. I'm expecting to see one of those two outcomes, given the profiles being used.

> Dec 23 00:08:20 wawel kernel: [23201.188424] INFO: task btrfs:2546
> blocked for more than 120 seconds.

Multiples of these, but no coinciding read errors or SATA link resets. This suggests bad sectors in deep recovery. And that would explain why the copies are so slow. It's not a Btrfs problem per se. It's that you've decided to have only one copy of data, so self-healing of data isn't possible. The file system itself is fine, but slow, because the affected drive is slow to recover these bad sectors.

Again, dropping the SCT ERC for the drives would result in faster error recovery when encountering bad sectors. It also increases the chance of data loss (not metadata loss, since it's raid1).

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread
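[Editor's note: the SCT ERC adjustment Chris suggests can be done with smartctl; the values are in tenths of a second, so 70 means a 7.0-second cap. Not every drive supports SCT ERC, and the setting usually resets on power cycle; the specific 70,70 values here are just a common choice, not a recommendation from the thread.]

```shell
# Show and (if supported) cap the drive's error-recovery time.
if command -v smartctl >/dev/null; then
    smartctl -l scterc /dev/sda || true        # current read/write limits
    # smartctl -l scterc,70,70 /dev/sda        # cap recovery at 7.0 s
fi

# Helper: convert the deciseconds smartctl reports into seconds.
erc_seconds() { awk -v d="$1" 'BEGIN { printf "%.1f\n", d / 10 }'; }
erc_seconds 70    # 7.0
```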
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 22:40 ` Chris Murphy
@ 2019-12-26 22:58 ` Leszek Dubiel
  2019-12-28 17:04   ` Leszek Dubiel
  0 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 22:58 UTC (permalink / raw)
To: Chris Murphy; +Cc: Remi Gauvin, Btrfs BTRFS

> Weird. This is not expected. I see a discrete read error with LBA
> reported for the device, and yet Btrfs shows no attempt to correct it
> (using raid1 metadata) nor does it report the path to the file that's
> affected by this lost sector. I'm expecting to see one of those two
> outcomes, given the profiles being used.

Yes, it is strange that I see no errors from btrfs. Maybe those ATA errors occurred during the SMART self-tests of /dev/sda... I put some logs below.

> Dec 23 00:08:20 wawel kernel: [23201.188424] INFO: task btrfs:2546
> blocked for more than 120 seconds.
>
> Multiples of these, but no coinciding read errors or SATA link resets.
> This suggests bad sectors in deep recovery. And that would explain why
> the copies are so slow. It's not a Btrfs problem per se. It's that
> you've decided to have only one copy of data, so self-healing of data
> isn't possible. The file system itself is fine, but slow because the
> affected drive is slow to recover these bad sectors.
>
> Again, dropping the SCT ERC for the drives would result in faster
> error recovery when encountering bad sectors. It also increases the
> chance of data loss (not metadata loss since it's raid1)

The deep-recovery theory could be correct: two days ago there were 16 "Current_Pending_Sector" and 2 uncorrectable sectors, and the SMART self-tests failed. Today there are zero pending sectors and the self-tests pass.
root@wawel:~# smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-6-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD6003FZBX-00K5WB0
Serial Number:    V8GJ99HR
LU WWN Device Id: 5 000cca 098c768df
Firmware Version: 01.01A01
User Capacity:    6 001 175 126 016 bytes [6,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec 26 23:56:39 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (   87) seconds.
Offline data collection capabilities: (0x5b)	SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine recommended polling time: (   2) minutes.
Extended self-test routine recommended polling time: ( 688) minutes.
SCT capabilities:              (0x0035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       100
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       3676
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       154
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       154
194 Temperature_Celsius     0x0002   183   183   000    Old_age   Always       -       30 (Min/Max 21/38)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 3576 hours (149 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 43 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 38 00 00 98 40 08   1d+04:19:05.582  READ FPDMA QUEUED
  60 00 40 00 0a 98 40 08   1d+04:19:03.035  READ FPDMA QUEUED
  60 30 30 d0 ff 97 40 08   1d+04:19:03.035  READ FPDMA QUEUED
  60 00 28 d0 f5 97 40 08   1d+04:19:03.035  READ FPDMA QUEUED
  60 d0 20 00 f0 97 40 08   1d+04:19:03.035  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       3675        -
# 2  Extended offline    Completed without error       00%       3668        -
# 3  Short offline       Completed without error       00%       3651        -
# 4  Extended offline    Completed without error       00%       3644        -
# 5  Short offline       Completed without error       00%       3627        -
# 6  Extended offline    Completed without error       00%       3622        -
# 7  Extended offline    Completed without error       00%       3604        -
# 8  Short offline       Completed without error       00%       3579        -
# 9  Extended offline    Completed: read failure       90%       3575        -
#10  Short offline       Completed without error       00%       3574        -
#11  Extended offline    Completed: read failure       90%       3574        -
#12  Extended offline    Completed: read failure       90%       3560        -
#13  Extended offline    Completed: read failure       50%       3559        -
4 of 4 failed self-tests are outdated by newer successful extended offline self-test # 2

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

^ permalink raw reply	[flat|nested] 32+ messages in thread
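[Editor's note: the pending-sector count that went back to zero above (SMART attribute 197) can be watched with a one-liner. The parsing assumes the `smartctl -A` column layout shown in the report; the sample line uses the value 16 reported earlier in the thread.]

```shell
# On a live system:  smartctl -A /dev/sda | awk '$1 == 197 { print $NF }'
# Demonstrated here on a sample attribute line:
sample='197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16'
printf '%s\n' "$sample" | awk '$1 == 197 { print $NF }'   # 16
```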
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 22:58 ` Leszek Dubiel
@ 2019-12-28 17:04 ` Leszek Dubiel
  2019-12-28 20:23   ` Zygo Blaxell
  0 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-28 17:04 UTC (permalink / raw)
To: Btrfs BTRFS; +Cc: Chris Murphy, Remi Gauvin

PROBLEM SOLVED -- btrfs was busy cleaning up after a snapshot deletion a few days ago, so it didn't have time to "dev delete"; that's why it was slow.

=======================

I restarted the server, so the "btrfs delete" job no longer existed. But the disks were still working (!!!). I wondered why. What is btrfs doing all the time?

I realized that a few days before starting "btrfs dev delete" I had removed many snapshots -- there were about 400 snapshots and I left only 20. I did that because I had read that many snapshots can slow down btrfs operations severely.

I made an experiment on another testing server:

1. started the command "watch -n1 'btrfs fi df /'"
2. started "iostat -x -m"

The disks were not working at all.

3. Then I removed many snapshots on that testing server, and I was watching:

Data, single: total=6.56TiB, used=5.13TiB
System, RAID1: total=32.00MiB, used=992.00KiB
Metadata, RAID1: total=92.00GiB, used=70.56GiB
GlobalReserve, single: total=512.00MiB, used=1.39MiB

The disks started to work hard. So btrfs was probably cleaning up after the snapshot deletion.

At the beginning the "Metadata" line showed "used=70.00GiB":

Metadata, RAID1: total=92.00GiB, used=70.00GiB

It was changing all the time... getting lower and lower. During that process, in the line

GlobalReserve, single: total=512.00MiB, used=1.39MiB

"used=" was going up until it reached about 100MiB, then it was flushed to zero, and started again: fill, flush, fill, flush... some loop/buffer/cache (?). It convinced me that after snapshot deletion btrfs really works hard on cleanup. After some time "Metadata...used=" stopped changing, the disks stopped working, and the server got lazy again.

I went back to the main server.
I started to watch "Metadata...used=". It was going down and down... I waited. When "Metadata...used=" stopped changing, "iostat -m" stopped showing any disk activity.

I started "btrfs dev delete" again and now the speed is 50 MB/s. Hurray! Problem solved.

Sorry for not being clever enough to connect all these facts at the beginning. Anyway -- maybe in the future someone will have the same problem; then the btrfs experts could ask whether btrfs was doing some other hard work at the same time (e.g. cleaning up after a massive snapshot deletion).

Maybe it would be useful to have a tool to ask btrfs "what are you doing? are you busy?". It would respond "I am cleaning up after snapshot deletion... I am balancing... I am scrubbing... I am resizing... I am deleting...".

^ permalink raw reply	[flat|nested] 32+ messages in thread
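[Editor's note: the manual watching described above can be scripted. `btrfs subvolume sync` and `btrfs subvolume list -d` are real btrfs-progs subcommands; the `meta_used` parsing helper is an illustrative assumption based on the "btrfs fi df" output format shown in the message.]

```shell
# Wait for the snapshot cleaner to finish before starting the removal:
#   btrfs subvolume list -d /     # subvolumes deleted but not yet cleaned up
#   btrfs subvolume sync /        # block until cleanup is done
#   btrfs dev delete /dev/sda2 /  # then start the device removal

# Helper: extract the Metadata "used=" value (in GiB) from btrfs fi df output.
meta_used() {
    sed -n 's/^Metadata.*used=\([0-9.]*\)GiB.*/\1/p'
}
printf 'Metadata, RAID1: total=92.00GiB, used=70.56GiB\n' | meta_used
```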
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2019-12-28 17:04 ` Leszek Dubiel @ 2019-12-28 20:23 ` Zygo Blaxell 2020-01-02 18:37 ` Leszek Dubiel 0 siblings, 1 reply; 32+ messages in thread From: Zygo Blaxell @ 2019-12-28 20:23 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Btrfs BTRFS, Chris Murphy, Remi Gauvin [-- Attachment #1: Type: text/plain, Size: 3448 bytes --] On Sat, Dec 28, 2019 at 06:04:21PM +0100, Leszek Dubiel wrote: > > PROBLEM SOLVED --- btrfs was busy cleaing after snaphot deletion few days > ago, so it dodn't have time to "dev delete", that's why it was slow That checks out. Snapshot delete and remove-device/resize/balance are not able to run at the same time. There is a mutex, so one or the other will run while the other is blocked. > ======================= > > > I restarted server, so job "btrfs delete" was not existent any more. > But disks were still working (!!!). I wondered why? What is BTRFS doing all > the time? > > I realized that afew days before starting "btrfs dev delete" I have removed > many snapshots -- there were about 400 snapshots and I left 20 only. I did > that because I have read that many snapshot could slowdown btrfs operations > severely. > > > > I made an experiment on another testing serwer: > > 1. started command "watch -n1 'btrfs fi df /" > 2. started "iostat -x -m" > > Disks were not working at all. > > > 3. Then I removed many shapshots on that testing server > > and I was watching: > > Data, single: total=6.56TiB, used=5.13TiB > System, RAID1: total=32.00MiB, used=992.00KiB > Metadata, RAID1: total=92.00GiB, used=70.56GiB > GlobalReserve, single: total=512.00MiB, used=1.39MiB > > Disks started to work hard. So btrfs was probably cleaining after snapshot > deletion. > > At the beginning in "Metadata" line there was "used=70.00GiB". > > Metadata, RAID1: total=92.00GiB, used=70.00GiB > > It was changing all the time... getting lower and lower. 
During that process > in line > > GlobalReserve, single: total=512.00MiB, used=1.39MiB > > "used=" was going up until it reached about 100MiB, then it was flushed to > zero, and started again to fill, flush, fill, flush... some > loop/buffer/cache (?). > It convinced me, that after snapshot deletion btrfs is really working hard > on cleanup. > After some time "Metadata...used=" stopped changing, disks stopped working, > server got lazy again. > > > > I got back to main server. Started to watch "Metadata...used=". It was going > down and down... > I waited. When "Metadata...used=" stopped changing, then "iostat -m" stopped > showing any disk activity. > > I started "btrfs dev delete" again and now speed is 50Mb/sek. Hurrray! > Problem solved. > > > Sorry for not beeing clever enough to connect all this facts at the > beginning. > Anyway -- maybe in the future someone has the same problem, then btrfs > experts could ask him if he let btrfs do some other hard work in the same > time (eg cleaning up after massive snapsot deletion). > > Maybe it would be usful to have a tool to ask btrfs "what are you doing? are > you busy?". > It would respond "I am cleaing up after snapshot deletion... I am > balancing... I am scrubbing... I am resizing... I am deleting ...". Usually 'top' or 'iotop' suffices for that. btrfs-cleaner = deleting snapshots, other activities will be tied to their respective userspace agents. The balance/delete/resize/drop-snapshot mutex is the only special case that I know of where one btrfs maintenance thread waits for another. It might be handy to give users a clue on snapshot delete, like add "use btrfs sub list -d to monitor deletion progress, or btrfs sub sync to wait for deletion to finish". > > > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
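The `sub list -d` / `sub sync` hint above can be turned into a pre-flight check before heavy maintenance. The `pending_cleanup` helper name is mine; the actual btrfs commands are shown only in comments since they need root and a live filesystem:

```shell
# pending_cleanup reads "btrfs subvolume list -d" output on stdin and reports
# whether the cleaner still has deleted subvolumes queued.
pending_cleanup() { if grep -q . ; then echo busy; else echo idle; fi; }

# Real usage (as root), before starting a balance or device delete:
#   btrfs subvolume list -d / | pending_cleanup   # "busy" = cleanup still pending
#   btrfs subvolume sync /                        # blocks until cleanup finishes
#   btrfs device delete /dev/sda2 /

# Behavior on sample input:
echo "ID 2015 gen 9 top level 5 path snap-old" | pending_cleanup   # prints busy
printf '' | pending_cleanup                                        # prints idle
```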
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2019-12-28 20:23 ` Zygo Blaxell @ 2020-01-02 18:37 ` Leszek Dubiel 2020-01-02 21:57 ` Chris Murphy 0 siblings, 1 reply; 32+ messages in thread From: Leszek Dubiel @ 2020-01-02 18:37 UTC (permalink / raw) To: Btrfs BTRFS > It might be handy to give users a clue on snapshot delete, like add > "use btrfs sub list -d to monitor deletion progress, or btrfs sub sync > to wait for deletion to finish". After having cleaned out the old snapshots, and after "dev delete" completed, I added a fresh empty disk btrfs dev add /dev/sda3 / and started a balance: btrfs balance start -dconvert=raid1 -mconvert=raid1 / It was slow (3-5 MB/sec), so I canceled the balance. Iostat showed no activity. I started the balance again: btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft / It is slow again. The server is not working hard, and the disks are healthy this time. I will patiently wait until it completes, but I wanted to report the user experience. I have no clue why it is not running at full disk speed. 
root@wawel:~# iostat -x 60 -m Linux 4.19.0-6-amd64 (wawel) 02.01.2020 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0,25 0,00 0,79 6,30 0,00 92,66 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 4,39 142,05 0,90 3,78 3,64 12,68 45,32 8,20 9,81 5,48 0,78 209,68 27,26 0,54 7,95 sdb 4,66 155,25 0,97 4,03 4,52 13,11 49,27 7,78 9,25 4,68 0,73 213,20 26,59 0,49 7,89 sdc 6,35 246,61 0,38 6,94 4,35 25,11 40,67 9,24 27,09 48,00 11,92 61,02 28,82 2,65 67,02 avg-cpu: %user %nice %system %iowait %steal %idle 0,09 0,00 1,68 11,70 0,00 86,53 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 1,27 205,48 0,02 7,44 0,00 2,47 0,00 1,19 18,25 1,67 0,37 16,00 37,07 0,37 7,55 sdb 1,20 223,10 0,02 7,33 0,00 5,32 0,00 2,33 16,26 1,68 0,38 16,00 33,64 0,34 7,72 sdc 19,50 428,38 0,30 14,76 0,00 7,78 0,00 1,78 59,97 6,80 4,07 16,00 35,28 2,05 91,85 avg-cpu: %user %nice %system %iowait %steal %idle 0,09 0,00 0,63 8,79 0,00 90,49 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0,75 208,78 0,01 3,74 0,00 4,47 0,00 2,09 11,96 3,78 0,75 16,00 18,35 0,23 4,75 sdb 0,45 109,97 0,01 1,92 0,00 2,77 0,00 2,45 21,22 3,44 0,36 16,00 17,92 0,27 2,94 sdc 9,22 318,28 0,14 5,66 0,00 7,22 0,00 2,22 13,79 58,62 18,71 16,00 18,20 2,92 95,57 avg-cpu: %user %nice %system %iowait %steal %idle 0,09 0,00 0,68 7,07 0,00 92,16 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0,00 305,72 0,00 5,69 0,00 9,73 0,00 3,09 0,00 7,20 2,12 0,00 19,06 0,47 14,23 sdb 0,00 189,15 0,00 3,39 0,00 5,90 0,00 3,02 0,00 7,59 1,39 0,00 18,37 0,52 9,84 sdc 0,00 288,97 0,00 5,44 0,00 8,63 0,00 2,90 0,00 62,44 17,91 0,00 19,29 2,89 83,38 avg-cpu: %user %nice %system %iowait %steal %idle 0,12 0,00 0,34 6,73 0,00 92,81 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await 
w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0,00 189,57 0,00 3,53 0,00 5,65 0,00 2,89 0,00 8,12 1,48 0,00 19,09 0,56 10,66 sdb 0,00 284,67 0,00 5,03 0,00 9,10 0,00 3,10 0,00 6,01 1,62 0,00 18,11 0,38 10,83 sdc 0,00 308,95 0,00 5,65 0,00 9,30 0,00 2,92 0,00 60,87 18,73 0,00 18,74 2,86 88,24 avg-cpu: %user %nice %system %iowait %steal %idle 0,09 0,00 1,99 6,62 0,00 91,30 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0,80 294,67 0,01 18,03 0,00 2,43 0,00 0,82 22,50 3,45 1,00 16,00 62,66 0,32 9,51 sdb 28,28 224,48 16,84 4,72 0,33 13,97 1,16 5,86 6,80 3,88 1,00 609,62 21,54 0,46 11,61 sdc 8,60 518,88 0,14 22,76 0,08 16,30 0,96 3,05 10,12 27,26 14,01 16,19 44,91 1,42 74,85 avg-cpu: %user %nice %system %iowait %steal %idle 0,22 0,00 1,91 9,46 0,00 88,42 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 1,88 130,13 0,03 2,70 0,00 9,73 0,00 6,96 15,23 0,58 0,10 16,00 21,23 0,45 5,98 sdb 1,45 147,78 0,02 3,18 0,00 12,62 0,00 7,87 11,53 0,66 0,11 16,00 22,02 0,39 5,84 sdc 25,08 275,75 0,39 5,88 0,00 24,55 0,00 8,18 47,78 6,50 2,97 16,00 21,83 3,18 95,79 avg-cpu: %user %nice %system %iowait %steal %idle 0,14 0,00 0,73 9,28 0,00 89,85 Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 2,38 186,52 0,20 3,33 1,33 4,20 35,87 2,20 9,88 3,48 0,63 86,04 18,27 0,30 5,60 sdb 1,00 108,35 0,02 1,99 0,00 2,73 0,00 2,46 8,92 3,78 0,41 18,00 18,85 0,27 2,98 sdc 13,47 294,68 0,21 5,32 0,00 6,92 0,00 2,29 13,33 61,56 18,28 16,00 18,48 3,13 96,45 top: top - 19:29:52 up 2:00, 2 users, load average: 1,21, 1,53, 1,92 Tasks: 165 total, 1 running, 164 sleeping, 0 stopped, 0 zombie %Cpu(s): 0,0 us, 0,0 sy, 0,0 ni, 91,9 id, 8,1 wa, 0,0 hi, 0,0 si, 0,0 st MiB Mem : 32130,4 total, 15827,0 free, 409,4 used, 15893,9 buff/cache MiB Swap: 0,0 total, 0,0 free, 0,0 used. 
31244,2 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 289 root 0 -20 0 0 0 I 6,7 0,0 0:00.73 kworker/4:1H-kblockd 1 root 20 0 22036 10140 7872 S 0,0 0,0 0:11.52 systemd 2 root 20 0 0 0 0 S 0,0 0,0 0:00.01 kthreadd 3 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 rcu_gp 4 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 rcu_par_gp 6 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker/0:0H-kblockd 8 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 mm_percpu_wq 9 root 20 0 0 0 0 S 0,0 0,0 0:00.04 ksoftirqd/0 10 root 20 0 0 0 0 I 0,0 0,0 0:02.61 rcu_sched 11 root 20 0 0 0 0 I 0,0 0,0 0:00.00 rcu_bh 12 root rt 0 0 0 0 S 0,0 0,0 0:00.02 migration/0 14 root 20 0 0 0 0 S 0,0 0,0 0:00.00 cpuhp/0 15 root 20 0 0 0 0 S 0,0 0,0 0:00.00 cpuhp/1 16 root rt 0 0 0 0 S 0,0 0,0 0:00.02 migration/1 17 root 20 0 0 0 0 S 0,0 0,0 0:00.07 ksoftirqd/1 19 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 kworker/1:0H-kblockd iotop: Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s Current DISK READ: 0.00 B/s | Current DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] 3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp] 4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_par_gp] 6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H-kblockd] 8 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [mm_percpu_wq] 9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0] 10 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_sched] 11 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_bh] 12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0] 14 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/0] 15 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/1] 16 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1] 17 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1] 19 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/1:0H-kblockd] 20 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/2] 21 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 
0.00 % [migration/2] 22 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2] 24 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/2:0H-kblockd] 25 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/3] 26 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3] 27 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3] 29 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/3:0H-kblockd] 30 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/4] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 18:37 ` Leszek Dubiel @ 2020-01-02 21:57 ` Chris Murphy 2020-01-02 22:39 ` Leszek Dubiel 2020-01-06 11:14 ` Leszek Dubiel 0 siblings, 2 replies; 32+ messages in thread From: Chris Murphy @ 2020-01-02 21:57 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Btrfs BTRFS, Zygo Blaxell On Thu, Jan 2, 2020 at 11:47 AM Leszek Dubiel <leszek@dubiel.pl> wrote: > > > > It might be handy to give users a clue on snapshot delete, like add > > "use btrfs sub list -d to monitor deletion progress, or btrfs sub sync > > to wait for deletion to finish". > > After having cleaned old shapshots, and after "dev delete" has > completed I have added new fresh empty disk > > > btrfs dev add /dev/sda3 / > > > and started to balance: > > > btrfs balance start -dconvert=raid1 -mconvert=raid1 / > > > It was slow (3-5 MB/sec), so canceled balance. I'm not sure why it's slow and I don't think it should be this slow, but I would say that in retrospect it would have been better to NOT delete the device with a few bad sectors, and instead use `btrfs replace` to do a 1:1 replacement of that particular drive. When you delete a device, it has to rewrite everything onto the two remaining device. And now that you added another device, it has to rewrite everything onto three devices. Two full balances. Whereas 'device replace' is optimized to copy block groups as-is from source to destination drives. Only on a read-error will Btrfs use mirror copies from other devices. > Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s > %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util > sda 4,39 142,05 0,90 3,78 3,64 12,68 > 45,32 8,20 9,81 5,48 0,78 209,68 27,26 0,54 7,95 > sdb 4,66 155,25 0,97 4,03 4,52 13,11 > 49,27 7,78 9,25 4,68 0,73 213,20 26,59 0,49 7,89 > sdc 6,35 246,61 0,38 6,94 4,35 25,11 > 40,67 9,24 27,09 48,00 11,92 61,02 28,82 2,65 67,02 Almost no reads, all writes, but slow. 
And a rather high number of write requests per second, almost double for sdc. And sdc is near its max utilization, so it might be near its iops limit? ~210 rareq-sz = 210KiB is the average size of the read request for sda and sdb Default mkfs and default mount options? Or others, and if so, which? Many small files on this file system? Or possibly large files with a lot of fragmentation? > Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s > %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util > sda 2,38 186,52 0,20 3,33 1,33 4,20 35,87 > 2,20 9,88 3,48 0,63 86,04 18,27 0,30 5,60 > sdb 1,00 108,35 0,02 1,99 0,00 2,73 0,00 > 2,46 8,92 3,78 0,41 18,00 18,85 0,27 2,98 > sdc 13,47 294,68 0,21 5,32 0,00 6,92 0,00 > 2,29 13,33 61,56 18,28 16,00 18,48 3,13 96,45 And again, sdc is at max utilization, with ~300 write requests per second, which is at the high end for a fast drive for IOPS, if I'm not mistaken. That's a lot of writes per second. The average write request size is 18KiB. So what's going on with the workload? Is this only a balance operation or are there concurrent writes happening from some process? 
> iotop: > > Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s > Current DISK READ: 0.00 B/s | Current DISK WRITE: 0.00 B/s > TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init > 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] > 3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp] > 4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_par_gp] > 6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % > [kworker/0:0H-kblockd] > 8 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [mm_percpu_wq] > 9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0] > 10 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_sched] > 11 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_bh] > 12 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0] > 14 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/0] > 15 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/1] > 16 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1] > 17 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1] > 19 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % > [kworker/1:0H-kblockd] > 20 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/2] > 21 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/2] > 22 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/2] > 24 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % > [kworker/2:0H-kblockd] > 25 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/3] > 26 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/3] > 27 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/3] > 29 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % > [kworker/3:0H-kblockd] > 30 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/4] iotop -d 5 -o might be more revealing; all zeros doesn't really make sense. I see balance and scrub reported in iotop. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
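For reference, the 1:1 replacement suggested earlier in this message looks roughly like the following (a sketch; device names are examples, and the `est_hours` helper name is mine, added only to put the expected duration in perspective):

```shell
# Replace workflow (run as root; /dev/sda2 = failing disk, /dev/sdd2 = new one,
# both example names):
#   btrfs replace start -r /dev/sda2 /dev/sdd2 /   # -r: read from other mirrors
#                                                  # instead of srcdev where a
#                                                  # good copy exists
#   btrfs replace status /                         # shows percent complete
# Unlike delete + add, this copies block groups 1:1 instead of rebalancing twice.

# est_hours (helper name is mine): hours needed to copy N TiB at M MiB/s.
est_hours() { echo "$1 $2" | awk '{ printf "%.1f\n", $1 * 1024 * 1024 / $2 / 3600 }'; }
est_hours 3.3 100   # ~3.3 TiB of allocated data at a sustained 100 MiB/s
```

Note that with single-profile data (as on this filesystem at the time), -r only helps for the RAID1 metadata; single data chunks still have to be read from the source disk.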
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 21:57 ` Chris Murphy @ 2020-01-02 22:39 ` Leszek Dubiel 2020-01-02 23:22 ` Chris Murphy 2020-01-06 11:14 ` Leszek Dubiel 1 sibling, 1 reply; 32+ messages in thread From: Leszek Dubiel @ 2020-01-02 22:39 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS, Zygo Blaxell > I'm not sure why it's slow and I don't think it should be this slow, > but I would say that in retrospect it would have been better to NOT > delete the device with a few bad sectors, and instead use `btrfs > replace` to do a 1:1 replacement of that particular drive. Thank you. I will remember that. > When you delete a device, it has to rewrite everything onto the two > remaining device. And now that you added another device, it has to > rewrite everything onto three devices. Two full balances. Whereas > 'device replace' is optimized to copy block groups as-is from source > to destination drives. Only on a read-error will Btrfs use mirror > copies from other devices. Thank you. > Almost no reads, all writes, but slow. And rather high write request > per second, almost double for sdc. And sdc is near it's max > utilization so it might be ear to its iops limit? > > ~210 rareq-sz = 210KiB is the average size of the read request for sda and sdb > > Default mkfs and default mount options? Or other and if so what other? > > Many small files on this file system? Or possibly large files with a > lot of fragmentation? Default mkfs and default mount options. This system could have a few million (!) of small files. On reiserfs it takes about 40 minutes, to do "find /". Rsync runs for 6 hours to backup data. 
>> Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s >> %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util >> sda 2,38 186,52 0,20 3,33 1,33 4,20 35,87 >> 2,20 9,88 3,48 0,63 86,04 18,27 0,30 5,60 >> sdb 1,00 108,35 0,02 1,99 0,00 2,73 0,00 >> 2,46 8,92 3,78 0,41 18,00 18,85 0,27 2,98 >> sdc 13,47 294,68 0,21 5,32 0,00 6,92 0,00 >> 2,29 13,33 61,56 18,28 16,00 18,48 3,13 96,45 > > And again, sdc is at max utilization, with ~300 write requests per > second which is at the high end for a fast drive for IOPS, if I'm not > mistaken. That's a lot of writes per second. The average write request > size is 18KiB. > > So what's going on with the workload? Is this only a balance operation > or are there concurrent writes happening from some process? root@wawel:~# btrfs bala stat / Balance on '/' is running 62 out of about 1231 chunks balanced (4224 considered), 95% left root@wawel:~# btrfs scrub stat / scrub status for 44803366-3981-4ebb-853b-6c991380c8a6 scrub started at Sat Dec 28 19:32:06 2019 and was aborted after 03:07:57 total bytes scrubbed: 3.33TiB with 0 errors root@wawel:~# btrfs sub list -d / root@wawel:~# btrfs sub list / | wc -l 27 > iotop -d 5 -o might be more revealing; all zeros doesn't really make > sense. I see balance and scrub reported in iotop. 
# iotop -d5 Total DISK READ: 0.00 B/s | Total DISK WRITE: 32.98 M/s Current DISK READ: 0.00 B/s | Current DISK WRITE: 26.19 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 4596 be/4 root 0.00 B/s 31.78 M/s 0.00 % 86.16 % btrfs balance start -dconvert raid1 soft -mconvert raid1 soft / 30893 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.16 % [kworker/0:2-events] 29683 be/4 root 0.00 B/s 659.23 K/s 0.00 % 0.00 % [kworker/u16:4-btrfs-extent-refs] 29684 be/4 root 0.00 B/s 560.50 K/s 0.00 % 0.00 % [kworker/u16:8-btrfs-extent-refs] 30115 be/4 root 0.00 B/s 6.37 K/s 0.00 % 0.00 % [kworker/u16:2-btrfs-extent-refs] 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] 3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp] 4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_par_gp] 6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H-kblockd] # iotop -d30 Total DISK READ: 34.12 M/s | Total DISK WRITE: 40.36 M/s Current DISK READ: 34.12 M/s | Current DISK WRITE: 79.22 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 4596 be/4 root 34.12 M/s 37.79 M/s 0.00 % 91.77 % btrfs balance start -dconvert raid1 soft -mconvert raid1 soft / 29687 be/4 root 0.00 B/s 465.87 K/s 0.00 % 0.32 % [kworker/u16:10-edac-poller] 30893 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.13 % [kworker/0:2-events_freezable_power_] 30115 be/4 root 0.00 B/s 1373.65 K/s 0.00 % 0.08 % [kworker/u16:2-btrfs-endio-write] 26871 be/4 root 0.00 B/s 260.89 K/s 0.00 % 0.03 % [kworker/u16:7-btrfs-extent-refs] 23756 be/4 root 0.00 B/s 80.40 K/s 0.00 % 0.00 % [kworker/u16:1-btrfs-extent-refs] 29681 be/4 root 0.00 B/s 169.84 K/s 0.00 % 0.00 % [kworker/u16:0-btrfs-extent-refs] 29683 be/4 root 0.00 B/s 280.59 K/s 0.00 % 0.00 % [kworker/u16:4-btrfs-endio-write] 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] 3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp] PS. I have set Thunderbird not to wrap long lines. 
Hope my emails are more readable now. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 22:39 ` Leszek Dubiel @ 2020-01-02 23:22 ` Chris Murphy 2020-01-03 9:08 ` Leszek Dubiel ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Chris Murphy @ 2020-01-02 23:22 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Chris Murphy, Btrfs BTRFS, Zygo Blaxell On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote: > > Almost no reads, all writes, but slow. And rather high write request > > per second, almost double for sdc. And sdc is near it's max > > utilization so it might be ear to its iops limit? > > > > ~210 rareq-sz = 210KiB is the average size of the read request for > sda and sdb > > > > Default mkfs and default mount options? Or other and if so what other? > > > > Many small files on this file system? Or possibly large files with a > > lot of fragmentation? > > Default mkfs and default mount options. > > This system could have a few million (!) of small files. > On reiserfs it takes about 40 minutes, to do "find /". > Rsync runs for 6 hours to backup data. There is a mount option: max_inline=<bytes> which the man page says (default: min(2048, page size) ) I've never used it, so in theory the max_inline byte size is 2KiB. However, I have seen substantially larger inline extents than 2KiB when using a nodesize larger than 16KiB at mkfs time. I've wondered whether it makes any difference for the "many small files" case to do more aggressive inlining of extents. I've seen with 16KiB leaf size, often small files that could be inlined, are instead put into a data block group, taking up a minimum 4KiB block size (on x64_64 anyway). I'm not sure why, but I suspect there just isn't enough room in that leaf to always use inline extents, and yet there is enough room to just reference it as a data block group extent. When using a larger node size, a larger percentage of small files ended up using inline extents. 
I'd expect this to be quite a bit more efficient, because it eliminates a time-expensive (on HDD, anyway) seek. Another optimization: compress=zstd:1, which is the lowest compression setting. That'll increase the chance a file can use inline extents, in particular with a larger nodesize. And still another optimization, at the expense of much more complexity, is LVM cache with an SSD. You'd have to pick a suitable policy for the workload, but given that the iostat utilizations you see are often near max in normal operation, I expect you'll see improved performance. SSDs can handle way higher iops than HDDs. But a lot of this optimization stuff is use-case specific. I'm not even sure what your mean small file size is. > # iotop -d30 > > Total DISK READ: 34.12 M/s | Total DISK WRITE: 40.36 M/s > Current DISK READ: 34.12 M/s | Current DISK WRITE: 79.22 M/s > TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > 4596 be/4 root 34.12 M/s 37.79 M/s 0.00 % 91.77 % btrfs Not so bad for many small-file reads and writes with HDD. I've seen this myself with a single spindle when doing small-file reads and writes. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
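Pulling those tuning ideas together, a possible starting point for the next mkfs might look like this. This is a sketch, not a tested recommendation: device names are examples, and whether -n 32k and zstd:1 actually pay off depends on the workload:

```shell
# DESTRUCTIVE: only for a new filesystem. A larger nodesize gives leaves more
# room for inline small-file data; raid1 for both data and metadata as
# discussed in the thread.
mkfs.btrfs -n 32k -d raid1 -m raid1 /dev/sdx2 /dev/sdy2

# Mount with the cheapest compression level; max_inline can be raised toward
# the nodesize to inline more small files (min(2048, page size) is the
# documented default).
mount -o compress=zstd:1,max_inline=2048 /dev/sdx2 /mnt
```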
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 23:22 ` Chris Murphy @ 2020-01-03 9:08 ` Leszek Dubiel 2020-01-03 19:15 ` Chris Murphy 2020-01-03 14:39 ` Leszek Dubiel 2020-01-04 5:38 ` Zygo Blaxell 2 siblings, 1 reply; 32+ messages in thread From: Leszek Dubiel @ 2020-01-03 9:08 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS, Zygo Blaxell W dniu 03.01.2020 o 00:22, Chris Murphy pisze: > On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote: > > > This system could have a few million (!) of small files. > > On reiserfs it takes about 40 minutes, to do "find /". > > Rsync runs for 6 hours to backup data. > > > There is a mount option: max_inline=<bytes> which the man page says > (default: min(2048, page size) ) > > I've never used it, so in theory the max_inline byte size is 2KiB. > However, I have seen substantially larger inline extents than 2KiB > when using a nodesize larger than 16KiB at mkfs time. > > I've wondered whether it makes any difference for the "many small > files" case to do more aggressive inlining of extents. > > I've seen with 16KiB leaf size, often small files that could be > inlined, are instead put into a data block group, taking up a minimum > 4KiB block size (on x64_64 anyway). I'm not sure why, but I suspect > there just isn't enough room in that leaf to always use inline > extents, and yet there is enough room to just reference it as a data > block group extent. When using a larger node size, a larger percentage > of small files ended up using inline extents. I'd expect this to be > quite a bit more efficient, because it eliminates a time expensive (on > HDD anyway) seek. I will try that option when making new disks with BTRFS. Then I'll report about efficiency. > Another optimization, using compress=zstd:1, which is the lowest > compression setting. That'll increase the chance a file can use inline > extents, in particular with a larger nodesize. 
> And still another optimization, at the expense of much more > complexity, is LVM cache with an SSD. You'd have to pick a suitable > policy for the workload, but I expect that if the iostat utilizations > you see of often near max utilization in normal operation, you'll see > improved performance. SSD's can handle way higher iops than HDD. But a > lot of this optimization stuff is use case specific. I'm not even sure > what your mean small file size is. There are 11 million files: root@gamma:/mnt/sdb1# find orion2 > listor2 root@gamma:/mnt/sdb1# ls -lt listor2 -rw-r--r-- 1 root root 988973729 sty 3 03:09 listor2 root@gamma:/mnt/sdb1# wc -l listor2 11329331 listor2 And df on reiserfs shows: root@orion:~# df -h -BM System plików 1M-bl used avail %uż. zamont. na /dev/md0 71522M 10353M 61169M 15% / /dev/md1 905967M 731199M 174768M 81% /root 10353 + 731199 = 741552 M, that is, the average file size is 741552 * 1000000 / 11000000 = 67413 bytes per file. This estimate is rough, because df counts blocks... I will count more precisely with du --apparent-size. >> # iotop -d30 >> >> Total DISK READ: 34.12 M/s | Total DISK WRITE: 40.36 M/s >> Current DISK READ: 34.12 M/s | Current DISK WRITE: 79.22 M/s >> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND >> 4596 be/4 root 34.12 M/s 37.79 M/s 0.00 % 91.77 % btrfs > > Not so bad for many small file reads and writes with HDD. I've see > this myself with single spindle when doing small file reads and > writes. So small files are what slows things down in my case. Ok! Thank you for the expertise. PS. This morning: root@wawel:~# btrfs bala stat / Balance on '/' is running 1227 out of about 1231 chunks balanced (5390 considered), 0% left So during the night it balanced 600Gb + 600Gb = 1.2Tb of data in single profile to raid1 in about 12 hours. 
That is: (600 + 600) * 1000 Mb/Gb / (12 hours * 3600 sec/hour) = (600 + 600) * 1000 / (12 × 3600) = 27 Mb/sec root@wawel:~# btrfs dev usag / /dev/sda2, ID: 2 Device size: 5.45TiB Device slack: 0.00B Data,RAID1: 2.62TiB Metadata,RAID1: 22.00GiB Unallocated: 2.81TiB /dev/sdb2, ID: 3 Device size: 5.45TiB Device slack: 0.00B Data,RAID1: 2.62TiB Metadata,RAID1: 21.00GiB System,RAID1: 32.00MiB Unallocated: 2.81TiB /dev/sdc3, ID: 4 Device size: 10.90TiB Device slack: 3.50KiB Data,RAID1: 5.24TiB Metadata,RAID1: 33.00GiB System,RAID1: 32.00MiB Unallocated: 5.62TiB root@wawel:~# iostat 10 -x Linux 4.19.0-6-amd64 (wawel) 03.01.2020 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0,00 0,00 0,00 0,00 0,00 100,00 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 sdc 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 avg-cpu: %user %nice %system %iowait %steal %idle 0,04 0,00 0,08 0,00 0,00 99,89 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 sdc 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 ^ permalink raw reply [flat|nested] 32+ messages in thread
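The back-of-the-envelope throughput figure above, as a reusable one-liner (the `mb_per_sec` helper name is mine; it uses 1 GB = 1000 MB, matching the original estimate):

```shell
# mb_per_sec GB HOURS -> average throughput in MB/s
mb_per_sec() { echo "$1 $2" | awk '{ printf "%.1f\n", $1 * 1000 / ($2 * 3600) }'; }
mb_per_sec 1200 12   # 1.2 TB moved in 12 hours -> prints 27.8
```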
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-03 9:08 ` Leszek Dubiel @ 2020-01-03 19:15 ` Chris Murphy 0 siblings, 0 replies; 32+ messages in thread From: Chris Murphy @ 2020-01-03 19:15 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Chris Murphy, Btrfs BTRFS, Zygo Blaxell On Fri, Jan 3, 2020 at 2:08 AM Leszek Dubiel <leszek@dubiel.pl> wrote: > > >> # iotop -d30 > >> > >> Total DISK READ: 34.12 M/s | Total DISK WRITE: 40.36 M/s > >> Current DISK READ: 34.12 M/s | Current DISK WRITE: 79.22 M/s > >> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > >> 4596 be/4 root 34.12 M/s 37.79 M/s 0.00 % 91.77 % btrfs > > > > Not so bad for many small file reads and writes with HDD. I've see > > this myself with single spindle when doing small file reads and > > writes. It's not small files directly. It's the number of write requests per second, resulting in high latency seeks. And the reason for the seeking needs a second opinion, to be certain it's related to small files. I'm not really sure why there are hundreds of write requests per second. Seems to me with thousands of small files, Btrfs can aggregate them into a single sequential write (mostly sequential anyway) and do the same for metadata writes; yes there is some back and forth seeking since metadata and data block groups are in different physical locations. But hundreds of times per second? Hmmm. I'm suspicious why. It must be trying to read and write hundreds of small files *in different locations* causing the seeks, and the ensuing latency. The typical work around for this these days is add more disks or add SSD. 
If you add a fourth disk, you reduce your one bottleneck: > root@wawel:~# btrfs dev usag / > /dev/sda2, ID: 2 > Device size: 5.45TiB > Device slack: 0.00B > Data,RAID1: 2.62TiB > Metadata,RAID1: 22.00GiB > Unallocated: 2.81TiB > > /dev/sdb2, ID: 3 > Device size: 5.45TiB > Device slack: 0.00B > Data,RAID1: 2.62TiB > Metadata,RAID1: 21.00GiB > System,RAID1: 32.00MiB > Unallocated: 2.81TiB > > /dev/sdc3, ID: 4 > Device size: 10.90TiB > Device slack: 3.50KiB > Data,RAID1: 5.24TiB > Metadata,RAID1: 33.00GiB > System,RAID1: 32.00MiB > Unallocated: 5.62TiB OK, this is important. Two equal-size drives, and the third is much larger. This means writes are going to be IO bound to that single large device because it's always going to be written to. The reads get spread out somewhat. Again, maybe the everyday workload is the one to focus on, because it's not such a big deal for a device replace to take overnight. Even though it would be good for everyone's use case if it turns out there's some optimization possible to avoid hundreds of write requests per second, just because of small files. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
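A toy model of why the big disk becomes the bottleneck. The assumption (mine, not stated in the thread) is that the raid1 chunk allocator places each new chunk on the two devices with the most unallocated space, so the 10.9 TiB disk takes part in essentially every chunk until free space evens out:

```shell
# Simulate allocating 5000 1GiB raid1 chunks across 5.45 + 5.45 + 10.9 TiB
# disks, always picking the two devices with the most unallocated space.
result=$(awk 'BEGIN {
    free["sda"] = 5581; free["sdb"] = 5581; free["sdc"] = 11162   # GiB unallocated
    for (i = 0; i < 5000; i++) {
        a = ""; b = ""
        for (d in free) {                 # find the two most-free devices
            if (a == "" || free[d] > free[a])      { b = a; a = d }
            else if (b == "" || free[d] > free[b]) { b = d }
        }
        free[a]--; free[b]--; chunks[a]++; chunks[b]++
    }
    for (d in chunks) printf "%s chunks=%d\n", d, chunks[d]
}' | sort)
echo "$result"
# Under this model, sdc receives a copy of all 5000 chunks, while sda and sdb
# split the other copy 2500/2500 - i.e. the large disk is written to every time.
```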
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 23:22 ` Chris Murphy 2020-01-03 9:08 ` Leszek Dubiel @ 2020-01-03 14:39 ` Leszek Dubiel 2020-01-03 19:02 ` Chris Murphy 2020-01-04 5:38 ` Zygo Blaxell 2 siblings, 1 reply; 32+ messages in thread From: Leszek Dubiel @ 2020-01-03 14:39 UTC (permalink / raw) To: Btrfs BTRFS I have made some measurements in case someone has the same profile on his server. ============= === BTRFS === This is the same server "wawel" that was discussed above on slow "btrfs delete" and "btrfs balance". I must admit that BTRFS is much faster these days than ReiserFs. See ReiserFs below. ** 51 minutes to list 11 million files: root@wawel:/mnt/root/orion# time find > list_of_files real 51m14,611s user 0m30,487s sys 2m23,520s root@wawel:/mnt/root/orion# wc -l list_of_files 11439162 list_of_files ** 86 min to check size of files root@wawel:/mnt/root/orion# time du -abx > disk_usage real 86m42,328s user 0m47,225s sys 7m13,320s root@wawel:/mnt/root/orion# wc -l disk_usage 11438206 disk_usage ** 45 minutes to read half million files, 50GB of data root@wawel:/mnt/root/orion# egrep 'root/Omega/Domains/Zakupy' disk_usage | wc -l 476499 root@wawel:/mnt/root/orion# egrep 'root/Omega/Domains/Zakupy$' disk_usage 49154713130 ./root/Omega/Domains/Zakupy root@wawel:/mnt/root/orion# time find root/Omega/Domains/Zakupy/ -type f -print0 | xargs -0r wc > word_count real 45m11,398s user 32m22,395s sys 0m22,657s ** 21 min to copy 50Gb root@wawel:/mnt/root/orion# time cp -a root/Omega/Domains/Zakupy/ tempcopy real 21m36,030s user 0m9,553s sys 2m47,038s ** remove is very fast root@wawel:/mnt/root/orion# time rm -r tempcopy real 0m39,952s user 0m0,359s sys 0m9,985s ** number of files by given size root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e '$Data::Dumper::Sortkeys = 1; while (<>) { chomp; my ($byt, $nam) = split /\t/, $_, -1; if (index("$las/", $nam) == 0) { $dir++; } else { $filtot++; for $p (1 .. 
99) { if ($byt < 10 ** $p) { $fil{"num of files size <10^$p"}++; last; } } }; $las = $nam; }; print "\ndirectories: $dir\ntotal num of files: $filtot\n", "\nnumber of files grouped by size: \n", Dumper(\%fil) ' directories: 1314246 total num of files: 10123960 number of files grouped by size: $VAR1 = { 'num of files size <10^1' => 3325886, 'num of files size <10^2' => 3709276, 'num of files size <10^3' => 789852, 'num of files size <10^4' => 1085927, 'num of files size <10^5' => 650571, 'num of files size <10^6' => 438717, 'num of files size <10^7' => 116757, 'num of files size <10^8' => 6638, 'num of files size <10^9' => 323, 'num of files size <10^10' => 13, }; ** disk speed root@wawel:/mnt/root/orion# hdparm -t /dev/sda /dev/sda: Timing buffered disk reads: 680 MB in 3.01 seconds = 226.14 MB/sec ================ === REISERFS === ReiserFs is much slower: root@gamma:/mnt/sdb1/orion2# time find > list_of_files real 111m49,529s user 0m28,665s sys 1m3,576s root@gamma:/mnt/sdb1/orion2# wc -l list_of_files 11329332 list_of_files root@gamma:/mnt/sdb1/orion2# time du -abx > disk_usage real 198m43,813s user 0m53,501s sys 3m38,447s root@gamma:/mnt/sdb1/orion2# hdparm -t /dev/sdb /dev/sdb: Timing buffered disk reads: 534 MB in 3.00 seconds = 177.77 MB/sec ^ permalink raw reply [flat|nested] 32+ messages in thread
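For readers who don't want to untangle the Perl one-liner above, here is a roughly equivalent Python sketch. It is a re-implementation for illustration only, and covers just the size histogram, not the directory-counting trick that relies on du's child-before-parent output order:

```python
import math
from collections import Counter

def size_histogram(sizes):
    """Group file sizes by decimal order of magnitude, like the
    Perl one-liner's "num of files size <10^p" buckets."""
    hist = Counter()
    for byt in sizes:
        # Smallest p such that byt < 10**p (p=1 for sizes 0..9).
        p = 1 if byt < 10 else math.floor(math.log10(byt)) + 1
        hist[f"num of files size <10^{p}"] += 1
    return dict(hist)

print(size_histogram([5, 42, 950, 1200, 48_000_000]))
```

Feeding it one column of a `du -abx` dump reproduces the same kind of breakdown as the Dumper output above.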
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-03 14:39 ` Leszek Dubiel @ 2020-01-03 19:02 ` Chris Murphy 2020-01-03 20:59 ` Leszek Dubiel 0 siblings, 1 reply; 32+ messages in thread From: Chris Murphy @ 2020-01-03 19:02 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Btrfs BTRFS On Fri, Jan 3, 2020 at 7:39 AM Leszek Dubiel <leszek@dubiel.pl> wrote: > > ** number of files by given size > > root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e > '$Data::Dumper::Sortkeys = 1; while (<>) { chomp; my ($byt, $nam) = > split /\t/, $_, -1; if (index("$las/", $nam) == 0) { $dir++; } else { > $filtot++; for $p (1 .. 99) { if ($byt < 10 ** $p) { $fil{"num of files > size <10^$p"}++; last; } } }; $las = $nam; }; print "\ndirectories: > $dir\ntotal num of files: $filtot\n", "\nnumber of files grouped by > size: \n", Dumper(\%fil) ' > > directories: 1314246 > total num of files: 10123960 > > number of files grouped by size: > $VAR1 = { > 'num of files size <10^1' => 3325886, > 'num of files size <10^2' => 3709276, > 'num of files size <10^3' => 789852, > 'num of files size <10^4' => 1085927, > 'num of files size <10^5' => 650571, > 'num of files size <10^6' => 438717, > 'num of files size <10^7' => 116757, > 'num of files size <10^8' => 6638, > 'num of files size <10^9' => 323 > 'num of files size <10^10' => 13, Is that really ~7.8 million files at or less than 1KiB?? (totalling the first three) Compression may not do much with such small files, and also I'm not sure which algorithm would do the best job. They all probably want a lot more than 1KiB to become efficient. But nodesize 64KiB might be a big deal...worth testing. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-03 19:02 ` Chris Murphy @ 2020-01-03 20:59 ` Leszek Dubiel 0 siblings, 0 replies; 32+ messages in thread From: Leszek Dubiel @ 2020-01-03 20:59 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS On 03.01.2020 at 20:02, Chris Murphy wrote: > On Fri, Jan 3, 2020 at 7:39 AM Leszek Dubiel <leszek@dubiel.pl> wrote: >> >> ** number of files by given size >> >> root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e >> '$Data::Dumper::Sortkeys = 1; while (<>) { chomp; my ($byt, $nam) = >> split /\t/, $_, -1; if (index("$las/", $nam) == 0) { $dir++; } else { >> $filtot++; for $p (1 .. 99) { if ($byt < 10 ** $p) { $fil{"num of files >> size <10^$p"}++; last; } } }; $las = $nam; }; print "\ndirectories: >> $dir\ntotal num of files: $filtot\n", "\nnumber of files grouped by >> size: \n", Dumper(\%fil) ' >> >> directories: 1314246 >> total num of files: 10123960 >> >> number of files grouped by size: >> $VAR1 = { >> 'num of files size <10^1' => 3325886, >> 'num of files size <10^2' => 3709276, >> 'num of files size <10^3' => 789852, >> 'num of files size <10^4' => 1085927, >> 'num of files size <10^5' => 650571, >> 'num of files size <10^6' => 438717, >> 'num of files size <10^7' => 116757, >> 'num of files size <10^8' => 6638, >> 'num of files size <10^9' => 323, >> 'num of files size <10^10' => 13, > > Is that really ~7.8 million files at or less than 1KiB?? (totalling > the first three) Yes. This is the workflow management system of my company (a bathroom mirror manufacturer). The system was made in 2004. Back then ReiserFs was great because it had "tail packing" and put small pieces of data together with metadata, so disks could read many pieces of data during one read request. Other systems were nowhere close to ReiserFs when it came to speed with lots of small files. That's why I've been testing btrfs for a few years now.
> Compression may not do much with such small files, and also I'm not > sure which algorithm would do the best job. They all probably want a > lot more than 1KiB to become efficient. > > But nodesize 64KiB might be a big deal...worth testing. Yes -- I will have to read about nodesize. Thank you for hint. Current data: root@wawel:~# btrfs inspect dump-super /dev/sdb2 superblock: bytenr=65536, device=/dev/sdb2 --------------------------------------------------------- csum_type 0 (crc32c) csum_size 4 csum 0x0bd7280d [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 44803366-3981-4ebb-853b-6c991380c8a6 metadata_uuid 44803366-3981-4ebb-853b-6c991380c8a6 label generation 553943 root 17155128295424 sys_array_size 129 chunk_root_generation 553648 root_level 1 chunk_root 10136287444992 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 23967879376896 bytes_used 5844982415360 sectorsize 4096 nodesize 16384 ---------------<<< leafsize (deprecated) 16384 stripesize 4096 root_dir 6 num_devices 3 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x163 ( MIXED_BACKREF | DEFAULT_SUBVOL | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) cache_generation 553943 uuid_tree_generation 594 dev_item.uuid 5f74e436-f8f9-43ba-95fc-44cdb2bc1838 dev_item.fsid 44803366-3981-4ebb-853b-6c991380c8a6 [match] dev_item.type 0 dev_item.total_bytes 5992192409600 dev_item.bytes_used 2946381119488 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 3 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 ^ permalink raw reply [flat|nested] 32+ messages in thread
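The nodesize suggestion above can be made concrete with a back-of-envelope count of how many small inline files fit in one metadata leaf. The on-disk struct sizes below are approximate assumptions (rough btrfs header and per-item overheads, not figures from this thread), so treat the exact counts as illustrative:

```python
# Roughly how many 1 KiB inline file extents fit into one btrfs leaf?
# Struct sizes are approximate assumptions: leaf header ~101 bytes,
# per-item header ~25 bytes, inline extent item header ~21 bytes.
LEAF_HEADER = 101
ITEM_HEADER = 25
INLINE_EXTENT_HEADER = 21

def inline_items_per_leaf(nodesize: int, data_bytes: int = 1024) -> int:
    usable = nodesize - LEAF_HEADER
    per_item = ITEM_HEADER + INLINE_EXTENT_HEADER + data_bytes
    return usable // per_item

print(inline_items_per_leaf(16 * 1024))  # 15 with the default 16 KiB nodesize
print(inline_items_per_leaf(64 * 1024))  # 61 with nodesize 64 KiB
```

With a 64 KiB nodesize, each leaf packs roughly four times as many 1 KiB inline files, so a sequential read of one leaf delivers four times as much file data per seek, which is the effect being discussed.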
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 23:22 ` Chris Murphy 2020-01-03 9:08 ` Leszek Dubiel 2020-01-03 14:39 ` Leszek Dubiel @ 2020-01-04 5:38 ` Zygo Blaxell 2020-01-07 18:44 ` write amplification, was: " Chris Murphy 2 siblings, 1 reply; 32+ messages in thread From: Zygo Blaxell @ 2020-01-04 5:38 UTC (permalink / raw) To: Chris Murphy; +Cc: Leszek Dubiel, Btrfs BTRFS [-- Attachment #1: Type: text/plain, Size: 4637 bytes --] On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote: > On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote: > > > > Almost no reads, all writes, but slow. And rather high write request > > > per second, almost double for sdc. And sdc is near it's max > > > utilization so it might be ear to its iops limit? > > > > > > ~210 rareq-sz = 210KiB is the average size of the read request for > > sda and sdb > > > > > > Default mkfs and default mount options? Or other and if so what other? > > > > > > Many small files on this file system? Or possibly large files with a > > > lot of fragmentation? > > > > Default mkfs and default mount options. > > > > This system could have a few million (!) of small files. > > On reiserfs it takes about 40 minutes, to do "find /". > > Rsync runs for 6 hours to backup data. > > There is a mount option: max_inline=<bytes> which the man page says > (default: min(2048, page size) ) It's half the page size per a commit from some years ago. For compressed size, it's the compressed data size (i.e. you can have a 4095-byte inline file with max_inline=2048 due to the compression). > I've never used it, so in theory the max_inline byte size is 2KiB. > However, I have seen substantially larger inline extents than 2KiB > when using a nodesize larger than 16KiB at mkfs time. > > I've wondered whether it makes any difference for the "many small > files" case to do more aggressive inlining of extents. 
> > I've seen with 16KiB leaf size, often small files that could be > inlined, are instead put into a data block group, taking up a minimum > 4KiB block size (on x64_64 anyway). I'm not sure why, but I suspect > there just isn't enough room in that leaf to always use inline > extents, and yet there is enough room to just reference it as a data > block group extent. When using a larger node size, a larger percentage > of small files ended up using inline extents. I'd expect this to be > quite a bit more efficient, because it eliminates a time expensive (on > HDD anyway) seek. Putting a lot of inline file data into metadata pages makes them less dense, which is either good or bad depending on which bottleneck you're currently hitting. If you have snapshots there is an up-to-300x metadata write amplification penalty to update extent item references every time a shared metadata page is unshared. Inline extents reduce the write amplification. On the other hand, if you are doing a lot of 'find'-style tree sweeps, then inline extents will reduce their efficiency because more pages will have to be read to scan the same number of dirents and inodes. For workloads that reiserfs was good at, there's no reliable rule of thumb to guess which is better--you have to try both, and measure results. > Another optimization, using compress=zstd:1, which is the lowest > compression setting. That'll increase the chance a file can use inline > extents, in particular with a larger nodesize. > > And still another optimization, at the expense of much more > complexity, is LVM cache with an SSD. You'd have to pick a suitable > policy for the workload, but I expect that if the iostat utilizations > you see of often near max utilization in normal operation, you'll see > improved performance. SSD's can handle way higher iops than HDD. But a > lot of this optimization stuff is use case specific. I'm not even sure > what your mean small file size is. 
I've found an interesting result in cache configuration testing: btrfs's writes with datacow seem to be very well optimized, to the point that adding a writeback SSD cache between btrfs and a HDD makes btrfs commits significantly slower. A writeback cache adds latency to the write path without removing many seeks--btrfs already does writes in big contiguous bursts--so the extra latency makes the writeback cache slow compared to writethrough. A writethrough SSD cache helps with reads (which are very seeky and benefit a lot from caching) without adding latency to writes, and btrfs reads a _lot_ during commits. > > # iotop -d30 > > > > Total DISK READ: 34.12 M/s | Total DISK WRITE: 40.36 M/s > > Current DISK READ: 34.12 M/s | Current DISK WRITE: 79.22 M/s > > TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND > > 4596 be/4 root 34.12 M/s 37.79 M/s 0.00 % 91.77 % btrfs > > Not so bad for many small file reads and writes with HDD. I've see > this myself with single spindle when doing small file reads and > writes. > > > -- > Chris Murphy [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-04 5:38 ` Zygo Blaxell @ 2020-01-07 18:44 ` Chris Murphy 2020-01-07 19:26 ` Holger Hoffstätte 2020-01-07 23:32 ` Zygo Blaxell 0 siblings, 2 replies; 32+ messages in thread From: Chris Murphy @ 2020-01-07 18:44 UTC (permalink / raw) To: Zygo Blaxell; +Cc: Btrfs BTRFS On Fri, Jan 3, 2020 at 10:38 PM Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote: > > I've seen with 16KiB leaf size, often small files that could be > > inlined, are instead put into a data block group, taking up a minimum > > 4KiB block size (on x64_64 anyway). I'm not sure why, but I suspect > > there just isn't enough room in that leaf to always use inline > > extents, and yet there is enough room to just reference it as a data > > block group extent. When using a larger node size, a larger percentage > > of small files ended up using inline extents. I'd expect this to be > > quite a bit more efficient, because it eliminates a time expensive (on > > HDD anyway) seek. > > Putting a lot of inline file data into metadata pages makes them less > dense, which is either good or bad depending on which bottleneck you're > currently hitting. If you have snapshots there is an up-to-300x metadata > write amplification penalty to update extent item references every time > a shared metadata page is unshared. Inline extents reduce the write > amplification. On the other hand, if you are doing a lot of 'find'-style > tree sweeps, then inline extents will reduce their efficiency because more > pages will have to be read to scan the same number of dirents and inodes. Egads! Soo... total tangent. I'll change the subject. I have had multiple flash drive failures while using Btrfs: all Samsung, several SD Cards, and so far two USB sticks. They all fail in essentially the same way: the media itself becomes read-only. USB: writes succeed but they do not persist.
Write data to the media, and there is no error. Read that same sector back, old data is there. SD Card: writes fail with a call trace and diagnostic info unique to the sd card kernel code, and everything just goes belly up. This happens inside of 6 months of rather casual use as rootfs. And BTW Samsung always replaces the media under warranty without complaint. It's not a scientific sample. Could be the host device, which is the same in each case. Could be a bug in the firmware. I have nothing to go on really. But I wonder if this is due to write amplification that's just not anticipated by the manufacturers? Is there any way to test for this or estimate the amount of amplification? This class of media doesn't report LBAs written, so I'm at quite a lack of useful information to know what the cause is. The relevance here, though, is that I really like the idea of Btrfs used as a rootfs for things like IoT because of compression, ostensibly there are ssd optimizations, and always-on checksumming to catch what often can be questionable media: USB sticks, SD Cards, eMMC, etc. But not if the write amplification has a good chance of killing people's hardware (I have no proof of this, but now I wonder, as I read your email). I'm aware of write amplification, I just didn't realize it could be this massive. Is it 300x just by having snapshots at all? Or does it get worse with each additional snapshot? And is it multiplicative or exponentially worse? In the most prolific snapshotting case, I had two subvolumes, each with 20 snapshots (at most). I used the default ssd mount option for the SD cards, most recently ssd_spread with the USB sticks. And now nossd with the most recent USB stick I just started to use. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-07 18:44 ` write amplification, was: " Chris Murphy @ 2020-01-07 19:26 ` Holger Hoffstätte 2020-01-07 23:32 ` Zygo Blaxell 1 sibling, 0 replies; 32+ messages in thread From: Holger Hoffstätte @ 2020-01-07 19:26 UTC (permalink / raw) To: Chris Murphy, Zygo Blaxell, Btrfs BTRFS On 1/7/20 7:44 PM, Chris Murphy wrote: > Egads! Soo... total tangent. I'll change the subject. Time for science! "Analyzing IO Amplification in Linux File Systems" https://arxiv.org/abs/1707.08514 Ironically Btrfs does better in a related paper: "Filesystem Aging: It’s more Usage than Fullness" https://www.usenix.org/conference/hotstorage19/presentation/conway enjoy, Holger ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-07 18:44 ` write amplification, was: " Chris Murphy 2020-01-07 19:26 ` Holger Hoffstätte @ 2020-01-07 23:32 ` Zygo Blaxell 2020-01-07 23:53 ` Chris Murphy 1 sibling, 1 reply; 32+ messages in thread From: Zygo Blaxell @ 2020-01-07 23:32 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS [-- Attachment #1: Type: text/plain, Size: 8957 bytes --] On Tue, Jan 07, 2020 at 11:44:43AM -0700, Chris Murphy wrote: > On Fri, Jan 3, 2020 at 10:38 PM Zygo Blaxell > <ce3g8jdj@umail.furryterror.org> wrote: > > > > On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote: > > > > I've seen with 16KiB leaf size, often small files that could be > > > inlined, are instead put into a data block group, taking up a minimum > > > 4KiB block size (on x64_64 anyway). I'm not sure why, but I suspect > > > there just isn't enough room in that leaf to always use inline > > > extents, and yet there is enough room to just reference it as a data > > > block group extent. When using a larger node size, a larger percentage > > > of small files ended up using inline extents. I'd expect this to be > > > quite a bit more efficient, because it eliminates a time expensive (on > > > HDD anyway) seek. > > > > Putting a lot of inline file data into metadata pages makes them less > > dense, which is either good or bad depending on which bottleneck you're > > currently hitting. If you have snapshots there is an up-to-300x metadata > > write amplification penalty to update extent item references every time > > a shared metadata page is unshared. Inline extents reduce the write > > amplification. On the other hand, if you are doing a lot of 'find'-style > > tree sweeps, then inline extents will reduce their efficiency because more > > pages will have to be read to scan the same number of dirents and inodes. > > Egads! Soo... total tangent. I'll change the subject. 
> > I have had multiple flash drive failures while using Btrfs: all > Samsung, several SD Cards, and so far two USB sticks. They all fail in > essentially the same way: the media itself becomes read-only. USB: > writes succeed but they do not persist. Write data to the media, and > there is no error. Read that same sector back, old data is there. SD > Card: writes fail with a call trace and diagnostic info unique to the > sd card kernel code, and everything just goes belly up. This happens > inside of 6 months of rather casual use as rootfs. And BTW Samsung > always replaces the media under warranty without complaint. > > It's not a scientific sample. Could be the host device, which is the > same in each case. Could be a bug in the firmware. I have nothing to > go on really. It seems to be normal behavior for USB sticks and SD cards. I've also had USB sticks degrade (bit errors) simply from sitting unused on a shelf for six months. Some low-end SATA SSDs (like $20/TB drives from Amazon) are giant USB sticks with a SATA interface, and will fail the same way. SD card vendors are starting to notice, and there are now SD card options with higher endurance ratings. Still "putting this card in a dashcam voids the warranty" in most cases. ext2 and msdos both make USB sticks last longer, but they have obvious other problems. From my fleet of raspberry pis, I find that SD cards last longer on btrfs than ext4 with comparable write loads, but they still both lead very short lives, and the biggest life-expectancy improvement (up to a couple of years) comes from eliminating local writes entirely. > But I wonder if this is due to write amplification that's just not > anticipated by the manufacturers? Is there any way to test for this or > estimate the amount of amplification? This class of media doesn't > report LBA's written, so I'm at quite a lack of useful information to > know what the cause is.
The relevance here though is, I really like > the idea of Btrfs used as a rootfs for things like IoT because of > compression, ostensibly there are ssd optimizations, and always on > checksumming to catch what often can be questionable media: like USB > sticks, SD Cards, eMMC, etc. But not if the write amplication has a > good chance of killing people's hardware (I have no proof of this but > now I wonder, as I read your email). > > I'm aware of write amplification, I just didn't realize it could be > this massive. It's is 300x just by having snapshots at all? Or does it > get worse with each additional snapshot? And is it multiplicative or > exponentially worse? A 16K subvol metadata page can hold ~300 extent references. Each extent reference is bidirectional--there is a reference from the subvol metadata page to the extent data item, and a backref from the extent data item to the reference. If a new reference is created via clone or dedupe or partially overwriting an extent in the middle, then the extent item's reference count is increased, and new backrefs are added to the extent item's page. When a snapshot is created, all the metadata pages except the root become shared. Each referenced extent data item is not changed at this time, as there is only one metadata page containing references to each extent data item. The metadata page carrying the extent reference item has multiple owners which are ancestor nodes in all of the snapshot subvol trees. The backref walking code starts from an extent data item, and follows references back to subvol metadata pages. If the subvol metadata pages are also shared, then the walking code follows those back to the subvol roots. The true reference count for an extent is a combination of the direct references (subvol leaf page to extent data) and indirect references (subvol root or node page to subvol leaf page). 
When a snapshot metadata page is modified, a new metadata page is created with mostly the same content, give or take the items that are added or modified. This inserts ~300 new extent data backref items into the extent tree because they are now owned by both the old and new versions of the metadata page. It is as if the files located in the subvol metadata page were silently reflinked from the old to new subvol, but only in the specific areas listed on the single modified metadata page. In the worst case, all ~300 extent data items are stored on separate extent tree pages (i.e. you have really bad fragmentation on a big filesystem, and all of the extents in a file are in different places on disk). In this case, to modify a single page of shared subvol metadata, we must also update up to 300 extent tree pages. This is where the 300x write multiplier comes from. It's not really the worst case--each of those 300 page updates has its own multiplication (e.g. the extent tree page may be overfilled and split, doubling one write). If you end up freeing pages all over the filesystem, there will be free space cache/tree modifications as well. In the best case, the metadata page isn't shared (i.e. it was CoWed or it's a brand new metadata page). In this case there's no need to update backref pointers or reference counts for unmodified items, as they will be deleted during the same transaction that creates a copy of them. Real cases fall between 1x and 300x. The first time you modify a metadata page in a snapshotted subvol, you must also update ~300 extent data item backrefs (unless the subvol is so small it contains fewer than 300 items). This repeats for every shared page as it is unshared. Over time, shared pages are replaced with unshared pages, and performance within the subvol returns to normal levels. If a new snapshot is created for the subvol, we start over at 300x item updates since every metadata page in the subvol is now shared again.
The write multiplication contribution from snapshots quickly drops to 1x, but it's worst after a new snapshot is created (for either the old or new subvol). On a big filesystem it can take minutes to create a new directory, rename, delete, or hardlink files for the first time after a snapshot is created, as even the most trivial metadata update becomes an update of tens of thousands of randomly scattered extent tree pages. Snapshots and 'cp -ax --reflink=always a b' are comparably expensive if you modify more than 0.3% of the metadata and the modifications are equally distributed across the subvol. In the end, you'll have completely unshared metadata and touched every page of the extent tree either way, but the timing will be different (e.g. mkdirs and renames will be blamed for the low performance, when the root cause is really the snapshot for backups that happened 12 hours earlier). > In the most prolific snapshotting case, I had two subvolumes, each > with 20 snapshots (at most). I used default ssd mount option for the > sdcards, most recently ssd_spread with the usb sticks. And now nossd > with the most recent USB stick I just started to use. The number of snapshots doesn't really matter: you get the up-to-300x write multiple from writing to a subvol that has shared metadata pages, which happens when you have just one snapshot. It doesn't matter if you have 1 snapshot or 10000 (at least, not for _this_ reason). > -- > Chris Murphy > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
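The unsharing cost described above reduces to a tiny counting model. This is purely illustrative; the 300 figure is the approximate number of extent references per 16 KiB leaf quoted in the explanation, and real filesystems fall anywhere between the two extremes depending on extent-tree locality:

```python
# Pages written to CoW one shared 16 KiB subvol leaf for the first time
# after a snapshot: the new copy of the leaf itself, plus one extent-tree
# page per backref that happens to live on a distinct page.
REFS_PER_LEAF = 300  # ~extent references held by one 16 KiB leaf (from the text)

def pages_written(extent_tree_pages_touched: int) -> int:
    return 1 + extent_tree_pages_touched

best = pages_written(0)                # unshared leaf: just the CoW of the leaf
worst = pages_written(REFS_PER_LEAF)   # every backref on its own extent-tree page
print(best, worst)  # 1 301 -- roughly the "up-to-300x" multiplier from the text
```

Note the model depends only on whether the leaf is shared, not on how many snapshots share it, which matches the point below that 1 snapshot and 10000 snapshots cost the same for this particular mechanism.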
* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-07 23:32 ` Zygo Blaxell @ 2020-01-07 23:53 ` Chris Murphy 2020-01-08 1:41 ` Zygo Blaxell 0 siblings, 1 reply; 32+ messages in thread From: Chris Murphy @ 2020-01-07 23:53 UTC (permalink / raw) To: Zygo Blaxell; +Cc: Chris Murphy, Btrfs BTRFS On Tue, Jan 7, 2020 at 4:32 PM Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > It seems to be normal behavior for USB sticks and SD cards. I've also > had USB sticks degrade (bit errors) simply from sitting unused on a shelf > for six months. Some low-end SATA SSDs (like $20/TB drives from Amazon) > are giant USB sticks with a SATA interface, and will fail the same way. > > SD card vendors are starting to notice, and there are now SD card options > with higher endurance ratings. Still "putting this card in a dashcam > voids the warranty" in most cases. > > ext2 and msdos both make USB sticks last longer, but they have obvious > other problems. From my fleet of raspberry pis, I find that SD cards > last longer on btrfs than ext4 with comparable write loads, but they > are still both lead very short lives, and the biggest life expectancy > improvement (up to a couple of years) comes from eliminating local > writes entirely. It's long been an accepted fact in professional photography circles that CF/SD card corruption is due to crappy firmware in cameras. Ergo, format (FAT32/exFAT) only with that camera's firmware, don't delete files using the camera's interface, suck off all the files (back them up too) then format the card in that particular camera, etc. I've wondered for a while now, if all of that was flash manufacturer buck passing. > > > In the most prolific snapshotting case, I had two subvolumes, each > > with 20 snapshots (at most). I used default ssd mount option for the > > sdcards, most recently ssd_spread with the usb sticks. And now nossd > > with the most recent USB stick I just started to use. 
> > The number of snapshots doesn't really matter: you get the up-to-300x > write multiple from writing to a subvol that has shared metadata pages, > which happens when you have just one snapshot. It doesn't matter if > you have 1 snapshot or 10000 (at least, not for _this_ reason). Gotcha. So I wonder if the cheap consumer USB/SD card use case, Raspberry Pi and such, rootfs or general purpose use, should use a 4KiB node instead of default 16KiB? I read on HN someone using much much more expensive industrial SD cards and this problem has entirely vanished for them (Pi use case). I've looked around and they are a lot smaller and more expensive but for a Pi rootfs it's pretty sane really, USD$56 for a 4G card that won't die every 6 months? They are slower too. *shrug* -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-07 23:53 ` Chris Murphy @ 2020-01-08 1:41 ` Zygo Blaxell 2020-01-08 2:54 ` Chris Murphy 0 siblings, 1 reply; 32+ messages in thread From: Zygo Blaxell @ 2020-01-08 1:41 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS [-- Attachment #1: Type: text/plain, Size: 4281 bytes --] On Tue, Jan 07, 2020 at 04:53:13PM -0700, Chris Murphy wrote: > On Tue, Jan 7, 2020 at 4:32 PM Zygo Blaxell > <ce3g8jdj@umail.furryterror.org> wrote: > > > > It seems to be normal behavior for USB sticks and SD cards. I've also > > had USB sticks degrade (bit errors) simply from sitting unused on a shelf > > for six months. Some low-end SATA SSDs (like $20/TB drives from Amazon) > > are giant USB sticks with a SATA interface, and will fail the same way. > > > > SD card vendors are starting to notice, and there are now SD card options > > with higher endurance ratings. Still "putting this card in a dashcam > > voids the warranty" in most cases. > > > > ext2 and msdos both make USB sticks last longer, but they have obvious > > other problems. From my fleet of raspberry pis, I find that SD cards > > last longer on btrfs than ext4 with comparable write loads, but they > > are still both lead very short lives, and the biggest life expectancy > > improvement (up to a couple of years) comes from eliminating local > > writes entirely. > > It's long been an accepted fact in professional photography circles > that CF/SD card corruption is due to crappy firmware in cameras. Ergo, > format (FAT32/exFAT) only with that camera's firmware, don't delete > files using the camera's interface, suck off all the files (back them > up too) then format the card in that particular camera, etc. Well, those are all good ideas. One should never assume firmware is bug-free or that any individual storage device is durable or interoperable. 
I have encountered embedded firmware developers who write their own VFAT implementation from scratch "because [they] don't want to debug someone else's", as if exhaustive bug compatibility was somehow not inherent to correct operation of a VFAT filesystem. > I've wondered for a while now, if all of that was flash manufacturer > buck passing. Consumer SD cards have been ruthlessly optimized for decades, mostly for cost. It will take a while for the consumer-facing part of the industry to dig itself out of that hole. In the meantime, we can expect silent data corruption, bad wear leveling, and power failure corruption to be part of the default feature set. > > > In the most prolific snapshotting case, I had two subvolumes, each > > > with 20 snapshots (at most). I used default ssd mount option for the > > > sdcards, most recently ssd_spread with the usb sticks. And now nossd > > > with the most recent USB stick I just started to use. > > > > The number of snapshots doesn't really matter: you get the up-to-300x > > write multiple from writing to a subvol that has shared metadata pages, > > which happens when you have just one snapshot. It doesn't matter if > > you have 1 snapshot or 10000 (at least, not for _this_ reason). > > Gotcha. So I wonder if the cheap consumer USB/SD card use case, > Raspberry Pi and such, rootfs or general purpose use, should use a > 4KiB node instead of default 16KiB? I read on HN someone using much > much more expensive industrial SD cards and this problem has entirely > vanished for them (Pi use case). I've looked around and they are a lot > smaller and more expensive but for a Pi rootfs it's pretty sane > really, USD$56 for a 4G card that won't die every 6 months? They are > slower too. *shrug* I run btrfs with dup data and dup metadata on the Pis, with minimized write workloads (i.e. 
I turn off all the local log files, sending the data to other machines or putting it on tmpfs with periodic uploads, and I use compression to reduce write volume). I don't use snapshots on these devices--they do get backups, but they are fine with plain rsync, given the minimized write workloads. I haven't tried changing filesystem parameters like node size. Dup data doesn't help the SD card last any longer, but it does keep the device operating long enough to build and deploy a new SD card. Samsung is making SD cards with 10-year warranties and a 300 TBW rating (equivalent, it is specified in units of "hours of HD video"). They are USD$25 for 128GB, 100MB/s read 90MB/s write. No idea what they're like in the field, I start test deployments next week... > > -- > Chris Murphy > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
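The dup-data/dup-metadata layout Zygo describes can be sketched as a single mkfs invocation (the device path is a hypothetical Pi SD card partition, not from the thread):

```shell
# DUP keeps two copies of every data and metadata block on the one
# device, so a single bad sector still leaves a readable copy.
mkfs.btrfs -d dup -m dup /dev/mmcblk0p2
```

DUP doubles the write volume to the card, which is why Zygo pairs it with compression and a minimized write workload.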
* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-08 1:41 ` Zygo Blaxell @ 2020-01-08 2:54 ` Chris Murphy 0 siblings, 0 replies; 32+ messages in thread From: Chris Murphy @ 2020-01-08 2:54 UTC (permalink / raw) To: Zygo Blaxell; +Cc: Chris Murphy, Btrfs BTRFS On Tue, Jan 7, 2020 at 6:41 PM Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > Consumer SD cards have been ruthlessly optimized for decades, mostly > for cost. It will take a while for the consumer-facing part of the > industry to dig itself out of that hole. In the meantime, we can expect > silent data corruption, bad wear leveling, and power failure corruption > to be part of the default feature set. I cackled out loud at this. > I run btrfs with dup data and dup metadata on the Pis, with minimized > write workloads (i.e. I turn off all the local log files, sending the > data to other machines or putting it on tmpfs with periodic uploads, > and I use compression to reduce write volume). I don't use snapshots on > these devices--they do get backups, but they are fine with plain rsync, > given the minimized write workloads. I haven't tried changing filesystem > parameters like node size. Dup data doesn't help the SD card last any > longer, but it does keep the device operating long enough to build and > deploy a new SD card. I use zstd:1, space_cache v2, and single data with dup metadata (dsingle/mdup). Curiously I've not seen any read errors on these. They just stop writing. I can mount and read from them fine, it's just writes that fail (silently on the USB sticks, kinda hilarious: yep, yep, i'm writing, no problem, yep, oh you want back what you just wrote, yeah no you get yesterday's data). > Samsung is making SD cards with 10-year warranties and a 300 TBW rating > (equivalent, it is specified in units of "hours of HD video"). They are > USD$25 for 128GB, 100MB/s read 90MB/s write. No idea what they're like > in the field, I start test deployments next week... 
Yeah the Samsung SD cards I have are also 10-year, but I have no idea what the TBW is and they don't report writes. EVO Plus, U3, is supposed to do 30M/s writes but no matter the file system I get 20MB/s out of it. I get a statistically significant extra ~350K/s with Btrfs. Reads are around 80-90M/s. Of course effective reads/writes are higher with compression. USB sticks, I think, carry a few years' warranty. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
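The options Chris mentions (zstd level 1, v2 free space cache) map directly onto mount options; a sketch with a placeholder device path:

```shell
# compress=zstd:1 is the fastest zstd level; space_cache=v2 uses the
# free-space-tree instead of the v1 space cache.
mount -o compress=zstd:1,space_cache=v2 /dev/mmcblk0p2 /mnt
```

On kernels of that era space_cache=v2 was a one-way conversion at first mount, so it is worth checking distribution notes before enabling it on an existing filesystem.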
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-02 21:57 ` Chris Murphy 2020-01-02 22:39 ` Leszek Dubiel @ 2020-01-06 11:14 ` Leszek Dubiel 2020-01-07 0:21 ` Chris Murphy 1 sibling, 1 reply; 32+ messages in thread From: Leszek Dubiel @ 2020-01-06 11:14 UTC (permalink / raw) To: Btrfs BTRFS; +Cc: Chris Murphy, Zygo Blaxell On 02.01.2020 at 22:57, Chris Murphy wrote: > but I would say that in retrospect it would have been better to NOT > delete the device with a few bad sectors, and instead use `btrfs > replace` to do a 1:1 replacement of that particular drive. Tested "replace" on another server: root@zefir:~# btrfs replace start /dev/sde1 /dev/sdb3 / and speed was quite normal: 1.49 TiB * ( 1024 * 1024 MiB/TiB ) / ( 4.5 hours * 3600 sec/hour ) = 1.49 * ( 1024 * 1024 ) / ( 4.5 * 3600 ) = 96.44 MiB / sec Questions: 1. it is a little bit confusing that the kernel reports sdc1 and sde1 on the same line: "BTRFS warning (device sdc1): i/o error ... on dev /dev/sde1".... 2. can I reset the error counters for /dev/sda1? 
there were errors due to malfunctioning SATA cable, cable was replaced and counters don't grow any more; btrfs withstood that failure with no data loss ======================================= root@zefir:~# tail /var/log/kern.log Jan 6 00:19:37 zefir kernel: [33959.394053] BTRFS info (device sdc1): dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 started Jan 6 00:50:32 zefir kernel: [35815.080061] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.126 msecs Jan 6 04:48:25 zefir kernel: [50087.254634] BTRFS info (device sdc1): dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 finished root@zefir:~# btrfs dev usage / /dev/sda1, ID: 5 Device size: 1.80TiB Device slack: 0.00B Data,RAID1: 1.51TiB Metadata,RAID1: 20.00GiB System,RAID1: 32.00MiB Unallocated: 279.97GiB /dev/sdb3, ID: 4 Device size: 1.80TiB Device slack: 1.77GiB <<<<------------ Data,RAID1: 1.49TiB Metadata,RAID1: 42.00GiB Unallocated: 280.00GiB /dev/sdc1, ID: 1 Device size: 1.80TiB Device slack: 0.00B Data,RAID1: 1.50TiB Metadata,RAID1: 30.00GiB Unallocated: 278.00GiB /dev/sdd1, ID: 3 Device size: 1.80TiB Device slack: 0.00B Data,RAID1: 1.50TiB Metadata,RAID1: 36.00GiB Unallocated: 277.00GiB /dev/sdf1, ID: 2 Device size: 1.80TiB Device slack: 0.00B Data,RAID1: 1.49TiB Metadata,RAID1: 40.00GiB System,RAID1: 32.00MiB Unallocated: 279.97GiB # reduce slack root@zefir:~# btrfs fi resize 4:max / Resize '/' of '4:max' root@zefir:~# btrfs dev usage / ... /dev/sdb3, ID: 4 Device size: 1.80TiB Device slack: 3.50KiB <<<<<<<<<<<<<<<<<<<< Data,RAID1: 1.49TiB Metadata,RAID1: 42.00GiB Unallocated: 281.77GiB ... 
root@zefir:~# btrfs dev stat / [/dev/sdc1].write_io_errs 0 [/dev/sdc1].read_io_errs 0 [/dev/sdc1].flush_io_errs 0 [/dev/sdc1].corruption_errs 0 [/dev/sdc1].generation_errs 0 [/dev/sdf1].write_io_errs 0 [/dev/sdf1].read_io_errs 0 [/dev/sdf1].flush_io_errs 0 [/dev/sdf1].corruption_errs 0 [/dev/sdf1].generation_errs 0 [/dev/sdd1].write_io_errs 0 [/dev/sdd1].read_io_errs 0 [/dev/sdd1].flush_io_errs 0 [/dev/sdd1].corruption_errs 0 [/dev/sdd1].generation_errs 0 [/dev/sdb3].write_io_errs 0 [/dev/sdb3].read_io_errs 0 [/dev/sdb3].flush_io_errs 0 [/dev/sdb3].corruption_errs 0 [/dev/sdb3].generation_errs 0 [/dev/sda1].write_io_errs 10418 [/dev/sda1].read_io_errs 227 [/dev/sda1].flush_io_errs 117 [/dev/sda1].corruption_errs 77 [/dev/sda1].generation_errs 47 # erorr count doesn't change any more: root@zefir:/mnt/root/zefir_snaps# egrep . */root/btrfs_dev_stat | egrep write_io_errs | egrep sda1 | egrep ' 23:' 2019-12-27 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2019-12-28 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2019-12-29 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2019-12-30 23:50:02 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2019-12-31 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-01 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-02 23:50:03 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-03 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-04 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-05 23:00:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-05 23:10:03 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-05 23:20:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-05 23:30:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-05 23:40:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 2020-01-05 23:50:01 z harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418 ######### kernel log root@zefir:~# cat /var/log/kern.log | egrep BTRFS Jan 5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical 867246145536: metadata leaf (level 0) in tree 9109477097472 Jan 5 13:52:09 zefir kernel: [1291932.446095] BTRFS warning (device sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical 867246145536: metadata leaf (level 0) in tree 9109477097472 Jan 5 13:52:09 zefir kernel: [1291932.446098] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0 Jan 5 13:52:09 zefir kernel: [1291932.689845] BTRFS error (device sdc1): fixed up error at logical 11658111352832 on dev /dev/sde1 Jan 5 14:54:30 zefir kernel: [ 4.654734] BTRFS: device fsid dd38db97-fa4a-479b-bc1b-973f01a61a8a devid 1 transid 257 /dev/sdd3 Jan 5 14:54:30 zefir kernel: [ 4.654855] BTRFS: device fsid 0a75ba3d-0540-4370-bb04-22470e855caa devid 1 transid 241 /dev/sdf3 Jan 5 14:54:30 zefir kernel: [ 4.655047] BTRFS: device fsid e2652ff5-6d5e-45de-912c-a662f92c10f0 devid 1 transid 280 /dev/sde3 Jan 5 14:54:30 zefir kernel: [ 4.655200] BTRFS: device fsid 8e960630-2b51-452b-b026-abc59dcf5f35 devid 1 transid 61 /dev/sda3 Jan 5 14:54:30 zefir kernel: [ 4.661180] BTRFS: device fsid c100cf56-f210-42d7-a953-bbe1974a28f0 devid 1 transid 292 /dev/sdc3 Jan 5 14:54:30 zefir kernel: [ 4.661331] BTRFS: device fsid 9d6e641c-ec71-411e-9312-f1f67a70913f devid 3 transid 2500779 /dev/sdd1 Jan 5 14:54:30 zefir kernel: [ 4.661450] BTRFS: device fsid 9d6e641c-ec71-411e-9312-f1f67a70913f devid 2 transid 2500779 /dev/sdf1 Jan 5 14:54:30 zefir kernel: [ 4.663399] BTRFS: device fsid 9d6e641c-ec71-411e-9312-f1f67a70913f devid 4 transid 2500779 /dev/sde1 Jan 
5 14:54:30 zefir kernel: [ 4.663547] BTRFS: device fsid 9d6e641c-ec71-411e-9312-f1f67a70913f devid 5 transid 2500779 /dev/sda1 Jan 5 14:54:30 zefir kernel: [ 4.663779] BTRFS: device fsid 9d6e641c-ec71-411e-9312-f1f67a70913f devid 1 transid 2500779 /dev/sdc1 Jan 5 14:54:30 zefir kernel: [ 4.695640] BTRFS info (device sdc1): disk space caching is enabled Jan 5 14:54:30 zefir kernel: [ 4.695642] BTRFS info (device sdc1): has skinny extents Jan 5 14:54:30 zefir kernel: [ 5.363215] BTRFS info (device sdc1): bdev /dev/sde1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0 Jan 5 14:54:30 zefir kernel: [ 5.363219] BTRFS info (device sdc1): bdev /dev/sda1 errs: wr 10418, rd 227, flush 117, corrupt 77, gen 47 Jan 5 14:54:30 zefir kernel: [ 37.646828] BTRFS info (device sdc1): disk space caching is enabled Jan 5 19:55:25 zefir kernel: [18107.366949] BTRFS warning (device sdc1): i/o error at logical 8728914731008 on dev /dev/sde1, physical 1395531751424, root 119218, inode 63539433, offset 138190848, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:25 zefir kernel: [18107.366952] BTRFS warning (device sdc1): i/o error at logical 8728912101376 on dev /dev/sde1, physical 1395529121792, root 119218, inode 63539433, offset 135561216, length 4096, links 1 (path: var/mail/bem1) ------------------------------------------------ ------ 8< cut many similar erorrs ------------------ ------------------------------------------------ Jan 5 19:55:25 zefir kernel: [18107.368411] BTRFS warning (device sdc1): i/o error at logical 8728912101376 on dev /dev/sde1, physical 1395529121792, root 119122, inode 63539433, offset 135561216, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:25 zefir kernel: [18107.368413] BTRFS warning (device sdc1): i/o error at logical 8728914731008 on dev /dev/sde1, physical 1395531751424, root 119122, inode 63539433, offset 138190848, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:25 zefir kernel: [18107.368441] BTRFS warning (device sdc1): i/o error at 
logical 8728914731008 on dev /dev/sde1, physical 1395531751424, root 119120, inode 63539433, offset 138190848, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:25 zefir kernel: [18107.368443] BTRFS warning (device sdc1): i/o error at logical 8728912101376 on dev /dev/sde1, physical 1395529121792, root 119120, inode 63539433, offset 135561216, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:25 zefir kernel: [18107.446304] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0 Jan 5 19:55:25 zefir kernel: [18107.446306] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0 Jan 5 19:55:25 zefir kernel: [18107.446450] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0 Jan 5 19:55:25 zefir kernel: [18107.473619] BTRFS error (device sdc1): fixed up error at logical 8728912101376 on dev /dev/sde1 Jan 5 19:55:29 zefir kernel: [18111.775171] BTRFS error (device sdc1): fixed up error at logical 8728914731008 on dev /dev/sde1 Jan 5 19:55:29 zefir kernel: [18111.775336] BTRFS error (device sdc1): fixed up error at logical 8728913416192 on dev /dev/sde1 Jan 5 19:55:29 zefir kernel: [18111.778932] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0 Jan 5 19:55:29 zefir kernel: [18111.779367] BTRFS error (device sdc1): fixed up error at logical 8728912105472 on dev /dev/sde1 Jan 5 19:55:34 zefir kernel: [18116.660449] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 119218, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.660476] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 119216, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.660502] BTRFS warning (device sdc1): i/o error at logical 
8728914735104 on dev /dev/sde1, physical 1395531755520, root 119214, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.660528] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 119212, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.660554] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 119210, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.660580] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 119208, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) ------------------------------------------------ ------ 8< cut many similar erorrs ------------------ ------------------------------------------------ Jan 5 19:55:34 zefir kernel: [18116.662869] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 118872, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.662894] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 118860, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.662919] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 118848, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.663077] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 118836, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir 
kernel: [18116.663104] BTRFS warning (device sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 1395531755520, root 118824, inode 63539433, offset 138194944, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:34 zefir kernel: [18116.663863] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0 Jan 5 19:55:38 zefir kernel: [18120.678251] BTRFS error (device sdc1): fixed up error at logical 8728914735104 on dev /dev/sde1 Jan 5 19:55:38 zefir kernel: [18120.681960] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0 Jan 5 19:55:42 zefir kernel: [18124.968600] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 119218, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.968627] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 119216, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) ------------------------------------------------ ------ 8< cut many similar erorrs ------------------ ------------------------------------------------ Jan 5 19:55:42 zefir kernel: [18124.968961] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 119190, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.968986] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 119188, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.969010] BTRFS error (device sdc1): fixed up error at logical 8728912109568 on dev /dev/sde1 Jan 5 19:55:42 zefir kernel: [18124.969012] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, 
root 119186, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.969038] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 119184, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) ------------------------------------------------ ------ 8< cut many similar erorrs ------------------ ------------------------------------------------ Jan 5 19:55:42 zefir kernel: [18124.971065] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 118872, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.971091] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 118860, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.971117] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 118848, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.971143] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 118836, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.971169] BTRFS warning (device sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 1395530440704, root 118824, inode 63539433, offset 136880128, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:42 zefir kernel: [18124.971948] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0 Jan 5 19:55:42 zefir kernel: [18125.004219] BTRFS error (device sdc1): fixed up error at logical 8728913420288 on dev /dev/sde1 Jan 5 19:55:47 zefir kernel: [18129.529678] BTRFS error 
(device sdc1): bdev /dev/sde1 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0 Jan 5 19:55:51 zefir kernel: [18133.498294] BTRFS error (device sdc1): fixed up error at logical 8728914739200 on dev /dev/sde1 Jan 5 19:55:51 zefir kernel: [18133.498938] BTRFS warning (device sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 1395530444800, root 119218, inode 63539433, offset 136884224, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:51 zefir kernel: [18133.498966] BTRFS warning (device sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 1395530444800, root 119216, inode 63539433, offset 136884224, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:51 zefir kernel: [18133.498992] BTRFS warning (device sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 1395530444800, root 119214, inode 63539433, offset 136884224, length 4096, links 1 (path: var/mail/bem1) ------------------------------------------------ ------ 8< cut many similar erorrs ------------------ ------------------------------------------------ Jan 5 19:55:51 zefir kernel: [18133.501578] BTRFS warning (device sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 1395530444800, root 118848, inode 63539433, offset 136884224, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:51 zefir kernel: [18133.501604] BTRFS warning (device sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 1395530444800, root 118836, inode 63539433, offset 136884224, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:51 zefir kernel: [18133.501629] BTRFS warning (device sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 1395530444800, root 118824, inode 63539433, offset 136884224, length 4096, links 1 (path: var/mail/bem1) Jan 5 19:55:51 zefir kernel: [18133.502390] BTRFS error (device sdc1): bdev /dev/sde1 errs: wr 0, rd 11, flush 0, corrupt 0, gen 0 Jan 5 19:55:51 zefir kernel: [18133.525951] BTRFS error (device sdc1): 
fixed up error at logical 8728913424384 on dev /dev/sde1 Jan 6 00:19:37 zefir kernel: [33959.394053] BTRFS info (device sdc1): dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 started Jan 6 04:48:25 zefir kernel: [50087.254634] BTRFS info (device sdc1): dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 finished ########## before replace (/dev/sde1 got errors): [/dev/sdc1].write_io_errs 0 [/dev/sdc1].read_io_errs 0 [/dev/sdc1].flush_io_errs 0 [/dev/sdc1].corruption_errs 0 [/dev/sdc1].generation_errs 0 [/dev/sdf1].write_io_errs 0 [/dev/sdf1].read_io_errs 0 [/dev/sdf1].flush_io_errs 0 [/dev/sdf1].corruption_errs 0 [/dev/sdf1].generation_errs 0 [/dev/sdd1].write_io_errs 0 [/dev/sdd1].read_io_errs 0 [/dev/sdd1].flush_io_errs 0 [/dev/sdd1].corruption_errs 0 [/dev/sdd1].generation_errs 0 [/dev/sde1].write_io_errs 0 [/dev/sde1].read_io_errs 11 [/dev/sde1].flush_io_errs 0 [/dev/sde1].corruption_errs 0 [/dev/sde1].generation_errs 0 [/dev/sda1].write_io_errs 10418 [/dev/sda1].read_io_errs 227 [/dev/sda1].flush_io_errs 117 [/dev/sda1].corruption_errs 77 [/dev/sda1].generation_errs 47 root@zefir:/# smartctl -a /dev/sde | egrep 'Pending|Offline_Unc|Error Count' 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 ATA Error Count: 16 (device log contains only the most recent five errors) ^ permalink raw reply [flat|nested] 32+ messages in thread
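The replace throughput computed in the message above can be reproduced with a one-liner:

```shell
# 1.49 TiB moved in 4.5 hours, expressed in MiB/s
awk 'BEGIN { printf "%.2f MiB/s\n", 1.49 * 1024 * 1024 / (4.5 * 3600) }'
# prints: 96.44 MiB/s
```

Roughly 96 MiB/s is in line with sequential throughput of these drives, which is what makes the ~2 MB/s device delete on the other server stand out.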
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-06 11:14 ` Leszek Dubiel @ 2020-01-07 0:21 ` Chris Murphy 2020-01-07 7:09 ` Leszek Dubiel 0 siblings, 1 reply; 32+ messages in thread From: Chris Murphy @ 2020-01-07 0:21 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Btrfs BTRFS On Mon, Jan 6, 2020 at 4:14 AM Leszek Dubiel <leszek@dubiel.pl> wrote: > > > > On 02.01.2020 at 22:57, Chris Murphy wrote: > > > but I would say that in retrospect it would have been better to NOT > > delete the device with a few bad sectors, and instead use `btrfs > > replace` to do a 1:1 replacement of that particular drive. > > > Tested "replace" on another server: > > root@zefir:~# btrfs replace start /dev/sde1 /dev/sdb3 / > > and speed was quite normal: > > 1.49 TiB * ( 1024 * 1024 MiB/TiB ) / ( 4.5 hours * 3600 sec/hour ) > = 1.49 * ( 1024 * 1024 ) / ( 4.5 * 3600 ) = 96.44 MiB / sec > > > Questions: > > 1. it is a little bit confusing that the kernel reports sdc1 and sde1 on the > same line: "BTRFS warning (device sdc1): i/o error ... on dev > /dev/sde1".... Can you provide the entire line? It's probably already confusing but the ellipses make it more confusing. > > # reduce slack > root@zefir:~# btrfs fi resize 4:max / > Resize '/' of '4:max' > root@zefir:~# btrfs dev usage / > ... > /dev/sdb3, ID: 4 > Device size: 1.80TiB > Device slack: 3.50KiB <<<<<<<<<<<<<<<<<<<< Maybe the partition isn't aligned on a 4KiB boundary? *shrug* But yeah one gotcha with 'btrfs replace' is that the replacement must be equal to or bigger than the drive being replaced; and once complete, the file system is not resized to fully utilize the replacement drive. That's intentional because by default you may want allocations to have the same balance as with the original device. If you resize to max, Btrfs will favor allocations to the drive with the most free space. 
> Jan 5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device > sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical > 867246145536: metadata leaf (level 0) in tree 9109477097472 Ahh yeah I see what you mean. I think that's confusing also. The error is on sde1. But I guess why sdc1 is reported first is probably to do with the device the kernel considers mounted, there's not really a good facility (or maybe it's in the newer VFS mount code, not sure) for showing two devices mounted on a single mount point. > [/dev/sda1].write_io_errs 10418 > [/dev/sda1].read_io_errs 227 > [/dev/sda1].flush_io_errs 117 > [/dev/sda1].corruption_errs 77 > [/dev/sda1].generation_errs 47 This isn't good either. I'd keep an eye on that. read errors can be fixed up if there's a good copy, Btrfs will use the good copy and overwrite the bad sector, *if* SCT ERC is lower duration than SCSI command timer. But write and flush errors are bad. You need to find out what that's about. -- Chris Murphy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2020-01-07 0:21 ` Chris Murphy @ 2020-01-07 7:09 ` Leszek Dubiel 0 siblings, 0 replies; 32+ messages in thread From: Leszek Dubiel @ 2020-01-07 7:09 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS >> # reduce slack >> root@zefir:~# btrfs fi resize 4:max / >> Resize '/' of '4:max' >> root@zefir:~# btrfs dev usage / >> ... >> /dev/sdb3, ID: 4 >> Device size: 1.80TiB >> Device slack: 3.50KiB <<<<<<<<<<<<<<<<<<<< > > Maybe the partition isn't aligned on a 4KiB boundary? *shrug* Yes... this is an old partition table. > But yeah one gotcha with 'btrfs replace' is that the replacement must > be equal to or bigger than the drive being replaced; and once > complete, the file system is not resized to fully utilize the > replacement drive. From my "user view" it is ok. After device replace was finished there was slack = 1.7GiB. Then I resized to max, and slack got down to 3.5KiB. > That's intentional because by default you may want > allocations to have the same balance as with the original device. If > you resize to max, Btrfs will favor allocations to the drive with the > most free space. That's perfect. >> Jan 5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device >> sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical >> 867246145536: metadata leaf (level 0) in tree 9109477097472 > > Ahh yeah I see what you mean. I think that's confusing also. The error > is on sde1. But I guess why sdc1 is reported first is probably to do > with the device the kernel considers mounted, there's not really a > good facility (or maybe it's in the newer VFS mount code, not sure) > for showing two devices mounted on a single mount point. Uhhh... you're right -- sdc1 is shown as the mounted filesystem. Thank you. I didn't see this detail at first; I thought the information about sdc1 came from BTRFS and not from the kernel mount. 
root@zefir:~# mount | egrep sdc /dev/sdc1 on / type btrfs (rw,relatime,space_cache,subvolid=83043,subvol=/zefir) root@zefir:~# cat /etc/fstab | egrep / UUID=9d6e641c-ec71-411e-9312-f1f67a70913f / btrfs defaults,subvol=/zefir 0 0 root@zefir:~# blkid | egrep 0913f /dev/sdb3: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" UUID_SUB="e9eb3d18-2d87-479a-808d-74d61903196c" TYPE="btrfs" PARTUUID="c1befd42-e38c-be48-b8df-4301fa1d3b07" /dev/sda1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" UUID_SUB="8bcae5cb-b3a5-4fd8-9284-602203f2a43e" TYPE="btrfs" /dev/sdc1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" UUID_SUB="fb7055ef-6ae7-48e0-8f09-a315fc20f399" TYPE="btrfs" /dev/sdf1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" UUID_SUB="5e38a881-9b21-4a35-a1bb-889095b32254" TYPE="btrfs" /dev/sdd1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" UUID_SUB="1d776ab6-3ef2-4fe1-8026-78b6d87f85c1" TYPE="btrfs" >> [/dev/sda1].write_io_errs 10418 >> [/dev/sda1].read_io_errs 227 >> [/dev/sda1].flush_io_errs 117 >> [/dev/sda1].corruption_errs 77 >> [/dev/sda1].generation_errs 47 > > This isn't good either. I'd keep an eye on that. read errors can be > fixed up if there's a good copy, Btrfs will use the good copy and > overwrite the bad sector, *if* SCT ERC is lower duration than SCSI > command timer. But write and flush errors are bad. You need to find > out what that's about. This was caused by a faulty SATA cable. The cable was replaced and SCT ERC workarounds were installed. I would like to reset counters to ZERO... if it is possible (?). ^ permalink raw reply [flat|nested] 32+ messages in thread
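The counter reset asked about above is supported by btrfs-progs: `btrfs device stats` takes a `-z` flag that prints the current values and then zeroes them. A sketch against the mounted filesystem:

```shell
# Print the per-device error counters for the filesystem at /,
# then reset them all to zero (-z).
btrfs device stats -z /
```

After a reset, any non-zero value that reappears indicates a new error rather than history from the old cable.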
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data 2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel 2019-12-26 5:08 ` Qu Wenruo @ 2019-12-26 22:15 ` Chris Murphy 2019-12-26 22:48 ` Leszek Dubiel 1 sibling, 1 reply; 32+ messages in thread From: Chris Murphy @ 2019-12-26 22:15 UTC (permalink / raw) To: Leszek Dubiel; +Cc: Btrfs BTRFS On Wed, Dec 25, 2019 at 3:42 PM Leszek Dubiel <leszek@dubiel.pl> wrote: > > > Hello! > > I have a server: 3 disks, 6TB each, total 17TB space, occupied with data > 6TB. > > > One of the disks got smart errors: > > 197 Current_Pending_Sector 0x0022 100 100 000 Old_age > Always - 16 > 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age > Offline - 2 > > And didn't pass tests: > > Num Test_Description Status Remaining > LifeTime(hours) LBA_of_first_error > # 1 Extended offline Completed: read failure 90% > 3575 - > # 2 Short offline Completed without error 00% > 3574 - > # 3 Extended offline Completed: read failure 90% > 3574 - > # 4 Extended offline Completed: read failure 90% > 3560 - > # 5 Extended offline Completed: read failure 50% > 3559 - > > I decided to remove that drive from BTRFS system: What is the SCT ERC for each drive? This applies to mdadm, lvm, and btrfs RAID. While you are not using raid for data, you are using it for metadata. And also mismatching SCT ERC with kernel's command timer is not good for any configuration, the SCT ERC must be shorter than the kernel command timer or inevitably bad sector errors are masked by the kernel resetting the link to the device. https://raid.wiki.kernel.org/index.php/Timeout_Mismatch And when was the last time a scrub was done on the volume? Were there any errors reported by either user space tools or kernel? And what were they? I do agree, however, that this configuration should have higher performing reads from the device being deleted, unless part of the reason why it's so slow is that one or more drives is trying to do deep recoveries. 
My suggestion for a single-profile, multiple-device setup is to leave
the per-drive SCT ERC disabled (or set to a high value, e.g. 1200
deciseconds) and also change the per-block-device command timer (this
is a kernel timer set per device) to at least 120 seconds. This will
allow the drive to do deep recovery, which will make it dog slow, but
if necessary it'll improve the chances of getting data off the drives.
If you don't care about getting data off the drives, i.e. you have a
backup, then you can set the SCT ERC value to 70 deciseconds. Any bad
sector errors will quickly result in a read error, and Btrfs will
report the affected file. If it's metadata that's affected, Btrfs will
get a good copy from the mirror and fix up the bad copy automatically.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread
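The timeout-mismatch rule above can be checked mechanically. A sketch, assuming smartctl's usual output format (`Read: 70 (7.0 seconds)`, reported in deciseconds) and the sysfs path used elsewhere in this thread; the device name `sda` is an example:

```shell
#!/bin/sh
# Warn when a drive's SCT ERC read timeout is not safely below the
# kernel's SCSI command timer (the mismatch described above).
dev=sda

# Kernel command timer for this device, in seconds.
cmd_timeout=$(cat /sys/block/$dev/device/timeout)

# SCT ERC read value is reported in deciseconds; convert to seconds.
erc_ds=$(smartctl -l scterc /dev/$dev | awk '/Read:/ {print $2; exit}')
erc_s=$((erc_ds / 10))

if [ "$erc_s" -ge "$cmd_timeout" ]; then
    echo "MISMATCH: ERC ${erc_s}s >= command timer ${cmd_timeout}s on $dev"
else
    echo "OK: ERC ${erc_s}s < command timer ${cmd_timeout}s on $dev"
fi
```

Note that drives which report "SCT Error Recovery Control command not supported" (like /dev/sda below) never give up early on a bad sector, so for them the command timer must be raised instead.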
* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 22:15 ` Chris Murphy
@ 2019-12-26 22:48   ` Leszek Dubiel
  0 siblings, 0 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 22:48 UTC (permalink / raw)
To: Chris Murphy, linux-btrfs

On 26.12.2019 at 23:15, Chris Murphy wrote:
>
> What is the SCT ERC for each drive?

The failing drive is /dev/sda.

root@wawel:~# smartctl -l scterc /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-6-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control command not supported

root@wawel:~# smartctl -l scterc /dev/sdb
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

root@wawel:~# smartctl -l scterc /dev/sdc
SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

I have raised the kernel command timer:

echo 180 > /sys/block/sda/device/timeout

> And when was the last time a scrub was done on the volume?

It was done a few months ago, unfortunately...

> Were there any errors reported by either user space tools or kernel?
> And what were they?

There were NO errors from any user space tools. Only smartctl reported
uncorrectable sectors and failed SMART self-tests.

Thank you for the suggestions below. The data is not critical (this is
a backup server...). I will check the progress tomorrow. There are no
errors at the moment -- it is just slow...

> My suggestion for single profile multiple device is to leave the per
> drive SCT ERC disabled (or a high value, e.g. 1200 deciseconds) and
> also change the per block device command timer (this is a kernel timer
> set per device) to be at least 120 seconds. This will allow the drive
> to do deep recovery, which will make it dog slow, but if necessary
> it'll improve the chances of getting data off the drives. If you don't
> care about getting data off the drives, i.e. you have a backup, then
> you can set the SCT ERC value to 70 deciseconds.
> Any bad sector errors will quickly result in a read error, Btrfs will
> report the affected file. IF it's metadata that's affected, it'll get
> a good copy from the mirror, and fix up the bad copy, automatically.

^ permalink raw reply	[flat|nested] 32+ messages in thread
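The `echo 180 > /sys/block/sda/device/timeout` setting above does not survive a reboot. One common way to make it persistent is a udev rule; the filename and the 180-second value here are illustrative examples, not part of the thread:

```
# /etc/udev/rules.d/60-scsi-timeout.rules  (example filename)
# Reapply the longer command timer whenever a SATA/SCSI disk appears,
# so drives without SCT ERC support get time for deep recovery.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
  RUN+="/bin/sh -c 'echo 180 > /sys/block/%k/device/timeout'"
```

Reload with `udevadm control --reload` and re-trigger (or reboot) for the rule to take effect.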
end of thread, other threads:[~2020-01-08  2:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel
2019-12-26  5:08 ` Qu Wenruo
2019-12-26 13:17   ` Leszek Dubiel
2019-12-26 13:44     ` Remi Gauvin
2019-12-26 14:05       ` Leszek Dubiel
2019-12-26 14:21         ` Remi Gauvin
2019-12-26 15:42           ` Leszek Dubiel
2019-12-26 22:40             ` Chris Murphy
2019-12-26 22:58               ` Leszek Dubiel
2019-12-28 17:04                 ` Leszek Dubiel
2019-12-28 20:23                   ` Zygo Blaxell
2020-01-02 18:37                     ` Leszek Dubiel
2020-01-02 21:57                       ` Chris Murphy
2020-01-02 22:39                         ` Leszek Dubiel
2020-01-02 23:22                           ` Chris Murphy
2020-01-03  9:08                             ` Leszek Dubiel
2020-01-03 19:15                               ` Chris Murphy
2020-01-03 14:39                             ` Leszek Dubiel
2020-01-03 19:02                               ` Chris Murphy
2020-01-03 20:59                                 ` Leszek Dubiel
2020-01-04  5:38                                   ` Zygo Blaxell
2020-01-07 18:44                                     ` write amplification, was: " Chris Murphy
2020-01-07 19:26                                       ` Holger Hoffstätte
2020-01-07 23:32                                         ` Zygo Blaxell
2020-01-07 23:53                                           ` Chris Murphy
2020-01-08  1:41                                             ` Zygo Blaxell
2020-01-08  2:54                                               ` Chris Murphy
2020-01-06 11:14                                     ` Leszek Dubiel
2020-01-07  0:21                                       ` Chris Murphy
2020-01-07  7:09                                         ` Leszek Dubiel
2019-12-26 22:15 ` Chris Murphy
2019-12-26 22:48   ` Leszek Dubiel