* very slow "btrfs dev delete" 3x6Tb, 7Tb of data
@ 2019-12-25 22:35 Leszek Dubiel
  2019-12-26  5:08 ` Qu Wenruo
  2019-12-26 22:15 ` Chris Murphy
  0 siblings, 2 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-25 22:35 UTC (permalink / raw)
  To: linux-btrfs


Hello!

I have a server with 3 disks of 6 TB each, about 17 TB of space in total,
of which 6 TB is occupied by data.


One of the disks got SMART errors:

     197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16
     198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       2

And it didn't pass SMART self-tests:

     Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
     # 1  Extended offline    Completed: read failure 90%        3575             -
     # 2  Short offline       Completed without error 00%        3574             -
     # 3  Extended offline    Completed: read failure 90%        3574             -
     # 4  Extended offline    Completed: read failure 90%        3560             -
     # 5  Extended offline    Completed: read failure 50%        3559             -
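
For reference, the SMART data above came from smartctl, roughly with the
commands below (exact flags may vary with the drive):

     smartctl -a /dev/sda          # print attributes and the self-test log
     smartctl -t long /dev/sda     # start another extended self-test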

I decided to remove that drive from the btrfs filesystem:

     btrfs dev delete /dev/sda2 /
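
A commonly suggested alternative, when a spare disk is available, is
"btrfs replace", which copies data straight onto the new device instead of
shrinking the old one; I did not go that route here. A sketch only,
assuming a hypothetical spare partition /dev/sdd2 of at least the same size:

     btrfs replace start -r /dev/sda2 /dev/sdd2 /   # -r: read the failing disk only if no other good copy exists
     btrfs replace status /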


At the beginning iostat showed the system reading from /dev/sda at about
100 MB/s and writing to /dev/sdb + /dev/sdc at about 100 MB/s. That
looked correct, and over a few hours it moved about 1.5 TB of data off
the failing disk. But then it got stuck... iostat no longer showed any
reading or writing, and "btrfs dev usage" didn't change. I couldn't stop
the "btrfs dev delete" command -- neither Ctrl+C nor "kill -9" worked. I
rebooted the system and started "btrfs dev delete" again.

But now it is extremely slow. The disks can read/write at 100-150 MB/s,
yet the "btrfs dev delete" process moved only 53 GB in 6 hours, which is
about 2 MB/s.

Why is this so slow? Below are some logs. Other people have also had
this problem...

What is my problem? I have been testing btrfs for many years; I had some
issues a few years ago, and recently I considered btrfs stable... but I
cannot put btrfs into production if a failing disk causes problems like
this... moving 3 TB at 2 MB/s would take many, many days. The server does
almost nothing else... it is just deleting one drive. CPU usage is low
and iostat looks lazy...
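
Rough arithmetic behind that estimate, assuming about 3 TiB still to move
at about 2 MiB/s:

     echo "$(( 3 * 1024 * 1024 / 2 / 3600 / 24 )) days"      # prints: 18 days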




#################################################

root@wawel:~# uname -a
Linux wawel 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) 
x86_64 GNU/Linux

root@wawel:~# btrfs --version
btrfs-progs v4.20.1

root@wawel:~# dpkg -l | egrep btrfs
ii  btrfs-progs                   4.20.1-2     amd64        Checksumming Copy on Write Filesystem utilities

#################################################

root@wawel:~# btrfs sub list /  | wc -l
21

#################################################

root@wawel:~# btrfs dev usage /
/dev/sda2, ID: 1
    Device size:             5.45TiB
    Device slack:            5.45TiB
    Data,single:             1.08TiB
    Metadata,RAID1:         11.00GiB
    System,RAID1:           32.00MiB
    Unallocated:            -1.09TiB

/dev/sdb2, ID: 2
    Device size:             5.45TiB
    Device slack:              0.00B
    Data,single:             3.22TiB
    Metadata,RAID1:         87.00GiB
    Unallocated:             2.15TiB

/dev/sdc2, ID: 3
    Device size:             5.45TiB
    Device slack:              0.00B
    Data,single:             3.21TiB
    Metadata,RAID1:         96.00GiB
    System,RAID1:           32.00MiB
    Unallocated:             2.15TiB

#################################################

root@wawel:~# iostat -m -d 30
Linux 4.19.0-6-amd64 (wawel)     25.12.2019     _x86_64_    (4 CPU)

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda              29,52         2,77         0,90     158467 51287
sdb             532,93         0,77        16,80      44046 962107
sdc             532,03         0,59        16,79      34017 961406

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda               0,70         0,00         0,01 0          0
sdb             624,20         0,05        14,17          1 424
sdc             628,47         0,06        14,79          1 443

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda              17,90         0,23         0,59 7         17
sdb             760,63         0,15        15,39          4 461
sdc             761,13         0,09        16,75          2 502

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda              65,80         1,00         0,26 30          7
sdb            1085,93         0,18        26,14          5 784
sdc            1081,03         0,09        27,03          2 810

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda             101,53         1,57         0,10 47          3
sdb             609,63         0,34        19,29         10 578
sdc             612,67         0,33        19,73          9 591

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda             114,30         1,76         0,28 52          8
sdb             355,90         0,39        19,60         11 587
sdc             368,50         0,35        20,17         10 605

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda              63,70         0,55         0,54 16         16
sdb             438,40         0,13         7,83          3 234
sdc             463,97         0,09         8,93          2 267

root@wawel:~# iostat -m -d 30
Linux 4.19.0-6-amd64 (wawel)     25.12.2019     _x86_64_    (4 CPU)

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda              30,18         2,51         1,01     226710 91462
sdb             531,30         0,69        15,36      62022 1384919
sdc             535,25         0,59        15,53      53092 1400122

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda               2,80         0,01         0,49 0         14
sdb             154,77         0,53         3,05 15         91
sdc             161,40         0,64         3,02 19         90

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda             124,87         0,01         3,02 0         90
sdb             799,33         0,13        19,40          3 581
sdc             930,63         0,23        22,21          6 666

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda               5,97         0,00         0,69 0         20
sdb             730,10         0,05        14,62          1 438
sdc             739,23         0,13        14,86          3 445

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda               6,53         0,01         1,39 0         41
sdb             800,63         0,13        15,63          4 468
sdc             801,83         0,02        16,12          0 483

Device             tps    MB_read/s    MB_wrtn/s    MB_read MB_wrtn
sda               4,83         0,00         1,13 0         33
sdb             756,30         0,14        15,08          4 452
sdc             759,03         0,15        15,33          4 459



#################################################

root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
root@wawel:~# btrfs dev usage -b / | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage

root@wawel:~# cat btrfsdevusage  | egrep sda -A5 | egrep single
2019-12-25 17:03:04       Data,single:          1188632199168
2019-12-25 18:31:27       Data,single:          1184337231872
2019-12-25 19:00:56       Data,single:          1177894780928
2019-12-25 20:25:46       Data,single:          1158567428096
2019-12-25 21:06:00       Data,single:          1147830009856
2019-12-25 23:01:34       Data,single:          1132797624320
2019-12-25 23:16:27       Data,single:          1130650140672

root@wawel:~# echo "speed is $(( ( 1188632199168  - 1130650140672 ) / 1024 / 1024 / ((23 - 17) * 3600)  )) MB/sec"
speed is 2 MB/sec
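
The samples above were taken by hand; a minimal sketch for collecting
them automatically every 10 minutes (same output file as above):

     while sleep 600; do
         btrfs dev usage -b / | grep -A5 '^/dev/sda2' | grep 'Data,single' \
             | sed -r "s/^/$(date +'%F %T')\t/" >> btrfsdevusage
     done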

#################################################

root@wawel:~# iotop -d12

Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:     183.48 M/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
  8138 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % btrfs dev 
delete /dev/sda2 /
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
     4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
     6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % 
[kworker/0:0H-kblockd]
     8 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]

Total DISK READ:        11.23 M/s | Total DISK WRITE:        28.77 M/s
Current DISK READ:      11.23 M/s | Current DISK WRITE:      56.51 M/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
10349 be/4 root       29.26 K/s  168.89 K/s  0.00 % 75.29 % 
[kworker/u8:3-btrfs-endio-meta]
  8138 be/4 root       11.03 M/s   23.61 M/s  0.00 % 74.65 % btrfs dev 
delete /dev/sda2 /
10351 be/4 root       45.21 K/s 1365.76 K/s  0.00 % 11.50 % 
[kworker/u8:8-btrfs-endio-write]
10350 be/4 root       63.83 K/s 1542.63 K/s  0.00 % 10.22 % 
[kworker/u8:7-btrfs-freespace-write]
  7879 be/4 root       42.56 K/s 1186.23 K/s  0.00 %  7.07 % 
[kworker/u8:1-btrfs-freespace-write]
23032 be/4 root       25.27 K/s  972.12 K/s  0.00 %  5.04 % 
[kworker/u8:12-btrfs-extent-refs]
  8035 be/4 root        0.00 B/s   25.27 K/s  0.00 %  0.00 % 
[kworker/u8:4-btrfs-endio-meta]
  8036 be/4 root        0.00 B/s   11.97 K/s  0.00 %  0.00 % 
[kworker/u8:5-btrfs-extent-refs]
31253 be/4 root        0.00 B/s    5.32 K/s  0.00 %  0.00 % 
[kworker/u8:9-btrfs-extent-refs]

Total DISK READ:       674.25 K/s | Total DISK WRITE:        13.57 M/s
Current DISK READ:     674.25 K/s | Current DISK WRITE:      39.97 M/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
  8138 be/4 root      147.62 K/s    7.72 M/s  0.00 % 79.89 % btrfs dev 
delete /dev/sda2 /
10274 be/4 root      484.08 K/s    4.91 M/s  0.00 % 56.93 % 
[kworker/u8:2+btrfs-extent-refs]
10351 be/4 root       27.93 K/s  492.06 K/s  0.00 %  3.03 % 
[kworker/u8:8-btrfs-endio-meta]
10350 be/4 root        3.99 K/s  115.70 K/s  0.00 %  0.59 % 
[kworker/u8:7-btrfs-endio-write]
  8037 be/4 root        5.32 K/s   79.79 K/s  0.00 %  0.40 % 
[kworker/u8:6-btrfs-endio-write]
23032 be/4 root        3.99 K/s   79.79 K/s  0.00 %  0.20 % 
[kworker/u8:12-btrfs-endio-write]
  7879 be/4 root     1361.80 B/s  105.06 K/s  0.00 %  0.10 % 
[kworker/u8:1-btrfs-endio-write]
   538 be/4 root        0.00 B/s 1021.35 B/s  0.00 %  0.00 % rsyslogd -n 
-iNONE [rs:main Q:Reg]
10349 be/4 root        0.00 B/s   86.44 K/s  0.00 %  0.00 % 
[kworker/u8:3-btrfs-endio-meta]
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init



#################################################

root@wawel:~# top -d12
top - 23:23:01 up 1 day, 56 min,  1 user,  load average: 3,05, 3,28, 3,47
Tasks: 137 total,   1 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us,  4,5 sy,  0,0 ni, 92,1 id,  3,3 wa,  0,0 hi, 0,1 si,  
0,0 st
MiB Mem :   7841,1 total,    797,0 free,    259,7 used,   6784,4 buff/cache
MiB Swap:   8053,0 total,   7962,2 free,     90,8 used.   7265,2 avail Mem
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
  8138 root      20   0    8540    104    104 D  11,6   0,0 23:25.88 btrfs
23032 root      20   0       0      0      0 I   4,8   0,0 0:51.20 
kworker/u8:12-btrfs-extent-refs
10350 root      20   0       0      0      0 I   0,9   0,0 0:01.79 
kworker/u8:7-btrfs-endio-meta
10349 root      20   0       0      0      0 I   0,8   0,0 0:02.12 
kworker/u8:3-btrfs-freespace-write
   242 root       0 -20       0      0      0 I   0,2   0,0 0:59.66 
kworker/2:1H-kblockd


#################################################

root@wawel:~# tail /var/log/kern.log -n50
Dec 25 21:50:35 wawel kernel: [84258.634360] BTRFS info (device sda2): 
found 1752 extents
Dec 25 21:50:41 wawel kernel: [84264.771079] BTRFS info (device sda2): 
relocating block group 3434964058112 flags data
Dec 25 21:51:09 wawel kernel: [84291.864553] BTRFS info (device sda2): 
found 584 extents
Dec 25 21:55:51 wawel kernel: [84574.383303] BTRFS info (device sda2): 
found 584 extents
Dec 25 21:55:56 wawel kernel: [84579.359349] BTRFS info (device sda2): 
relocating block group 3431742832640 flags data
Dec 25 21:56:17 wawel kernel: [84600.668237] BTRFS info (device sda2): 
found 635 extents
Dec 25 22:00:36 wawel kernel: [84859.766366] BTRFS info (device sda2): 
found 635 extents
Dec 25 22:00:47 wawel kernel: [84870.367399] BTRFS info (device sda2): 
relocating block group 3428521607168 flags data
Dec 25 22:01:14 wawel kernel: [84897.313282] BTRFS info (device sda2): 
found 1170 extents
Dec 25 22:05:38 wawel kernel: [85161.264397] BTRFS info (device sda2): 
found 1170 extents
Dec 25 22:05:46 wawel kernel: [85169.216425] BTRFS info (device sda2): 
relocating block group 3425300381696 flags data
Dec 25 22:06:20 wawel kernel: [85202.963222] BTRFS info (device sda2): 
found 1336 extents
Dec 25 22:13:00 wawel kernel: [85603.648327] BTRFS info (device sda2): 
found 1336 extents
Dec 25 22:13:02 wawel kernel: [85604.878785] BTRFS info (device sda2): 
relocating block group 3422079156224 flags data
Dec 25 22:14:57 wawel kernel: [85720.394945] BTRFS info (device sda2): 
found 1771 extents
Dec 25 22:23:03 wawel kernel: [86206.582245] BTRFS info (device sda2): 
found 1771 extents
Dec 25 22:23:04 wawel kernel: [86207.671814] BTRFS info (device sda2): 
relocating block group 3418857930752 flags data
Dec 25 22:23:19 wawel kernel: [86222.138761] BTRFS info (device sda2): 
found 199 extents
Dec 25 22:26:26 wawel kernel: [86408.991487] BTRFS info (device sda2): 
found 199 extents
Dec 25 22:26:29 wawel kernel: [86412.181922] BTRFS info (device sda2): 
relocating block group 3415636705280 flags data
Dec 25 22:26:48 wawel kernel: [86431.726268] BTRFS info (device sda2): 
found 50 extents
Dec 25 22:27:32 wawel kernel: [86475.800606] BTRFS info (device sda2): 
found 50 extents
Dec 25 22:27:39 wawel kernel: [86482.222298] BTRFS info (device sda2): 
found 20 extents
Dec 25 22:27:44 wawel kernel: [86487.168278] BTRFS info (device sda2): 
relocating block group 3412415479808 flags data
Dec 25 22:28:27 wawel kernel: [86530.525657] BTRFS info (device sda2): 
found 6748 extents
Dec 25 22:34:07 wawel kernel: [86870.618140] BTRFS info (device sda2): 
found 6748 extents
Dec 25 22:34:35 wawel kernel: [86898.498949] BTRFS info (device sda2): 
relocating block group 3409194254336 flags data
Dec 25 22:36:06 wawel kernel: [86989.589309] BTRFS info (device sda2): 
found 1175 extents
Dec 25 22:40:12 wawel kernel: [87234.881523] perf: interrupt took too 
long (3943 > 3930), lowering kernel.perf_event_max_sample_rate to 50500
Dec 25 22:44:44 wawel kernel: [87507.870507] BTRFS info (device sda2): 
found 1175 extents
Dec 25 22:44:52 wawel kernel: [87515.563744] BTRFS info (device sda2): 
relocating block group 3405973028864 flags data
Dec 25 22:45:14 wawel kernel: [87537.128352] BTRFS info (device sda2): 
found 372 extents
Dec 25 22:46:11 wawel kernel: [87594.569709] BTRFS info (device sda2): 
found 372 extents
Dec 25 22:46:17 wawel kernel: [87600.772681] BTRFS info (device sda2): 
relocating block group 3402751803392 flags data
Dec 25 22:47:27 wawel kernel: [87670.663043] BTRFS info (device sda2): 
found 3483 extents
Dec 25 23:04:57 wawel kernel: [88720.068885] BTRFS info (device sda2): 
found 3479 extents
Dec 25 23:05:03 wawel kernel: [88726.864461] BTRFS info (device sda2): 
relocating block group 3399530577920 flags data
Dec 25 23:06:36 wawel kernel: [88819.464525] BTRFS info (device sda2): 
found 1871 extents
Dec 25 23:16:11 wawel kernel: [89393.997406] BTRFS info (device sda2): 
found 1868 extents
Dec 25 23:16:13 wawel kernel: [89396.825305] BTRFS info (device sda2): 
relocating block group 3396309352448 flags data
Dec 25 23:16:41 wawel kernel: [89424.310916] BTRFS info (device sda2): 
found 1946 extents
Dec 25 23:22:12 wawel kernel: [89755.068457] BTRFS info (device sda2): 
found 1946 extents
Dec 25 23:22:15 wawel kernel: [89758.601122] BTRFS info (device sda2): 
relocating block group 3393088126976 flags data
Dec 25 23:22:35 wawel kernel: [89778.172631] BTRFS info (device sda2): 
found 647 extents
Dec 25 23:23:06 wawel kernel: [89809.153016] BTRFS info (device sda2): 
found 647 extents
Dec 25 23:23:07 wawel kernel: [89810.017006] BTRFS info (device sda2): 
relocating block group 3389866901504 flags data
Dec 25 23:23:24 wawel kernel: [89827.351203] BTRFS info (device sda2): 
found 745 extents
Dec 25 23:24:38 wawel kernel: [89901.176493] BTRFS info (device sda2): 
found 745 extents
Dec 25 23:24:51 wawel kernel: [89914.840850] BTRFS info (device sda2): 
relocating block group 3386645676032 flags data
Dec 25 23:25:40 wawel kernel: [89963.641062] BTRFS info (device sda2): 
found 3114 extents




################################################

More on this topic:

https://strugglers.net/~andy/blog/2018/06/15/another-disappointing-btrfs-experience/

https://www.spinics.net/lists/linux-btrfs/msg50771.html

https://bugzilla.redhat.com/show_bug.cgi?id=1294531







* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel
@ 2019-12-26  5:08 ` Qu Wenruo
  2019-12-26 13:17   ` Leszek Dubiel
  2019-12-26 22:15 ` Chris Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2019-12-26  5:08 UTC (permalink / raw)
  To: Leszek Dubiel, linux-btrfs





On 2019/12/26 6:35 AM, Leszek Dubiel wrote:
[...]
> But now it is extremely slow. The disks can read/write at 100-150 MB/s,
> yet the "btrfs dev delete" process moved only 53 GB in 6 hours, which is
> about 2 MB/s.
> 
> Why is this so slow? Below are some logs. Other people have also had
> this problem...

Are you using qgroups?

You can check with "btrfs qgroup show -pcre <mount>".
Qgroups can hugely impact balance performance if you are running a kernel
older than v5.2.

You can either disable qgroups, losing the ability to track how many
bytes are exclusively used by each subvolume, or upgrade to a v5.2 kernel
to solve it.
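
If they do turn out to be enabled, disabling them is a single command,
for example:

    btrfs quota disable <mount>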

Thanks,
Qu


* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26  5:08 ` Qu Wenruo
@ 2019-12-26 13:17   ` Leszek Dubiel
  2019-12-26 13:44     ` Remi Gauvin
  0 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 13:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo


> Are you using qgroups?
>
> You can check with "btrfs qgroup show -pcre <mount>".
> Qgroups can hugely impact balance performance if you are running a kernel
> older than v5.2.
>
> You can either disable qgroups, losing the ability to track how many
> bytes are exclusively used by each subvolume, or upgrade to a v5.2 kernel
> to solve it.
>

Not using qgroups.

root@wawel:~# btrfs qgroup show -pcre /
ERROR: can't list qgroups: quotas not enabled


Current data transfer speed is 2 MB/s.



* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 13:17   ` Leszek Dubiel
@ 2019-12-26 13:44     ` Remi Gauvin
  2019-12-26 14:05       ` Leszek Dubiel
  0 siblings, 1 reply; 32+ messages in thread
From: Remi Gauvin @ 2019-12-26 13:44 UTC (permalink / raw)
  To: Leszek Dubiel, linux-btrfs



On 2019-12-26 8:17 a.m., Leszek Dubiel wrote:

> Not using qgroups.
> 
> root@wawel:~# btrfs qgroup show -pcre /
> ERROR: can't list qgroups: quotas not enabled
> 
> 
> Current data transfer speed is 2 MB/s.
> 
> 


Did you check dmesg to see if you're getting a whole pile of I/O errors
on the failing disk?  It might be necessary to remove it (physically)
and finish replacing it with a degraded array.   (I would not jump to
trying that, however, as there are additional risks if your metadata is
RAID6 as well and you have any write-hole corruption.)
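
A couple of quick ways to check for such errors, for example:

    dmesg -T | grep -i sda
    btrfs device stats /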





* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 13:44     ` Remi Gauvin
@ 2019-12-26 14:05       ` Leszek Dubiel
  2019-12-26 14:21         ` Remi Gauvin
  2019-12-26 22:40         ` Chris Murphy
  0 siblings, 2 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 14:05 UTC (permalink / raw)
  To: Remi Gauvin, linux-btrfs

> Did you check dmesg to see if you're getting a whole pile of I/O errors
> on the failing disk?  It might be necessary to remove it (physically)
> and finish replacing it with a degraded array.   (I would not jump to
> trying that, however, as there are additional risks if your metadata is
> Raid6 as well, and you have any write hole corruption.)

I didn't use RAID6, but RAID0.


Yes, I checked dmesg in the first place (see original mail).

Btrfs logged no errors:

root@wawel:~# btrfs dev stat /
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  0
[/dev/sda2].generation_errs  0
[/dev/sdb2].write_io_errs    0
[/dev/sdb2].read_io_errs     0
[/dev/sdb2].flush_io_errs    0
[/dev/sdb2].corruption_errs  0
[/dev/sdb2].generation_errs  0
[/dev/sdc2].write_io_errs    0
[/dev/sdc2].read_io_errs     0
[/dev/sdc2].flush_io_errs    0
[/dev/sdc2].corruption_errs  0
[/dev/sdc2].generation_errs  0



But currently it looks like this:



Dec 22 17:44:37 wawel kernel: [  179.075180] BTRFS warning (device 
sda2): block group 5414977536000 has wrong amount of free space
Dec 22 17:44:37 wawel kernel: [  179.075185] BTRFS warning (device 
sda2): failed to load free space cache for block group 5414977536000, 
rebuilding it now
Dec 22 18:59:17 wawel kernel: [ 4659.089590] BTRFS info (device sda2): 
relocating block group 8562114822144 flags metadata|raid1
Dec 22 19:01:44 wawel kernel: [ 4805.180779] BTRFS info (device sda2): 
relocating block group 8562114822144 flags metadata|raid1
Dec 22 19:02:14 wawel kernel: [ 4835.167239] BTRFS info (device sda2): 
found 12211 extents
Dec 22 19:02:14 wawel kernel: [ 4835.641938] BTRFS info (device sda2): 
relocating block group 8555672371200 flags data
Dec 22 19:02:24 wawel kernel: [ 4845.378907] BTRFS info (device sda2): 
found 10 extents
Dec 22 19:02:26 wawel kernel: [ 4847.909681] BTRFS info (device sda2): 
found 10 extents
Dec 22 19:02:27 wawel kernel: [ 4848.442186] BTRFS info (device sda2): 
relocating block group 8552451145728 flags data
Dec 22 19:02:27 wawel kernel: [ 4848.639837] BTRFS warning (device 
sda2): block group 6255717384192 has wrong amount of free space
Dec 22 19:02:27 wawel kernel: [ 4848.639841] BTRFS warning (device 
sda2): failed to load free space cache for block group 6255717384192, 
rebuilding it now
Dec 22 19:02:37 wawel kernel: [ 4858.473525] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:03:03 wawel kernel: [ 4884.425697] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:03:16 wawel kernel: [ 4898.034082] BTRFS info (device sda2): 
relocating block group 8549229920256 flags data
Dec 22 19:03:40 wawel kernel: [ 4921.855827] BTRFS info (device sda2): 
found 12 extents
Dec 22 19:03:49 wawel kernel: [ 4931.023551] BTRFS info (device sda2): 
found 12 extents
Dec 22 19:03:50 wawel kernel: [ 4931.603340] BTRFS info (device sda2): 
relocating block group 8546008694784 flags data
Dec 22 19:04:05 wawel kernel: [ 4946.520618] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:04:14 wawel kernel: [ 4955.158390] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:04:15 wawel kernel: [ 4956.492239] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:04:16 wawel kernel: [ 4957.213730] BTRFS info (device sda2): 
relocating block group 8542787469312 flags data
Dec 22 19:04:32 wawel kernel: [ 4973.410098] BTRFS info (device sda2): 
found 12 extents
Dec 22 19:04:35 wawel kernel: [ 4976.628967] BTRFS info (device sda2): 
found 12 extents
Dec 22 19:04:35 wawel kernel: [ 4977.062360] BTRFS info (device sda2): 
relocating block group 8539566243840 flags data
Dec 22 19:04:46 wawel kernel: [ 4987.604348] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:04:49 wawel kernel: [ 4991.071047] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:04:50 wawel kernel: [ 4991.867354] BTRFS info (device sda2): 
relocating block group 8536345018368 flags data
Dec 22 19:05:02 wawel kernel: [ 5003.539073] BTRFS info (device sda2): 
found 10 extents
Dec 22 19:05:06 wawel kernel: [ 5007.309308] BTRFS info (device sda2): 
found 10 extents
Dec 22 19:05:06 wawel kernel: [ 5007.938822] BTRFS info (device sda2): 
relocating block group 8533123792896 flags data
Dec 22 19:05:19 wawel kernel: [ 5020.576346] BTRFS info (device sda2): 
found 11 extents
Dec 22 19:05:32 wawel kernel: [ 5033.787681] BTRFS info (device sda2): 
found 11 extents
Dec 22 19:05:45 wawel kernel: [ 5046.894372] BTRFS info (device sda2): 
relocating block group 8529902567424 flags data
Dec 22 19:06:08 wawel kernel: [ 5069.249286] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:06:18 wawel kernel: [ 5079.974289] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:06:19 wawel kernel: [ 5080.609198] BTRFS info (device sda2): 
relocating block group 8526681341952 flags data
Dec 22 19:06:30 wawel kernel: [ 5091.408195] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:06:34 wawel kernel: [ 5095.494893] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:06:34 wawel kernel: [ 5095.974846] BTRFS info (device sda2): 
relocating block group 8523460116480 flags data
Dec 22 19:06:48 wawel kernel: [ 5109.948486] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:06:50 wawel kernel: [ 5111.872561] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:06:51 wawel kernel: [ 5112.262733] BTRFS info (device sda2): 
relocating block group 8520238891008 flags data
Dec 22 19:07:01 wawel kernel: [ 5122.295835] BTRFS info (device sda2): 
found 11 extents
Dec 22 19:07:03 wawel kernel: [ 5124.152445] BTRFS info (device sda2): 
found 11 extents
Dec 22 19:07:03 wawel kernel: [ 5124.603091] BTRFS info (device sda2): 
relocating block group 8517017665536 flags data
Dec 22 19:07:14 wawel kernel: [ 5135.149621] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:07:15 wawel kernel: [ 5136.612473] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:07:15 wawel kernel: [ 5137.042461] BTRFS info (device sda2): 
relocating block group 8513796440064 flags data
Dec 22 19:07:26 wawel kernel: [ 5147.805243] BTRFS info (device sda2): 
found 11 extents
Dec 22 19:07:30 wawel kernel: [ 5151.553394] BTRFS info (device sda2): 
found 11 extents
Dec 22 19:07:30 wawel kernel: [ 5152.056795] BTRFS info (device sda2): 
relocating block group 8510575214592 flags data
Dec 22 19:07:41 wawel kernel: [ 5162.747546] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:07:44 wawel kernel: [ 5165.848208] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:07:45 wawel kernel: [ 5166.412946] BTRFS info (device sda2): 
relocating block group 8507353989120 flags data
Dec 22 19:07:55 wawel kernel: [ 5176.370942] BTRFS info (device sda2): 
found 12 extents
Dec 22 19:07:56 wawel kernel: [ 5177.729094] BTRFS info (device sda2): 
found 12 extents
Dec 22 19:07:57 wawel kernel: [ 5178.312743] BTRFS info (device sda2): 
relocating block group 8504132763648 flags data
Dec 22 19:08:03 wawel kernel: [ 5184.392945] BTRFS info (device sda2): 
found 4 extents
Dec 22 19:08:04 wawel kernel: [ 5185.610488] BTRFS info (device sda2): 
found 4 extents
Dec 22 19:08:04 wawel kernel: [ 5186.076337] BTRFS info (device sda2): 
relocating block group 8500911538176 flags data
Dec 22 19:08:15 wawel kernel: [ 5196.915114] BTRFS info (device sda2): 
found 14 extents
Dec 22 19:08:38 wawel kernel: [ 5219.384163] BTRFS info (device sda2): 
found 14 extents
Dec 22 19:08:53 wawel kernel: [ 5234.746250] BTRFS info (device sda2): 
relocating block group 8497690312704 flags data
Dec 22 19:09:10 wawel kernel: [ 5252.065078] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:09:20 wawel kernel: [ 5261.174092] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:09:20 wawel kernel: [ 5262.064000] BTRFS info (device sda2): 
relocating block group 8494469087232 flags data
Dec 22 19:09:31 wawel kernel: [ 5272.940366] BTRFS info (device sda2): 
found 9 extents
Dec 22 19:09:35 wawel kernel: [ 5276.735998] BTRFS info (device sda2): 
found 9 extents
Dec 22 19:09:36 wawel kernel: [ 5277.343659] BTRFS info (device sda2): 
relocating block group 8491247861760 flags data
Dec 22 19:09:47 wawel kernel: [ 5288.587703] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:09:51 wawel kernel: [ 5292.187071] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:09:51 wawel kernel: [ 5292.617360] BTRFS info (device sda2): 
relocating block group 8489100378112 flags data
Dec 22 19:09:55 wawel kernel: [ 5296.165794] BTRFS info (device sda2): 
found 2 extents
Dec 22 19:09:57 wawel kernel: [ 5298.717969] BTRFS info (device sda2): 
found 2 extents
Dec 22 19:09:58 wawel kernel: [ 5300.067976] BTRFS info (device sda2): 
relocating block group 8485879152640 flags data
Dec 22 19:10:03 wawel kernel: [ 5304.928496] BTRFS info (device sda2): 
found 3 extents



Dec 22 19:20:11 wawel kernel: [ 5912.116874] ata1.00: exception Emask 
0x0 SAct 0x1f80 SErr 0x0 action 0x0
Dec 22 19:20:11 wawel kernel: [ 5912.116878] ata1.00: irq_stat 0x40000008
Dec 22 19:20:11 wawel kernel: [ 5912.116880] ata1.00: failed command: 
READ FPDMA QUEUED
Dec 22 19:20:11 wawel kernel: [ 5912.116882] ata1.00: cmd 
60/00:38:00:00:98/0a:00:45:01:00/40 tag 7 ncq dma 1310720 in
Dec 22 19:20:11 wawel kernel: [ 5912.116882]          res 
43/40:18:e8:05:98/00:04:45:01:00/40 Emask 0x409 (media error) <F>
Dec 22 19:20:11 wawel kernel: [ 5912.116885] ata1.00: status: { DRDY 
SENSE ERR }
Dec 22 19:20:11 wawel kernel: [ 5912.116886] ata1.00: error: { UNC }
Dec 22 19:20:11 wawel kernel: [ 5912.153695] ata1.00: configured for 
UDMA/133
Dec 22 19:20:11 wawel kernel: [ 5912.153707] sd 0:0:0:0: [sda] tag#7 
FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 22 19:20:11 wawel kernel: [ 5912.153709] sd 0:0:0:0: [sda] tag#7 
Sense Key : Medium Error [current]
Dec 22 19:20:11 wawel kernel: [ 5912.153710] sd 0:0:0:0: [sda] tag#7 
Add. Sense: Unrecovered read error - auto reallocate failed
Dec 22 19:20:11 wawel kernel: [ 5912.153711] sd 0:0:0:0: [sda] tag#7 
CDB: Read(16) 88 00 00 00 00 01 45 98 00 00 00 00 0a 00 00 00
Dec 22 19:20:11 wawel kernel: [ 5912.153712] print_req_error: I/O error, 
dev sda, sector 5462556672
Dec 22 19:20:11 wawel kernel: [ 5912.153724] ata1: EH complete
Dec 22 19:21:28 wawel kernel: [ 5989.527853] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:21:29 wawel kernel: [ 5990.861317] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:21:30 wawel kernel: [ 5991.492017] BTRFS info (device sda2): 
relocating block group 8392463613952 flags data
Dec 22 19:21:40 wawel kernel: [ 6001.782573] BTRFS info (device sda2): 
found 10 extents
Dec 22 19:22:01 wawel kernel: [ 6022.162622] BTRFS info (device sda2): 
found 10 extents
Dec 22 19:22:17 wawel kernel: [ 6039.036941] BTRFS info (device sda2): 
relocating block group 8389242388480 flags data
Dec 22 19:22:33 wawel kernel: [ 6054.947905] BTRFS info (device sda2): 
found 9 extents
Dec 22 19:22:40 wawel kernel: [ 6061.630503] BTRFS info (device sda2): 
found 9 extents
Dec 22 19:22:41 wawel kernel: [ 6062.758008] BTRFS info (device sda2): 
relocating block group 8386021163008 flags data
Dec 22 19:22:53 wawel kernel: [ 6074.648119] BTRFS info (device sda2): 
found 8 extents
Dec 22 19:22:55 wawel kernel: [ 6076.264156] BTRFS info (device sda2): 
found 8 extents




Dec 22 23:31:24 wawel kernel: [20985.918518] BTRFS info (device sda2): 
found 17 extents
Dec 22 23:31:25 wawel kernel: [20986.705143] BTRFS info (device sda2): 
relocating block group 7150144323584 flags data
Dec 22 23:31:36 wawel kernel: [20997.997520] BTRFS info (device sda2): 
found 11 extents
Dec 22 23:31:56 wawel kernel: [21017.914604] BTRFS info (device sda2): 
found 11 extents
Dec 22 23:32:05 wawel kernel: [21027.095442] BTRFS info (device sda2): 
relocating block group 7146923098112 flags data
Dec 22 23:32:42 wawel kernel: [21064.078043] BTRFS info (device sda2): 
found 11 extents
Dec 22 23:32:52 wawel kernel: [21073.534844] BTRFS info (device sda2): 
found 11 extents
Dec 22 23:32:53 wawel kernel: [21074.527201] BTRFS info (device sda2): 
relocating block group 7141554388992 flags data
Dec 22 23:33:05 wawel kernel: [21086.947393] BTRFS info (device sda2): 
found 14 extents
Dec 22 23:33:14 wawel kernel: [21095.872547] BTRFS info (device sda2): 
found 14 extents
Dec 22 23:33:16 wawel kernel: [21097.161827] BTRFS info (device sda2): 
relocating block group 7138333163520 flags data
Dec 22 23:33:27 wawel kernel: [21108.868346] BTRFS info (device sda2): 
found 14 extents
Dec 22 23:33:35 wawel kernel: [21116.664273] BTRFS info (device sda2): 
found 14 extents
Dec 22 23:33:36 wawel kernel: [21117.233358] BTRFS info (device sda2): 
relocating block group 8559967338496 flags metadata|raid1
Dec 22 23:47:19 wawel kernel: [21940.204817] BTRFS info (device sda2): 
found 59958 extents
Dec 22 23:47:19 wawel kernel: [21940.935869] BTRFS info (device sda2): 
relocating block group 8558893596672 flags metadata|raid1
Dec 22 23:57:18 wawel kernel: [22539.150585] BTRFS info (device sda2): 
found 63167 extents
Dec 22 23:57:18 wawel kernel: [22540.041179] BTRFS info (device sda2): 
relocating block group 7128669487104 flags data
Dec 22 23:57:31 wawel kernel: [22552.964707] BTRFS info (device sda2): 
found 13 extents
Dec 22 23:57:49 wawel kernel: [22570.932560] BTRFS info (device sda2): 
found 13 extents
Dec 22 23:57:50 wawel kernel: [22571.760150] BTRFS info (device sda2): 
relocating block group 7125448261632 flags data
Dec 22 23:58:02 wawel kernel: [22583.650959] BTRFS info (device sda2): 
found 11 extents
Dec 22 23:58:10 wawel kernel: [22592.036311] BTRFS info (device sda2): 
found 11 extents
Dec 22 23:58:11 wawel kernel: [22592.520003] BTRFS info (device sda2): 
relocating block group 7122227036160 flags data
Dec 22 23:58:22 wawel kernel: [22603.590833] BTRFS info (device sda2): 
found 16 extents
Dec 22 23:58:35 wawel kernel: [22616.179240] BTRFS info (device sda2): 
found 16 extents
Dec 22 23:58:35 wawel kernel: [22616.810120] BTRFS info (device sda2): 
relocating block group 7119005810688 flags data
Dec 22 23:58:54 wawel kernel: [22635.938434] BTRFS info (device sda2): 
found 17 extents
Dec 22 23:59:30 wawel kernel: [22671.668073] BTRFS info (device sda2): 
found 17 extents
Dec 22 23:59:33 wawel kernel: [22675.053253] BTRFS info (device sda2): 
found 6 extents
Dec 22 23:59:34 wawel kernel: [22675.806876] BTRFS info (device sda2): 
relocating block group 7115784585216 flags data
Dec 22 23:59:47 wawel kernel: [22689.015207] BTRFS info (device sda2): 
found 15 extents
Dec 23 00:00:04 wawel kernel: [22705.897111] BTRFS info (device sda2): 
found 15 extents
Dec 23 00:00:11 wawel kernel: [22712.320353] BTRFS info (device sda2): 
relocating block group 7112563359744 flags data
Dec 23 00:01:19 wawel kernel: [22780.810851] BTRFS info (device sda2): 
found 25 extents
Dec 23 00:02:23 wawel kernel: [22844.836838] BTRFS info (device sda2): 
found 25 extents
Dec 23 00:02:24 wawel kernel: [22845.996013] BTRFS info (device sda2): 
relocating block group 7109342134272 flags data
Dec 23 00:02:36 wawel kernel: [22857.961127] BTRFS info (device sda2): 
found 14 extents
Dec 23 00:02:43 wawel kernel: [22864.547374] BTRFS info (device sda2): 
found 14 extents
Dec 23 00:02:44 wawel kernel: [22865.258107] BTRFS info (device sda2): 
relocating block group 7106120908800 flags data
Dec 23 00:02:55 wawel kernel: [22876.860762] BTRFS info (device sda2): 
found 14 extents
Dec 23 00:03:02 wawel kernel: [22883.804570] BTRFS info (device sda2): 
found 14 extents
Dec 23 00:03:03 wawel kernel: [22884.363060] BTRFS info (device sda2): 
relocating block group 7102899683328 flags data
Dec 23 00:03:14 wawel kernel: [22895.383964] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:03:38 wawel kernel: [22919.988908] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:03:53 wawel kernel: [22935.003150] BTRFS info (device sda2): 
relocating block group 7081424846848 flags data
Dec 23 00:04:37 wawel kernel: [22978.751114] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:04:56 wawel kernel: [22997.228499] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:04:57 wawel kernel: [22998.219232] BTRFS info (device sda2): 
relocating block group 7078203621376 flags data



Dec 23 00:08:20 wawel kernel: [23201.188424] INFO: task btrfs:2546 
blocked for more than 120 seconds.
Dec 23 00:08:20 wawel kernel: [23201.188437]       Not tainted 
4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
Dec 23 00:08:20 wawel kernel: [23201.188442] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 23 00:08:20 wawel kernel: [23201.188449] btrfs           D    0 
2546   2539 0x00000000
Dec 23 00:08:20 wawel kernel: [23201.188454] Call Trace:
Dec 23 00:08:20 wawel kernel: [23201.188468]  ? __schedule+0x2a2/0x870
Dec 23 00:08:20 wawel kernel: [23201.188476]  schedule+0x28/0x80
Dec 23 00:08:20 wawel kernel: [23201.188480] 
schedule_preempt_disabled+0xa/0x10
Dec 23 00:08:20 wawel kernel: [23201.188486] __mutex_lock.isra.8+0x2b5/0x4a0
Dec 23 00:08:20 wawel kernel: [23201.188550] 
btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs]
Dec 23 00:08:20 wawel kernel: [23201.188609] 
btrfs_relocate_chunk+0x31/0xa0 [btrfs]
Dec 23 00:08:20 wawel kernel: [23201.188666] 
btrfs_shrink_device+0x1e1/0x550 [btrfs]
Dec 23 00:08:20 wawel kernel: [23201.188722]  ? 
btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs]
Dec 23 00:08:20 wawel kernel: [23201.188775] btrfs_rm_device+0x165/0x590 
[btrfs]
Dec 23 00:08:20 wawel kernel: [23201.188782]  ? 
__kmalloc_track_caller+0x16c/0x210
Dec 23 00:08:20 wawel kernel: [23201.188788]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:08:20 wawel kernel: [23201.188841] btrfs_ioctl+0x2a8e/0x2d90 
[btrfs]
Dec 23 00:08:20 wawel kernel: [23201.188848]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:08:20 wawel kernel: [23201.188855]  ? _copy_to_user+0x26/0x30
Dec 23 00:08:20 wawel kernel: [23201.188861]  ? cp_new_stat+0x150/0x180
Dec 23 00:08:20 wawel kernel: [23201.188868]  ? do_vfs_ioctl+0xa4/0x630
Dec 23 00:08:20 wawel kernel: [23201.188873] do_vfs_ioctl+0xa4/0x630
Dec 23 00:08:20 wawel kernel: [23201.188879]  ? __do_sys_newstat+0x48/0x70
Dec 23 00:08:20 wawel kernel: [23201.188885]  ksys_ioctl+0x60/0x90
Dec 23 00:08:20 wawel kernel: [23201.188891] __x64_sys_ioctl+0x16/0x20
Dec 23 00:08:20 wawel kernel: [23201.188898] do_syscall_64+0x53/0x110
Dec 23 00:08:20 wawel kernel: [23201.188905] 
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 23 00:08:20 wawel kernel: [23201.188910] RIP: 0033:0x7f3d307fd427
Dec 23 00:08:20 wawel kernel: [23201.188919] Code: Bad RIP value.
Dec 23 00:08:20 wawel kernel: [23201.188922] RSP: 002b:00007ffc69d36b48 
EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Dec 23 00:08:20 wawel kernel: [23201.188927] RAX: ffffffffffffffda RBX: 
0000000000000000 RCX: 00007f3d307fd427
Dec 23 00:08:20 wawel kernel: [23201.188929] RDX: 00007ffc69d37b68 RSI: 
000000005000943a RDI: 0000000000000003
Dec 23 00:08:20 wawel kernel: [23201.188931] RBP: 0000000000000001 R08: 
0000000000000008 R09: 0000000000000078
Dec 23 00:08:20 wawel kernel: [23201.188933] R10: fffffffffffffa4a R11: 
0000000000000206 R12: 00007ffc69d38ce8
Dec 23 00:08:20 wawel kernel: [23201.188935] R13: 0000000000000000 R14: 
0000000000000003 R15: 0000000000000003
Dec 23 00:10:20 wawel kernel: [23322.019744] INFO: task btrfs:2546 
blocked for more than 120 seconds.
Dec 23 00:10:20 wawel kernel: [23322.019757]       Not tainted 
4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
Dec 23 00:10:20 wawel kernel: [23322.019762] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 23 00:10:20 wawel kernel: [23322.019768] btrfs           D    0 
2546   2539 0x00000000
Dec 23 00:10:20 wawel kernel: [23322.019774] Call Trace:
Dec 23 00:10:20 wawel kernel: [23322.019788]  ? __schedule+0x2a2/0x870
Dec 23 00:10:20 wawel kernel: [23322.019795]  schedule+0x28/0x80
Dec 23 00:10:20 wawel kernel: [23322.019800] 
schedule_preempt_disabled+0xa/0x10
Dec 23 00:10:20 wawel kernel: [23322.019805] __mutex_lock.isra.8+0x2b5/0x4a0
Dec 23 00:10:20 wawel kernel: [23322.019864] 
btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs]
Dec 23 00:10:20 wawel kernel: [23322.019923] 
btrfs_relocate_chunk+0x31/0xa0 [btrfs]
Dec 23 00:10:20 wawel kernel: [23322.019979] 
btrfs_shrink_device+0x1e1/0x550 [btrfs]
Dec 23 00:10:20 wawel kernel: [23322.020035]  ? 
btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs]
Dec 23 00:10:20 wawel kernel: [23322.020088] btrfs_rm_device+0x165/0x590 
[btrfs]
Dec 23 00:10:20 wawel kernel: [23322.020095]  ? 
__kmalloc_track_caller+0x16c/0x210
Dec 23 00:10:20 wawel kernel: [23322.020101]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:10:20 wawel kernel: [23322.020152] btrfs_ioctl+0x2a8e/0x2d90 
[btrfs]
Dec 23 00:10:20 wawel kernel: [23322.020159]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:10:20 wawel kernel: [23322.020166]  ? _copy_to_user+0x26/0x30
Dec 23 00:10:20 wawel kernel: [23322.020171]  ? cp_new_stat+0x150/0x180
Dec 23 00:10:20 wawel kernel: [23322.020179]  ? do_vfs_ioctl+0xa4/0x630
Dec 23 00:10:20 wawel kernel: [23322.020183] do_vfs_ioctl+0xa4/0x630
Dec 23 00:10:20 wawel kernel: [23322.020189]  ? __do_sys_newstat+0x48/0x70
Dec 23 00:10:20 wawel kernel: [23322.020195]  ksys_ioctl+0x60/0x90
Dec 23 00:10:20 wawel kernel: [23322.020201] __x64_sys_ioctl+0x16/0x20
Dec 23 00:10:20 wawel kernel: [23322.020207] do_syscall_64+0x53/0x110
Dec 23 00:10:20 wawel kernel: [23322.020214] 
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 23 00:10:20 wawel kernel: [23322.020219] RIP: 0033:0x7f3d307fd427
Dec 23 00:10:20 wawel kernel: [23322.020228] Code: Bad RIP value.
Dec 23 00:10:20 wawel kernel: [23322.020231] RSP: 002b:00007ffc69d36b48 
EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Dec 23 00:10:20 wawel kernel: [23322.020235] RAX: ffffffffffffffda RBX: 
0000000000000000 RCX: 00007f3d307fd427
Dec 23 00:10:20 wawel kernel: [23322.020238] RDX: 00007ffc69d37b68 RSI: 
000000005000943a RDI: 0000000000000003
Dec 23 00:10:20 wawel kernel: [23322.020240] RBP: 0000000000000001 R08: 
0000000000000008 R09: 0000000000000078
Dec 23 00:10:20 wawel kernel: [23322.020242] R10: fffffffffffffa4a R11: 
0000000000000206 R12: 00007ffc69d38ce8
Dec 23 00:10:20 wawel kernel: [23322.020244] R13: 0000000000000000 R14: 
0000000000000003 R15: 0000000000000003
Dec 23 00:12:21 wawel kernel: [23442.851059] INFO: task btrfs:2546 
blocked for more than 120 seconds.
Dec 23 00:12:21 wawel kernel: [23442.851076]       Not tainted 
4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
Dec 23 00:12:21 wawel kernel: [23442.851086] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 23 00:12:21 wawel kernel: [23442.851097] btrfs           D    0 
2546   2539 0x00000000
Dec 23 00:12:21 wawel kernel: [23442.851105] Call Trace:
Dec 23 00:12:21 wawel kernel: [23442.851122]  ? __schedule+0x2a2/0x870
Dec 23 00:12:21 wawel kernel: [23442.851131]  schedule+0x28/0x80
Dec 23 00:12:21 wawel kernel: [23442.851138] 
schedule_preempt_disabled+0xa/0x10
Dec 23 00:12:21 wawel kernel: [23442.851145] __mutex_lock.isra.8+0x2b5/0x4a0
Dec 23 00:12:21 wawel kernel: [23442.851228] 
btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs]
Dec 23 00:12:21 wawel kernel: [23442.851310] 
btrfs_relocate_chunk+0x31/0xa0 [btrfs]
Dec 23 00:12:21 wawel kernel: [23442.851391] 
btrfs_shrink_device+0x1e1/0x550 [btrfs]
Dec 23 00:12:21 wawel kernel: [23442.851469]  ? 
btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs]
Dec 23 00:12:21 wawel kernel: [23442.851544] btrfs_rm_device+0x165/0x590 
[btrfs]
Dec 23 00:12:21 wawel kernel: [23442.851554]  ? 
__kmalloc_track_caller+0x16c/0x210
Dec 23 00:12:21 wawel kernel: [23442.851562]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:12:21 wawel kernel: [23442.851637] btrfs_ioctl+0x2a8e/0x2d90 
[btrfs]
Dec 23 00:12:21 wawel kernel: [23442.851646]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:12:21 wawel kernel: [23442.851655]  ? _copy_to_user+0x26/0x30
Dec 23 00:12:21 wawel kernel: [23442.851662]  ? cp_new_stat+0x150/0x180
Dec 23 00:12:21 wawel kernel: [23442.851672]  ? do_vfs_ioctl+0xa4/0x630
Dec 23 00:12:21 wawel kernel: [23442.851678] do_vfs_ioctl+0xa4/0x630
Dec 23 00:12:21 wawel kernel: [23442.851685]  ? __do_sys_newstat+0x48/0x70
Dec 23 00:12:21 wawel kernel: [23442.851694]  ksys_ioctl+0x60/0x90
Dec 23 00:12:21 wawel kernel: [23442.851702] __x64_sys_ioctl+0x16/0x20
Dec 23 00:12:21 wawel kernel: [23442.851710] do_syscall_64+0x53/0x110
Dec 23 00:12:21 wawel kernel: [23442.851719] 
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 23 00:12:21 wawel kernel: [23442.851725] RIP: 0033:0x7f3d307fd427
Dec 23 00:12:21 wawel kernel: [23442.851736] Code: Bad RIP value.
Dec 23 00:12:21 wawel kernel: [23442.851739] RSP: 002b:00007ffc69d36b48 
EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Dec 23 00:12:21 wawel kernel: [23442.851746] RAX: ffffffffffffffda RBX: 
0000000000000000 RCX: 00007f3d307fd427
Dec 23 00:12:21 wawel kernel: [23442.851749] RDX: 00007ffc69d37b68 RSI: 
000000005000943a RDI: 0000000000000003
Dec 23 00:12:21 wawel kernel: [23442.851752] RBP: 0000000000000001 R08: 
0000000000000008 R09: 0000000000000078
Dec 23 00:12:21 wawel kernel: [23442.851756] R10: fffffffffffffa4a R11: 
0000000000000206 R12: 00007ffc69d38ce8
Dec 23 00:12:21 wawel kernel: [23442.851759] R13: 0000000000000000 R14: 
0000000000000003 R15: 0000000000000003
Dec 23 00:14:22 wawel kernel: [23563.682406] INFO: task btrfs:2546 
blocked for more than 120 seconds.
Dec 23 00:14:22 wawel kernel: [23563.682424]       Not tainted 
4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
Dec 23 00:14:22 wawel kernel: [23563.682434] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 23 00:14:22 wawel kernel: [23563.682446] btrfs           D    0 
2546   2539 0x00000000
Dec 23 00:14:22 wawel kernel: [23563.682453] Call Trace:
Dec 23 00:14:22 wawel kernel: [23563.682470]  ? __schedule+0x2a2/0x870
Dec 23 00:14:22 wawel kernel: [23563.682480]  schedule+0x28/0x80
Dec 23 00:14:22 wawel kernel: [23563.682486] 
schedule_preempt_disabled+0xa/0x10
Dec 23 00:14:22 wawel kernel: [23563.682493] __mutex_lock.isra.8+0x2b5/0x4a0
Dec 23 00:14:22 wawel kernel: [23563.682577] 
btrfs_relocate_block_group.cold.48+0x48/0x1ae [btrfs]
Dec 23 00:14:22 wawel kernel: [23563.682658] 
btrfs_relocate_chunk+0x31/0xa0 [btrfs]
Dec 23 00:14:22 wawel kernel: [23563.682737] 
btrfs_shrink_device+0x1e1/0x550 [btrfs]
Dec 23 00:14:22 wawel kernel: [23563.682817]  ? 
btrfs_find_device_missing_or_by_path+0x110/0x130 [btrfs]
Dec 23 00:14:22 wawel kernel: [23563.682892] btrfs_rm_device+0x165/0x590 
[btrfs]
Dec 23 00:14:22 wawel kernel: [23563.682902]  ? 
__kmalloc_track_caller+0x16c/0x210
Dec 23 00:14:22 wawel kernel: [23563.682910]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:14:22 wawel kernel: [23563.682984] btrfs_ioctl+0x2a8e/0x2d90 
[btrfs]
Dec 23 00:14:22 wawel kernel: [23563.682993]  ? 
__check_object_size+0x15d/0x189
Dec 23 00:14:22 wawel kernel: [23563.683002]  ? _copy_to_user+0x26/0x30
Dec 23 00:14:22 wawel kernel: [23563.683009]  ? cp_new_stat+0x150/0x180
Dec 23 00:14:22 wawel kernel: [23563.683018]  ? do_vfs_ioctl+0xa4/0x630
Dec 23 00:14:22 wawel kernel: [23563.683024] do_vfs_ioctl+0xa4/0x630
Dec 23 00:14:22 wawel kernel: [23563.683032]  ? __do_sys_newstat+0x48/0x70
Dec 23 00:14:22 wawel kernel: [23563.683040]  ksys_ioctl+0x60/0x90
Dec 23 00:14:22 wawel kernel: [23563.683048] __x64_sys_ioctl+0x16/0x20
Dec 23 00:14:22 wawel kernel: [23563.683056] do_syscall_64+0x53/0x110
Dec 23 00:14:22 wawel kernel: [23563.683065] 
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 23 00:14:22 wawel kernel: [23563.683071] RIP: 0033:0x7f3d307fd427
Dec 23 00:14:22 wawel kernel: [23563.683081] Code: Bad RIP value.
Dec 23 00:14:22 wawel kernel: [23563.683085] RSP: 002b:00007ffc69d36b48 
EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Dec 23 00:14:22 wawel kernel: [23563.683092] RAX: ffffffffffffffda RBX: 
0000000000000000 RCX: 00007f3d307fd427
Dec 23 00:14:22 wawel kernel: [23563.683096] RDX: 00007ffc69d37b68 RSI: 
000000005000943a RDI: 0000000000000003
Dec 23 00:14:22 wawel kernel: [23563.683099] RBP: 0000000000000001 R08: 
0000000000000008 R09: 0000000000000078
Dec 23 00:14:22 wawel kernel: [23563.683102] R10: fffffffffffffa4a R11: 
0000000000000206 R12: 00007ffc69d38ce8
Dec 23 00:14:22 wawel kernel: [23563.683104] R13: 0000000000000000 R14: 
0000000000000003 R15: 0000000000000003
Dec 23 00:14:55 wawel kernel: [23596.566285] BTRFS info (device sda2): 
found 16 extents
Dec 23 00:15:21 wawel kernel: [23622.371794] BTRFS info (device sda2): 
found 16 extents
Dec 23 00:15:21 wawel kernel: [23622.805017] BTRFS info (device sda2): 
relocating block group 7073908654080 flags data
Dec 23 00:15:32 wawel kernel: [23633.535788] BTRFS info (device sda2): 
found 18 extents
Dec 23 00:15:56 wawel kernel: [23657.552946] BTRFS info (device sda2): 
found 18 extents
Dec 23 00:15:57 wawel kernel: [23658.309332] BTRFS info (device sda2): 
relocating block group 7070687428608 flags data
Dec 23 00:16:11 wawel kernel: [23672.185586] BTRFS info (device sda2): 
found 16 extents
Dec 23 00:16:18 wawel kernel: [23679.466177] BTRFS info (device sda2): 
found 16 extents
Dec 23 00:16:18 wawel kernel: [23679.780450] BTRFS info (device sda2): 
relocating block group 7067466203136 flags data
Dec 23 00:16:29 wawel kernel: [23691.026282] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:16:32 wawel kernel: [23693.545940] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:16:32 wawel kernel: [23693.977786] BTRFS info (device sda2): 
relocating block group 7064244977664 flags data
Dec 23 00:16:43 wawel kernel: [23704.182394] BTRFS info (device sda2): 
found 13 extents
Dec 23 00:16:50 wawel kernel: [23711.975492] BTRFS info (device sda2): 
found 13 extents
Dec 23 00:16:51 wawel kernel: [23712.408224] BTRFS info (device sda2): 
relocating block group 7061023752192 flags data
Dec 23 00:17:01 wawel kernel: [23722.754585] BTRFS info (device sda2): 
found 11 extents
Dec 23 00:17:09 wawel kernel: [23730.262877] BTRFS info (device sda2): 
found 11 extents
Dec 23 00:17:11 wawel kernel: [23732.302305] BTRFS info (device sda2): 
found 11 extents
Dec 23 00:17:11 wawel kernel: [23732.722001] BTRFS info (device sda2): 
relocating block group 7057802526720 flags data
Dec 23 00:17:26 wawel kernel: [23747.679932] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:17:30 wawel kernel: [23751.451465] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:17:30 wawel kernel: [23751.876404] BTRFS info (device sda2): 
relocating block group 7054581301248 flags data
Dec 23 00:17:41 wawel kernel: [23762.977551] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:17:42 wawel kernel: [23763.883422] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:17:43 wawel kernel: [23764.275202] BTRFS info (device sda2): 
relocating block group 7051360075776 flags data
Dec 23 00:17:52 wawel kernel: [23773.479034] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:17:53 wawel kernel: [23774.431145] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:17:53 wawel kernel: [23774.905979] BTRFS info (device sda2): 
relocating block group 7048138850304 flags data
Dec 23 00:18:21 wawel kernel: [23802.511276] BTRFS info (device sda2): 
found 18 extents
Dec 23 00:18:47 wawel kernel: [23828.664441] BTRFS info (device sda2): 
found 18 extents
Dec 23 00:18:51 wawel kernel: [23832.165407] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:18:51 wawel kernel: [23832.596833] BTRFS info (device sda2): 
relocating block group 7044917624832 flags data
Dec 23 00:19:07 wawel kernel: [23848.329302] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:19:18 wawel kernel: [23859.557383] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:19:18 wawel kernel: [23859.916559] BTRFS info (device sda2): 
relocating block group 7041696399360 flags data
Dec 23 00:19:31 wawel kernel: [23872.555941] BTRFS info (device sda2): 
found 18 extents
Dec 23 00:19:46 wawel kernel: [23888.035472] BTRFS info (device sda2): 
found 18 extents
Dec 23 00:19:47 wawel kernel: [23888.620939] BTRFS info (device sda2): 
relocating block group 7038475173888 flags data
Dec 23 00:19:59 wawel kernel: [23900.878661] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:20:14 wawel kernel: [23915.924062] BTRFS info (device sda2): 
found 9 extents
Dec 23 00:20:15 wawel kernel: [23916.373951] BTRFS info (device sda2): 
relocating block group 7035253948416 flags data
Dec 23 00:20:29 wawel kernel: [23930.526982] BTRFS info (device sda2): 
found 13 extents
Dec 23 00:20:40 wawel kernel: [23941.166196] BTRFS info (device sda2): 
found 13 extents
Dec 23 00:20:40 wawel kernel: [23941.603215] BTRFS info (device sda2): 
relocating block group 7032032722944 flags data
Dec 23 00:20:52 wawel kernel: [23953.361622] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:20:54 wawel kernel: [23955.734989] BTRFS info (device sda2): 
found 8 extents
Dec 23 00:20:55 wawel kernel: [23956.150691] BTRFS info (device sda2): 
relocating block group 7028811497472 flags data
Dec 23 00:21:07 wawel kernel: [23968.438551] BTRFS info (device sda2): 
found 11 extents
Dec 23 00:21:11 wawel kernel: [23972.732940] BTRFS info (device sda2): 
found 11 extents
Dec 23 00:21:12 wawel kernel: [23973.122960] BTRFS info (device sda2): 
relocating block group 7025590272000 flags data
Dec 23 00:21:25 wawel kernel: [23986.098275] BTRFS info (device sda2): 
found 10 extents
Dec 23 00:21:43 wawel kernel: [24004.479645] BTRFS info (device sda2): 
found 10 extents
Dec 23 00:21:58 wawel kernel: [24019.409741] BTRFS info (device sda2): 
relocating block group 7022369046528 flags data
Dec 23 00:22:16 wawel kernel: [24037.905347] BTRFS info (device sda2): 
found 10 extents
Dec 23 00:22:36 wawel kernel: [24057.193719] BTRFS info (device sda2): 
found 10 extents
Dec 23 00:22:42 wawel kernel: [24063.236104] BTRFS info (device sda2): 
relocating block group 7019147821056 flags data
Dec 23 00:22:57 wawel kernel: [24078.235035] BTRFS info (device sda2): 
found 10 extents
Dec 23 00:23:17 wawel kernel: [24098.884138] BTRFS info (device sda2): 
found 10 extents
Dec 23 00:23:19 wawel kernel: [24100.237497] BTRFS info (device sda2): 
relocating block group 7015926595584 flags data
Dec 23 00:23:32 wawel kernel: [24113.252248] BTRFS info (device sda2): 
found 11 extents

Dec 26 09:06:30 wawel kernel: [124813.206299] BTRFS info (device sda2): 
found 538 extents
Dec 26 09:06:41 wawel kernel: [124824.121993] BTRFS info (device sda2): 
relocating block group 2862659665920 flags data
Dec 26 09:07:01 wawel kernel: [124844.420159] BTRFS info (device sda2): 
found 76 extents
Dec 26 09:07:26 wawel kernel: [124869.796349] BTRFS info (device sda2): 
found 76 extents
Dec 26 09:07:33 wawel kernel: [124876.379783] BTRFS info (device sda2): 
relocating block group 2859438440448 flags data
Dec 26 09:07:55 wawel kernel: [124898.518810] BTRFS info (device sda2): 
found 314 extents
Dec 26 09:08:20 wawel kernel: [124922.923430] BTRFS info (device sda2): 
found 314 extents
Dec 26 09:08:24 wawel kernel: [124927.088396] BTRFS info (device sda2): 
relocating block group 2856217214976 flags data
Dec 26 09:09:01 wawel kernel: [124964.879333] BTRFS info (device sda2): 
found 3257 extents
Dec 26 09:10:07 wawel kernel: [125030.251298] BTRFS info (device sda2): 
found 3257 extents
Dec 26 09:10:27 wawel kernel: [125050.474861] BTRFS info (device sda2): 
relocating block group 2852995989504 flags data
Dec 26 09:11:17 wawel kernel: [125100.646368] BTRFS info (device sda2): 
found 720 extents
Dec 26 09:11:57 wawel kernel: [125140.038078] BTRFS info (device sda2): 
found 720 extents
Dec 26 09:12:00 wawel kernel: [125143.489371] BTRFS info (device sda2): 
relocating block group 2849774764032 flags data
Dec 26 09:12:30 wawel kernel: [125173.882124] BTRFS info (device sda2): 
found 781 extents
Dec 26 09:13:13 wawel kernel: [125216.215971] BTRFS info (device sda2): 
found 781 extents
Dec 26 09:13:30 wawel kernel: [125233.369639] BTRFS info (device sda2): 
relocating block group 2846553538560 flags data
Dec 26 09:14:10 wawel kernel: [125273.480275] BTRFS info (device sda2): 
found 863 extents
Dec 26 09:14:40 wawel kernel: [125303.044820] BTRFS info (device sda2): 
found 863 extents
Dec 26 09:14:50 wawel kernel: [125313.825051] BTRFS info (device sda2): 
relocating block group 2843332313088 flags data
Dec 26 09:15:11 wawel kernel: [125334.898553] BTRFS info (device sda2): 
found 605 extents
Dec 26 09:15:34 wawel kernel: [125357.032946] BTRFS info (device sda2): 
found 605 extents
Dec 26 09:15:39 wawel kernel: [125362.305375] BTRFS info (device sda2): 
relocating block group 2840111087616 flags data
Dec 26 09:15:57 wawel kernel: [125380.738443] BTRFS info (device sda2): 
found 822 extents
Dec 26 09:16:27 wawel kernel: [125410.554652] BTRFS info (device sda2): 
found 822 extents
Dec 26 09:16:33 wawel kernel: [125416.795941] BTRFS info (device sda2): 
found 1 extents
Dec 26 09:16:40 wawel kernel: [125423.287556] BTRFS info (device sda2): 
found 1 extents
Dec 26 09:16:41 wawel kernel: [125424.662419] BTRFS info (device sda2): 
relocating block group 2836889862144 flags data
Dec 26 09:17:58 wawel kernel: [125500.923465] BTRFS info (device sda2): 
found 6105 extents
Dec 26 09:20:20 wawel kernel: [125643.445337] BTRFS info (device sda2): 
found 6105 extents
Dec 26 09:20:37 wawel kernel: [125660.315318] BTRFS info (device sda2): 
found 9 extents
Dec 26 09:21:19 wawel kernel: [125702.836577] BTRFS info (device sda2): 
found 9 extents
Dec 26 09:21:36 wawel kernel: [125719.739584] BTRFS info (device sda2): 
relocating block group 2833668636672 flags data
Dec 26 09:22:15 wawel kernel: [125758.510136] BTRFS info (device sda2): 
found 1732 extents
Dec 26 09:22:39 wawel kernel: [125782.335053] BTRFS info (device sda2): 
found 1732 extents
Dec 26 09:22:43 wawel kernel: [125786.733606] BTRFS info (device sda2): 
relocating block group 2830447411200 flags data
Dec 26 09:23:14 wawel kernel: [125817.174210] BTRFS info (device sda2): 
found 1540 extents
Dec 26 09:23:32 wawel kernel: [125835.049452] BTRFS info (device sda2): 
found 1540 extents
Dec 26 09:23:37 wawel kernel: [125840.764399] BTRFS info (device sda2): 
relocating block group 2827226185728 flags data
Dec 26 09:23:59 wawel kernel: [125862.499798] BTRFS info (device sda2): 
found 1730 extents
Dec 26 09:24:20 wawel kernel: [125883.528931] BTRFS info (device sda2): 
found 1730 extents
Dec 26 09:24:25 wawel kernel: [125888.791587] BTRFS info (device sda2): 
relocating block group 2824004960256 flags data
Dec 26 09:25:00 wawel kernel: [125923.748361] BTRFS info (device sda2): 
found 1453 extents
Dec 26 09:25:30 wawel kernel: [125953.500151] BTRFS info (device sda2): 
found 1453 extents
Dec 26 09:25:47 wawel kernel: [125970.178853] BTRFS info (device sda2): 
relocating block group 2820783734784 flags data
Dec 26 09:26:19 wawel kernel: [126001.953383] BTRFS info (device sda2): 
found 1321 extents
Dec 26 09:26:47 wawel kernel: [126030.506902] BTRFS info (device sda2): 
found 1321 extents
Dec 26 09:26:59 wawel kernel: [126042.292928] BTRFS info (device sda2): 
relocating block group 2817562509312 flags data
Dec 26 09:27:26 wawel kernel: [126069.399351] BTRFS info (device sda2): 
found 1321 extents
Dec 26 09:28:54 wawel kernel: [126157.696317] BTRFS info (device sda2): 
found 1321 extents
Dec 26 09:29:05 wawel kernel: [126167.934754] BTRFS info (device sda2): 
relocating block group 2814341283840 flags data
Dec 26 09:29:29 wawel kernel: [126192.741151] BTRFS info (device sda2): 
found 1308 extents
Dec 26 09:29:48 wawel kernel: [126210.929302] BTRFS info (device sda2): 
found 1308 extents
Dec 26 09:29:55 wawel kernel: [126218.316856] BTRFS info (device sda2): 
relocating block group 2811120058368 flags data
Dec 26 09:30:24 wawel kernel: [126247.373223] BTRFS info (device sda2): 
found 1302 extents
Dec 26 09:30:50 wawel kernel: [126273.241357] BTRFS info (device sda2): 
found 1302 extents
Dec 26 09:31:06 wawel kernel: [126289.751136] BTRFS info (device sda2): 
found 1301 extents
Dec 26 09:31:11 wawel kernel: [126293.918325] BTRFS info (device sda2): 
relocating block group 2807898832896 flags data
Dec 26 09:31:29 wawel kernel: [126312.570431] BTRFS info (device sda2): 
found 1239 extents
Dec 26 09:31:57 wawel kernel: [126340.525339] BTRFS info (device sda2): 
found 1239 extents
Dec 26 09:32:07 wawel kernel: [126350.580726] BTRFS info (device sda2): 
relocating block group 2804677607424 flags data
Dec 26 09:32:38 wawel kernel: [126381.535312] BTRFS info (device sda2): 
found 1318 extents
Dec 26 09:33:04 wawel kernel: [126407.888226] BTRFS info (device sda2): 
found 1318 extents
Dec 26 09:33:17 wawel kernel: [126420.834830] BTRFS info (device sda2): 
relocating block group 2801456381952 flags data
Dec 26 09:34:57 wawel kernel: [126520.878652] BTRFS info (device sda2): 
found 1314 extents
Dec 26 09:35:30 wawel kernel: [126553.629757] BTRFS info (device sda2): 
found 1314 extents
Dec 26 09:35:40 wawel kernel: [126563.198254] BTRFS info (device sda2): 
relocating block group 2798235156480 flags data
Dec 26 09:36:05 wawel kernel: [126588.438641] BTRFS info (device sda2): 
found 1259 extents
Dec 26 09:36:23 wawel kernel: [126605.928090] BTRFS info (device sda2): 
found 1259 extents
Dec 26 09:36:31 wawel kernel: [126614.349082] BTRFS info (device sda2): 
relocating block group 2795013931008 flags data
Dec 26 09:36:55 wawel kernel: [126638.306903] BTRFS info (device sda2): 
found 1254 extents
Dec 26 09:37:13 wawel kernel: [126656.883642] BTRFS info (device sda2): 
found 1254 extents
Dec 26 09:37:18 wawel kernel: [126661.319326] BTRFS info (device sda2): 
relocating block group 2791792705536 flags data
Dec 26 09:37:39 wawel kernel: [126682.613613] BTRFS info (device sda2): 
found 1034 extents
Dec 26 09:37:53 wawel kernel: [126695.934991] BTRFS info (device sda2): 
found 1034 extents
Dec 26 09:37:58 wawel kernel: [126701.052766] BTRFS info (device sda2): 
relocating block group 2788571480064 flags data
Dec 26 09:38:23 wawel kernel: [126726.414875] BTRFS info (device sda2): 
found 633 extents
Dec 26 09:38:54 wawel kernel: [126757.375947] BTRFS info (device sda2): 
found 633 extents
Dec 26 09:39:05 wawel kernel: [126768.679478] BTRFS info (device sda2): 
relocating block group 2785350254592 flags data
Dec 26 09:39:36 wawel kernel: [126799.425144] BTRFS info (device sda2): 
found 693 extents
Dec 26 09:40:06 wawel kernel: [126829.604815] BTRFS info (device sda2): 
found 693 extents
Dec 26 09:40:25 wawel kernel: [126848.363498] BTRFS info (device sda2): 
relocating block group 2782129029120 flags data
Dec 26 09:41:05 wawel kernel: [126888.549973] BTRFS info (device sda2): 
found 514 extents
Dec 26 09:41:36 wawel kernel: [126919.490217] BTRFS info (device sda2): 
found 514 extents
Dec 26 09:42:03 wawel kernel: [126946.123272] BTRFS info (device sda2): 
relocating block group 2778907803648 flags data
Dec 26 09:42:36 wawel kernel: [126979.670180] BTRFS info (device sda2): 
found 519 extents
Dec 26 09:42:56 wawel kernel: [126999.459002] BTRFS info (device sda2): 
found 519 extents
Dec 26 09:43:05 wawel kernel: [127008.768350] BTRFS info (device sda2): 
relocating block group 2775686578176 flags data
Dec 26 09:43:28 wawel kernel: [127031.510719] BTRFS info (device sda2): 
found 653 extents
Dec 26 09:43:44 wawel kernel: [127047.653185] BTRFS info (device sda2): 
found 653 extents
Dec 26 09:43:51 wawel kernel: [127054.633127] BTRFS info (device sda2): 
relocating block group 2772465352704 flags data
Dec 26 09:44:16 wawel kernel: [127079.730172] BTRFS info (device sda2): 
found 609 extents
Dec 26 09:44:51 wawel kernel: [127114.373682] BTRFS info (device sda2): 
found 609 extents
Dec 26 09:45:09 wawel kernel: [127132.516306] BTRFS info (device sda2): 
relocating block group 2770317869056 flags data
Dec 26 09:46:06 wawel kernel: [127189.531463] BTRFS info (device sda2): 
found 2162 extents
Dec 26 09:47:13 wawel kernel: [127256.870794] BTRFS info (device sda2): 
found 2162 extents
Dec 26 09:47:23 wawel kernel: [127266.335336] BTRFS info (device sda2): 
relocating block group 2768170385408 flags metadata|raid1
Dec 26 10:02:40 wawel kernel: [128183.204056] BTRFS info (device sda2): 
found 64344 extents
Dec 26 10:02:51 wawel kernel: [128194.752371] BTRFS info (device sda2): 
relocating block group 2766022901760 flags data
Dec 26 10:05:51 wawel kernel: [128374.372685] BTRFS info (device sda2): 
found 14657 extents
Dec 26 10:11:59 wawel kernel: [128742.106841] BTRFS info (device sda2): 
found 14657 extents
Dec 26 10:12:04 wawel kernel: [128747.164928] BTRFS info (device sda2): 
relocating block group 2762801676288 flags data
Dec 26 10:12:46 wawel kernel: [128789.843093] BTRFS info (device sda2): 
found 687 extents
Dec 26 10:13:37 wawel kernel: [128840.608651] BTRFS info (device sda2): 
found 687 extents
Dec 26 10:13:57 wawel kernel: [128860.515578] BTRFS info (device sda2): 
found 4 extents
Dec 26 10:14:04 wawel kernel: [128867.761205] BTRFS info (device sda2): 
relocating block group 2759580450816 flags data
Dec 26 10:14:48 wawel kernel: [128911.046501] BTRFS info (device sda2): 
found 2078 extents
Dec 26 10:23:54 wawel kernel: [129457.365421] BTRFS info (device sda2): 
found 2078 extents
Dec 26 10:24:17 wawel kernel: [129480.141652] BTRFS info (device sda2): 
relocating block group 2756359225344 flags data
Dec 26 10:25:06 wawel kernel: [129529.130468] BTRFS info (device sda2): 
found 1948 extents
Dec 26 10:37:54 wawel kernel: [130297.130706] BTRFS info (device sda2): 
found 1948 extents
Dec 26 10:38:07 wawel kernel: [130310.753097] BTRFS info (device sda2): 
relocating block group 2753137999872 flags data
Dec 26 10:39:52 wawel kernel: [130415.674175] BTRFS info (device sda2): 
found 658 extents
Dec 26 10:43:24 wawel kernel: [130627.255410] BTRFS info (device sda2): 
found 658 extents
Dec 26 10:43:42 wawel kernel: [130645.434399] BTRFS info (device sda2): 
found 42 extents
Dec 26 10:43:53 wawel kernel: [130656.815802] BTRFS info (device sda2): 
found 42 extents
Dec 26 10:44:05 wawel kernel: [130668.284526] BTRFS info (device sda2): 
relocating block group 2749916774400 flags data
Dec 26 10:45:37 wawel kernel: [130760.519557] BTRFS info (device sda2): 
found 7795 extents
Dec 26 11:24:02 wawel kernel: [133065.590359] BTRFS info (device sda2): 
found 7795 extents
Dec 26 11:24:49 wawel kernel: [133112.551574] BTRFS info (device sda2): 
relocating block group 2746695548928 flags data
Dec 26 11:25:21 wawel kernel: [133144.523206] BTRFS info (device sda2): 
found 67 extents
Dec 26 11:26:23 wawel kernel: [133206.718874] BTRFS info (device sda2): 
found 67 extents
Dec 26 11:26:34 wawel kernel: [133217.364169] BTRFS info (device sda2): 
relocating block group 2743474323456 flags data
Dec 26 11:27:09 wawel kernel: [133252.030493] BTRFS info (device sda2): 
found 154 extents
Dec 26 11:28:37 wawel kernel: [133340.658370] BTRFS info (device sda2): 
found 154 extents
Dec 26 11:28:48 wawel kernel: [133350.928232] BTRFS info (device sda2): 
relocating block group 2740253097984 flags data
Dec 26 11:29:31 wawel kernel: [133394.349923] BTRFS info (device sda2): 
found 3377 extents
Dec 26 11:50:07 wawel kernel: [134630.385264] BTRFS info (device sda2): 
found 3377 extents
Dec 26 11:50:40 wawel kernel: [134663.264144] BTRFS info (device sda2): 
relocating block group 2737031872512 flags data
Dec 26 11:51:31 wawel kernel: [134714.239962] BTRFS info (device sda2): 
found 244 extents
Dec 26 11:54:06 wawel kernel: [134869.320437] BTRFS info (device sda2): 
found 244 extents
Dec 26 11:54:28 wawel kernel: [134891.019946] BTRFS info (device sda2): 
relocating block group 2733810647040 flags data
Dec 26 11:54:57 wawel kernel: [134920.370189] BTRFS info (device sda2): 
found 560 extents
Dec 26 11:57:30 wawel kernel: [135073.351351] BTRFS info (device sda2): 
found 560 extents
Dec 26 11:57:53 wawel kernel: [135096.801125] BTRFS info (device sda2): 
relocating block group 2730589421568 flags data
Dec 26 11:58:26 wawel kernel: [135129.186591] BTRFS info (device sda2): 
found 243 extents
Dec 26 12:00:28 wawel kernel: [135251.901624] BTRFS info (device sda2): 
found 243 extents
Dec 26 12:01:18 wawel kernel: [135301.341113] BTRFS info (device sda2): 
relocating block group 2727368196096 flags data
Dec 26 12:02:58 wawel kernel: [135401.616231] BTRFS info (device sda2): 
found 7966 extents
Dec 26 12:35:08 wawel kernel: [137331.224676] BTRFS info (device sda2): 
found 7966 extents
Dec 26 12:35:25 wawel kernel: [137348.767051] BTRFS info (device sda2): 
relocating block group 2724146970624 flags data
Dec 26 12:36:18 wawel kernel: [137401.253589] BTRFS info (device sda2): 
found 6267 extents
Dec 26 13:00:55 wawel kernel: [138878.357733] BTRFS info (device sda2): 
found 6267 extents
Dec 26 13:01:11 wawel kernel: [138893.979362] BTRFS info (device sda2): 
relocating block group 2720925745152 flags data
Dec 26 13:02:56 wawel kernel: [138999.875074] BTRFS info (device sda2): 
found 9154 extents
Dec 26 13:29:38 wawel kernel: [140601.538362] BTRFS info (device sda2): 
found 9154 extents
Dec 26 13:30:08 wawel kernel: [140631.396198] BTRFS info (device sda2): 
relocating block group 2717704519680 flags data
Dec 26 13:31:04 wawel kernel: [140687.543046] BTRFS info (device sda2): 
found 7404 extents
Dec 26 13:54:06 wawel kernel: [142069.542451] BTRFS info (device sda2): 
found 7404 extents
Dec 26 13:54:36 wawel kernel: [142099.763738] BTRFS info (device sda2): 
relocating block group 2714483294208 flags data
Dec 26 13:56:01 wawel kernel: [142184.270785] BTRFS info (device sda2): 
found 5820 extents
Dec 26 14:23:00 wawel kernel: [143803.389716] BTRFS info (device sda2): 
found 5820 extents
Dec 26 14:23:05 wawel kernel: [143808.503120] BTRFS info (device sda2): 
relocating block group 2711262068736 flags data
Dec 26 14:25:19 wawel kernel: [143942.663324] BTRFS info (device sda2): 
found 8754 extents



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 14:05       ` Leszek Dubiel
@ 2019-12-26 14:21         ` Remi Gauvin
  2019-12-26 15:42           ` Leszek Dubiel
  2019-12-26 22:40         ` Chris Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Remi Gauvin @ 2019-12-26 14:21 UTC (permalink / raw)
  To: Leszek Dubiel, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2086 bytes --]

On 2019-12-26 9:05 a.m., Leszek Dubiel wrote:

> 
> I didn't use Raid6, but Raid0.
> 
> 
> Yes, I have checked dmesg in the first place (see original mail).
> 
> Btrfs logged no errors.
> 

I'm sorry, that was a reading comprehension failure on my part.

But regardless, your dmesg clearly shows drive read failures (to be
expected from a drive in a failing state).

Recovery of Raid0 with a faulty drive is a job for restore from backup
(or an attempt to scrape what you can with the array mounted read-only).
That's what Raid0 *means*.
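
For what it's worth, a rough sketch of the "scrape what you can" route
(device and destination paths below are only placeholders):

    mount -o ro /dev/sdb2 /mnt            # mount the array read-only
    rsync -a /mnt/ /path/to/rescue/       # copy off whatever still reads

    # if it no longer mounts at all, try offline extraction:
    btrfs restore -D /dev/sdb2 /path/to/rescue/    # -D = dry run first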



Dec 22 19:20:11 wawel kernel: [ 5912.116874] ata1.00: exception Emask
0x0 SAct 0x1f80 SErr 0x0 action 0x0
Dec 22 19:20:11 wawel kernel: [ 5912.116878] ata1.00: irq_stat 0x40000008
Dec 22 19:20:11 wawel kernel: [ 5912.116880] ata1.00: failed command:
READ FPDMA QUEUED
Dec 22 19:20:11 wawel kernel: [ 5912.116882] ata1.00: cmd
60/00:38:00:00:98/0a:00:45:01:00/40 tag 7 ncq dma 1310720 in
Dec 22 19:20:11 wawel kernel: [ 5912.116882]          res
43/40:18:e8:05:98/00:04:45:01:00/40 Emask 0x409 (media error) <F>
Dec 22 19:20:11 wawel kernel: [ 5912.116885] ata1.00: status: { DRDY
SENSE ERR }
Dec 22 19:20:11 wawel kernel: [ 5912.116886] ata1.00: error: { UNC }
Dec 22 19:20:11 wawel kernel: [ 5912.153695] ata1.00: configured for
UDMA/133
Dec 22 19:20:11 wawel kernel: [ 5912.153707] sd 0:0:0:0: [sda] tag#7
FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 22 19:20:11 wawel kernel: [ 5912.153709] sd 0:0:0:0: [sda] tag#7
Sense Key : Medium Error [current]
Dec 22 19:20:11 wawel kernel: [ 5912.153710] sd 0:0:0:0: [sda] tag#7
Add. Sense: Unrecovered read error - auto reallocate failed
Dec 22 19:20:11 wawel kernel: [ 5912.153711] sd 0:0:0:0: [sda] tag#7
CDB: Read(16) 88 00 00 00 00 01 45 98 00 00 00 00 0a 00 00 00
Dec 22 19:20:11 wawel kernel: [ 5912.153712] print_req_error: I/O error,
dev sda, sector 5462556672
Dec 22 19:20:11 wawel kernel: [ 5912.153724] ata1: EH complete

I'm assuming that /dev/sda is the drive you already identified as
failing; if not, you're totally hosed.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 14:21         ` Remi Gauvin
@ 2019-12-26 15:42           ` Leszek Dubiel
  0 siblings, 0 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 15:42 UTC (permalink / raw)
  To: Remi Gauvin, linux-btrfs



> Recovery of Raid0 with a faulty drive is a job for restore from backup,
> (or attempt to scrape what you can with array mounted ro).. That's what
> Raid0 *means*.
> I'm assuming that /dev/sda is the the drive you already identified as
> failing,, if not, you're totally hosed.


Yes -- /dev/sda is getting errors; it's failing. It didn't pass SMART
self-tests.
Thank you for inspecting my logs.

I will try to be patient until btrfs removes /dev/sda2 from the array.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel
  2019-12-26  5:08 ` Qu Wenruo
@ 2019-12-26 22:15 ` Chris Murphy
  2019-12-26 22:48   ` Leszek Dubiel
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Murphy @ 2019-12-26 22:15 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Btrfs BTRFS

On Wed, Dec 25, 2019 at 3:42 PM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
>
> Hello!
>
> I have a server: 3 disks, 6TB each, total 17TB space, occupied with data
> 6TB.
>
>
> One of the disks got smart errors:
>
>      197 Current_Pending_Sector  0x0022   100   100   000 Old_age
> Always       -       16
>      198 Offline_Uncorrectable   0x0008   100   100   000 Old_age
> Offline      -       2
>
> And didn't pass tests:
>
>      Num  Test_Description    Status                  Remaining
> LifeTime(hours)  LBA_of_first_error
>      # 1  Extended offline    Completed: read failure 90%
> 3575         -
>      # 2  Short offline       Completed without error 00%
> 3574         -
>      # 3  Extended offline    Completed: read failure 90%
> 3574         -
>      # 4  Extended offline    Completed: read failure 90%
> 3560         -
>      # 5  Extended offline    Completed: read failure 50%
> 3559         -
>
> I decided to remove that drive from BTRFS system:


What is the SCT ERC for each drive? This applies to mdadm, lvm, and
btrfs RAID. While you are not using raid for data, you are using it
for metadata. Also, mismatching SCT ERC with the kernel's command timer
is not good for any configuration: the SCT ERC must be shorter than
the kernel command timer, or bad sector errors inevitably get masked by
the kernel resetting the link to the device.

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
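
A quick way to compare the two values (a sketch; repeat for each
member device):

    smartctl -l scterc /dev/sda          # drive's error recovery limit, in deciseconds
    cat /sys/block/sda/device/timeout    # kernel's command timer, in seconds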

And when was the last time a scrub was done on the volume? Were there
any errors reported by either user space tools or kernel? And what
were they?

I do agree, however, that this configuration should get better read
performance from the device being deleted, unless part of the reason
it's so slow is that one or more drives are trying to do deep
recoveries.

My suggestion for a single-profile, multi-device setup is to leave the
per-drive SCT ERC disabled (or a high value, e.g. 1200 deciseconds) and
also change the per-block-device command timer (this is a kernel timer
set per device) to be at least 120 seconds. This will allow the drive
to do deep recovery, which will make it dog slow, but if necessary
it'll improve the chances of getting data off the drives. If you don't
care about getting data off the drives, i.e. you have a backup, then
you can set the SCT ERC value to 70 deciseconds. Any bad sector errors
will quickly result in a read error, and Btrfs will report the affected
file. If it's metadata that's affected, it'll get a good copy from the
mirror, and fix up the bad copy, automatically.
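
As a rough sketch of those two alternatives (sdX is a placeholder;
smartctl takes deciseconds):

    # favour data recovery: long drive retries, long kernel timer
    smartctl -l scterc,0,0 /dev/sdX             # disable SCT ERC, if the drive supports it
    echo 180 > /sys/block/sdX/device/timeout    # kernel command timer, in seconds

    # or fail fast: short drive retries, btrfs reports errors and repairs raid1 metadata
    smartctl -l scterc,70,70 /dev/sdX           # 7.0 seconds for read and write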



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 14:05       ` Leszek Dubiel
  2019-12-26 14:21         ` Remi Gauvin
@ 2019-12-26 22:40         ` Chris Murphy
  2019-12-26 22:58           ` Leszek Dubiel
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Murphy @ 2019-12-26 22:40 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Remi Gauvin, Btrfs BTRFS

On Thu, Dec 26, 2019 at 7:05 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
> Dec 22 19:20:11 wawel kernel: [ 5912.116874] ata1.00: exception Emask
> 0x0 SAct 0x1f80 SErr 0x0 action 0x0
> Dec 22 19:20:11 wawel kernel: [ 5912.116878] ata1.00: irq_stat 0x40000008
> Dec 22 19:20:11 wawel kernel: [ 5912.116880] ata1.00: failed command:
> READ FPDMA QUEUED
> Dec 22 19:20:11 wawel kernel: [ 5912.116882] ata1.00: cmd
> 60/00:38:00:00:98/0a:00:45:01:00/40 tag 7 ncq dma 1310720 in
> Dec 22 19:20:11 wawel kernel: [ 5912.116882]          res
> 43/40:18:e8:05:98/00:04:45:01:00/40 Emask 0x409 (media error) <F>
> Dec 22 19:20:11 wawel kernel: [ 5912.116885] ata1.00: status: { DRDY
> SENSE ERR }
> Dec 22 19:20:11 wawel kernel: [ 5912.116886] ata1.00: error: { UNC }
> Dec 22 19:20:11 wawel kernel: [ 5912.153695] ata1.00: configured for
> UDMA/133
> Dec 22 19:20:11 wawel kernel: [ 5912.153707] sd 0:0:0:0: [sda] tag#7
> FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Dec 22 19:20:11 wawel kernel: [ 5912.153709] sd 0:0:0:0: [sda] tag#7
> Sense Key : Medium Error [current]
> Dec 22 19:20:11 wawel kernel: [ 5912.153710] sd 0:0:0:0: [sda] tag#7
> Add. Sense: Unrecovered read error - auto reallocate failed
> Dec 22 19:20:11 wawel kernel: [ 5912.153711] sd 0:0:0:0: [sda] tag#7
> CDB: Read(16) 88 00 00 00 00 01 45 98 00 00 00 00 0a 00 00 00
> Dec 22 19:20:11 wawel kernel: [ 5912.153712] print_req_error: I/O error,
> dev sda, sector 5462556672
> Dec 22 19:20:11 wawel kernel: [ 5912.153724] ata1: EH complete
> Dec 22 19:21:28 wawel kernel: [ 5989.527853] BTRFS info (device sda2):
> found 8 extents

Weird. This is not expected. I see a discrete read error with the LBA
reported for the device, and yet Btrfs shows no attempt to correct it
(using raid1 metadata), nor does it report the path to the file that's
affected by this lost sector. I'm expecting to see one of those two
outcomes, given the profiles being used.


>
> Dec 23 00:08:20 wawel kernel: [23201.188424] INFO: task btrfs:2546
> blocked for more than 120 seconds.

Multiples of these, but no coinciding read errors or SATA link resets.
This suggests bad sectors in deep recovery. And that would explain why
the copies are so slow. It's not a Btrfs problem per se. It's that
you've decided to have only one copy of data, so self-healing of data
isn't possible. The file system itself is fine, but slow because the
affected drive is slow to recover these bad sectors.

Again, dropping the SCT ERC for the drives would result in a faster
error recovery when encountering bad sectors. It also increases the
chance of data loss (not metadata loss since it's raid1).




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 22:15 ` Chris Murphy
@ 2019-12-26 22:48   ` Leszek Dubiel
  0 siblings, 0 replies; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 22:48 UTC (permalink / raw)
  To: Chris Murphy, linux-btrfs



On 26.12.2019 at 23:15, Chris Murphy wrote:
 >
 > What is the SCT ERC for each drive?

Failing drive is /dev/sda.

    root@wawel:~# smartctl -l scterc /dev/sda
    smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-6-amd64] (local
    build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke,
    www.smartmontools.org

    SCT Error Recovery Control command not supported

    root@wawel:~# smartctl -l scterc /dev/sdb
    SCT Error Recovery Control:
                Read:     70 (7.0 seconds)
               Write:     70 (7.0 seconds)

    root@wawel:~# smartctl -l scterc /dev/sdc
    SCT Error Recovery Control:
                Read:    100 (10.0 seconds)
               Write:    100 (10.0 seconds)


I have changed the timeout to a higher value:


        echo 180 >  /sys/block/sda/device/timeout



 > And when was the last time a scrub was done on the volume?

It was done a few months ago, unfortunately...




 > Were there
 > any errors reported by either user space tools or kernel? And what
 > were they?

There were NO errors from any user-space tools.
Only smartctl reported uncorrectable sectors and failed SMART self-tests.




Thank you for the suggestions below. The data is not critical (this is
a backup server...).
I will see tomorrow how the progress is.
Actually there are no errors, it is just slow...


 > My suggestion for single profile multiple device is to leave the per
 > drive SCT ERC disabled (or a high value, e.g. 1200 deciseconds) and
 > also change the per block device command timer (this is a kernel timer
 > set per device) to be at least 120 seconds.
 > This will allow the drive
 > to do deep recovery, which will make it dog slow, but if necessary
 > it'll improve the chances of getting data off the drives. If you don't
 > care about getting data off the drives, i.e. you have a backup, then
 > you can set the SCT ERC value to 70 deciseconds. Any bad sector errors
 > will quickly result in a read error, Btrfs will report the affected
 > file. IF it's metadata that's affected, it'll get a good copy from the
 > mirror, and fix up the bad copy, automatically.
 >
 >
 >



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 22:40         ` Chris Murphy
@ 2019-12-26 22:58           ` Leszek Dubiel
  2019-12-28 17:04             ` Leszek Dubiel
  0 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-26 22:58 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Remi Gauvin, Btrfs BTRFS


 > Weird. This is not expected. I see a discrete read error with the LBA
 > reported for the device, and yet Btrfs shows no attempt to correct it
 > (using raid1 metadata), nor does it report the path to the file that's
 > affected by this lost sector. I'm expecting to see one of those two
 > outcomes, given the profiles being used.

Yes. It is strange that I see no errors from btrfs.
Maybe those ata errors occurred during SMART self-tests of /dev/sda...

I put some logs below.




 > Dec 23 00:08:20 wawel kernel: [23201.188424] INFO: task btrfs:2546
 > blocked for more than 120 seconds.
 >
 > Multiples of these, but no coinciding read errors or SATA link resets.
 > This suggests bad sectors in deep recovery. And that would explain why
 > the copies are so slow. It's not a Btrfs problem per se. It's that
 > you've decided to have only one copy of data, so self-healing of data
 > isn't possible. The file system itself is fine, but slow because the
 > affected drive is slow to recover these bad sectors.
 >
 > Again, dropping the SCT ERC for the drives would result in a faster
 > error recovery when encountering bad sectors. It also increases the
 > chance of data loss (not metadata loss since it's raid1)

The deep recovery theory could be correct, because two days ago
there were 16 "Current_Pending_Sector" and 2 uncorrectable sectors,
and SMART self-tests failed.

Today there are zero pending sectors and the self-tests pass OK.




root@wawel:~# smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-6-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD6003FZBX-00K5WB0
Serial Number:    V8GJ99HR
LU WWN Device Id: 5 000cca 098c768df
Firmware Version: 01.01A01
User Capacity:    6 001 175 126 016 bytes [6,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec 26 23:56:39 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                     was completed without error.
                     Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test 
routine completed
                     without error or no self-test has ever
                     been run.
Total time to complete Offline
data collection:         (   87) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                     Auto Offline data collection on/off support.
                     Suspend Offline collection upon new
                     command.
                     Offline surface scan supported.
                     Self-test supported.
                     No Conveyance Self-test supported.
                     Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                     power-saving mode.
                     Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                     General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 688) minutes.
SCT capabilities:            (0x0035)    SCT Status supported.
                     SCT Feature Control supported.
                     SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      
UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000b   100   100   016 Pre-fail  
Always       -       0
   2 Throughput_Performance  0x0004   130   130   054 Old_age   
Offline      -       100
   3 Spin_Up_Time            0x0007   100   100   024 Pre-fail  
Always       -       0
   4 Start_Stop_Count        0x0012   100   100   000 Old_age   
Always       -       7
   5 Reallocated_Sector_Ct   0x0033   100   100   005 Pre-fail  
Always       -       0
   7 Seek_Error_Rate         0x000a   100   100   067 Old_age   
Always       -       0
   8 Seek_Time_Performance   0x0004   128   128   020 Old_age   
Offline      -       18
   9 Power_On_Hours          0x0012   100   100   000 Old_age   
Always       -       3676
  10 Spin_Retry_Count        0x0012   100   100   060 Old_age   
Always       -       0
  12 Power_Cycle_Count       0x0032   100   100   000 Old_age   
Always       -       7
192 Power-Off_Retract_Count 0x0032   100   100   000 Old_age   
Always       -       154
193 Load_Cycle_Count        0x0012   100   100   000 Old_age   
Always       -       154
194 Temperature_Celsius     0x0002   183   183   000 Old_age   
Always       -       30 (Min/Max 21/38)
196 Reallocated_Event_Count 0x0032   100   100   000 Old_age   
Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000 Old_age   
Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000 Old_age   
Offline      -       2
199 UDMA_CRC_Error_Count    0x000a   200   200   000 Old_age   
Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
     CR = Command Register [HEX]
     FR = Features Register [HEX]
     SC = Sector Count Register [HEX]
     SN = Sector Number Register [HEX]
     CL = Cylinder Low Register [HEX]
     CH = Cylinder High Register [HEX]
     DH = Device/Head Register [HEX]
     DC = Device Command Register [HEX]
     ER = Error register [HEX]
     ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 3576 hours (149 days + 0 hours)
   When the command that caused the error occurred, the device was 
active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 43 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time Command/Feature_Name
   -- -- -- -- -- -- -- --  ---------------- --------------------
   60 00 38 00 00 98 40 08   1d+04:19:05.582  READ FPDMA QUEUED
   60 00 40 00 0a 98 40 08   1d+04:19:03.035  READ FPDMA QUEUED
   60 30 30 d0 ff 97 40 08   1d+04:19:03.035  READ FPDMA QUEUED
   60 00 28 d0 f5 97 40 08   1d+04:19:03.035  READ FPDMA QUEUED
   60 d0 20 00 f0 97 40 08   1d+04:19:03.035  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining 
LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error 00%      3675         -
# 2  Extended offline    Completed without error 00%      3668         -
# 3  Short offline       Completed without error 00%      3651         -
# 4  Extended offline    Completed without error 00%      3644         -
# 5  Short offline       Completed without error 00%      3627         -
# 6  Extended offline    Completed without error 00%      3622         -
# 7  Extended offline    Completed without error 00%      3604         -
# 8  Short offline       Completed without error 00%      3579         -
# 9  Extended offline    Completed: read failure 90%      3575         -
#10  Short offline       Completed without error 00%      3574         -
#11  Extended offline    Completed: read failure 90%      3574         -
#12  Extended offline    Completed: read failure 90%      3560         -
#13  Extended offline    Completed: read failure 50%      3559         -
4 of 4 failed self-tests are outdated by newer successful extended 
offline self-test # 2

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.











 >



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-26 22:58           ` Leszek Dubiel
@ 2019-12-28 17:04             ` Leszek Dubiel
  2019-12-28 20:23               ` Zygo Blaxell
  0 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2019-12-28 17:04 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Chris Murphy, Remi Gauvin


PROBLEM SOLVED --- btrfs was busy cleaning up after a snapshot deletion
a few days ago, so it didn't have time to "dev delete"; that's why it
was slow

=======================


I restarted the server, so the "btrfs delete" job no longer existed.
But the disks were still working (!!!). I wondered why. What was BTRFS
doing all the time?

I realized that a few days before starting "btrfs dev delete" I had
removed many snapshots -- there were about 400 snapshots and I left
only 20. I did that because I had read that many snapshots could slow
down btrfs operations severely.



I made an experiment on another testing server:

1. started the command "watch -n1 'btrfs fi df /'"
2. started "iostat -x -m"

The disks were not working at all.


3. Then I removed many snapshots on that testing server

and I was watching:

Data, single: total=6.56TiB, used=5.13TiB
System, RAID1: total=32.00MiB, used=992.00KiB
Metadata, RAID1: total=92.00GiB, used=70.56GiB
GlobalReserve, single: total=512.00MiB, used=1.39MiB

The disks started to work hard, so btrfs was probably cleaning up after
the snapshot deletion.

At the beginning the "Metadata" line showed "used=70.00GiB".

            Metadata, RAID1: total=92.00GiB, used=70.00GiB

It was changing all the time... getting lower and lower. During that
process, in the line

            GlobalReserve, single: total=512.00MiB, used=1.39MiB

"used=" was going up until it reached about 100MiB, then it was flushed 
to zero, and started again to fill, flush, fill, flush... some 
loop/buffer/cache (?).
It convinced me, that after snapshot deletion btrfs is really working 
hard on cleanup.
After some time "Metadata...used=" stopped changing, disks stopped 
working, server got lazy again.



I got back to the main server and started to watch "Metadata...used=".
It was going down and down...
I waited. When "Metadata...used=" stopped changing, "iostat -m"
stopped showing any disk activity.

I started "btrfs dev delete" again, and now the speed is 50MB/s.
Hurray! Problem solved.


Sorry for not being clever enough to connect all these facts at the
beginning.
Anyway -- maybe in the future someone will have the same problem; then
btrfs experts could ask him whether he let btrfs do some other hard
work at the same time (e.g. cleaning up after a massive snapshot
deletion).

Maybe it would be useful to have a tool to ask btrfs "what are you
doing? are you busy?".
It would respond "I am cleaning up after snapshot deletion... I am
balancing... I am scrubbing... I am resizing... I am deleting...".





^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-28 17:04             ` Leszek Dubiel
@ 2019-12-28 20:23               ` Zygo Blaxell
  2020-01-02 18:37                 ` Leszek Dubiel
  0 siblings, 1 reply; 32+ messages in thread
From: Zygo Blaxell @ 2019-12-28 20:23 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Btrfs BTRFS, Chris Murphy, Remi Gauvin

[-- Attachment #1: Type: text/plain, Size: 3448 bytes --]

On Sat, Dec 28, 2019 at 06:04:21PM +0100, Leszek Dubiel wrote:
> 
> PROBLEM SOLVED --- btrfs was busy cleaning up after a snapshot deletion
> a few days ago, so it didn't have time to "dev delete"; that's why it
> was slow

That checks out.  Snapshot delete and remove-device/resize/balance are
not able to run at the same time.  There is a mutex, so one or the
other will run while the other is blocked.

> =======================
> 
> 
> I restarted the server, so the "btrfs delete" job no longer existed.
> But the disks were still working (!!!). I wondered why. What was BTRFS
> doing all the time?
> 
> I realized that a few days before starting "btrfs dev delete" I had
> removed many snapshots -- there were about 400 snapshots and I left
> only 20. I did that because I had read that many snapshots could slow
> down btrfs operations severely.
> 
> 
> 
> I ran an experiment on another test server:
> 
> 1. started the command "watch -n1 'btrfs fi df /'"
> 2. started "iostat -x -m"
> 
> The disks were not doing any work at all.
> 
> 
> 3. Then I removed many snapshots on that test server
> 
> and I was watching:
> 
> Data, single: total=6.56TiB, used=5.13TiB
> System, RAID1: total=32.00MiB, used=992.00KiB
> Metadata, RAID1: total=92.00GiB, used=70.56GiB
> GlobalReserve, single: total=512.00MiB, used=1.39MiB
> 
> The disks started to work hard, so btrfs was probably cleaning up after the
> snapshot deletion.
> 
> At the beginning the "Metadata" line showed "used=70.00GiB":
> 
>            Metadata, RAID1: total=92.00GiB, used=70.00GiB
> 
> It kept changing... getting lower and lower. During that process, in
> the line
> 
>            GlobalReserve, single: total=512.00MiB, used=1.39MiB
> 
> "used=" was going up until it reached about 100MiB, then it was flushed to
> zero, and started again to fill, flush, fill, flush... some
> loop/buffer/cache (?).
> This convinced me that after snapshot deletion btrfs really does work hard
> on cleanup.
> After some time "Metadata...used=" stopped changing, the disks stopped
> working, and the server went quiet again.
> 
> 
> 
> I went back to the main server and started to watch "Metadata...used=". It
> was going down and down...
> I waited. When "Metadata...used=" stopped changing, "iostat -m" stopped
> showing any disk activity.
> 
> I started "btrfs dev delete" again and now the speed is 50 MB/sec. Hurray!
> Problem solved.
> 
> 
> Sorry for not being clever enough to connect all these facts at the
> beginning.
> Anyway -- if someone has the same problem in the future, btrfs experts
> could ask whether btrfs was left doing some other hard work at the same
> time (e.g. cleaning up after a massive snapshot deletion).
> 
> Maybe it would be useful to have a tool to ask btrfs "what are you doing? are
> you busy?".
> It would respond "I am cleaning up after snapshot deletion... I am
> balancing... I am scrubbing... I am resizing... I am deleting...".

Usually 'top' or 'iotop' suffices for that.  btrfs-cleaner = deleting
snapshots, other activities will be tied to their respective userspace
agents.  The balance/delete/resize/drop-snapshot mutex is the only special
case that I know of where one btrfs maintenance thread waits for another.

It might be handy to give users a clue on snapshot delete, like add
"use btrfs sub list -d to monitor deletion progress, or btrfs sub sync
to wait for deletion to finish".
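
A rough sketch of what that monitoring looks like today (mount point
assumed to be /):

     # subvolumes that are deleted but not yet cleaned up;
     # an empty list means the cleaner is done
     watch -n10 'btrfs subvolume list -d /'

     # or simply block until the cleanup queue is drained
     btrfs subvolume sync /

     # and confirm it is btrfs-cleaner doing the I/O
     iotop -d 5 -o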

> 
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2019-12-28 20:23               ` Zygo Blaxell
@ 2020-01-02 18:37                 ` Leszek Dubiel
  2020-01-02 21:57                   ` Chris Murphy
  0 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-02 18:37 UTC (permalink / raw)
  To: Btrfs BTRFS


 > It might be handy to give users a clue on snapshot delete, like add
 > "use btrfs sub list -d to monitor deletion progress, or btrfs sub sync
 > to wait for deletion to finish".

After having cleaned up the old snapshots, and after "dev delete"
completed, I added a new, fresh, empty disk


                  btrfs dev add /dev/sda3 /


and started to balance:


                      btrfs balance start -dconvert=raid1 -mconvert=raid1 /


It was slow (3-5 MB/sec), so I canceled the balance.
Iostat showed no activity.
I started the balance again:


                 btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /


It is slow again. The server is not working hard, and the disks are
healthy this time.

I will patiently wait until it completes, but I wanted to report the
user experience.
I have no clue why it is not running at the full speed of the disks.




root@wawel:~# iostat -x 60 -m
Linux 4.19.0-6-amd64 (wawel)     02.01.2020     _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,25    0,00    0,79    6,30    0,00   92,66

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              4,39  142,05      0,90      3,78     3,64 12,68  
45,32   8,20    9,81    5,48   0,78   209,68    27,26 0,54   7,95
sdb              4,66  155,25      0,97      4,03     4,52 13,11  
49,27   7,78    9,25    4,68   0,73   213,20    26,59 0,49   7,89
sdc              6,35  246,61      0,38      6,94     4,35 25,11  
40,67   9,24   27,09   48,00  11,92    61,02    28,82 2,65  67,02

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,09    0,00    1,68   11,70    0,00   86,53

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              1,27  205,48      0,02      7,44     0,00 2,47   0,00   
1,19   18,25    1,67   0,37    16,00    37,07 0,37   7,55
sdb              1,20  223,10      0,02      7,33     0,00 5,32   0,00   
2,33   16,26    1,68   0,38    16,00    33,64 0,34   7,72
sdc             19,50  428,38      0,30     14,76     0,00 7,78   0,00   
1,78   59,97    6,80   4,07    16,00    35,28 2,05  91,85

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,09    0,00    0,63    8,79    0,00   90,49

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              0,75  208,78      0,01      3,74     0,00 4,47   0,00   
2,09   11,96    3,78   0,75    16,00    18,35 0,23   4,75
sdb              0,45  109,97      0,01      1,92     0,00 2,77   0,00   
2,45   21,22    3,44   0,36    16,00    17,92 0,27   2,94
sdc              9,22  318,28      0,14      5,66     0,00 7,22   0,00   
2,22   13,79   58,62  18,71    16,00    18,20 2,92  95,57

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,09    0,00    0,68    7,07    0,00   92,16

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              0,00  305,72      0,00      5,69     0,00 9,73   0,00   
3,09    0,00    7,20   2,12     0,00    19,06 0,47  14,23
sdb              0,00  189,15      0,00      3,39     0,00 5,90   0,00   
3,02    0,00    7,59   1,39     0,00    18,37 0,52   9,84
sdc              0,00  288,97      0,00      5,44     0,00 8,63   0,00   
2,90    0,00   62,44  17,91     0,00    19,29 2,89  83,38

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,12    0,00    0,34    6,73    0,00   92,81

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              0,00  189,57      0,00      3,53     0,00 5,65   0,00   
2,89    0,00    8,12   1,48     0,00    19,09 0,56  10,66
sdb              0,00  284,67      0,00      5,03     0,00 9,10   0,00   
3,10    0,00    6,01   1,62     0,00    18,11 0,38  10,83
sdc              0,00  308,95      0,00      5,65     0,00 9,30   0,00   
2,92    0,00   60,87  18,73     0,00    18,74 2,86  88,24

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,09    0,00    1,99    6,62    0,00   91,30

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              0,80  294,67      0,01     18,03     0,00 2,43   0,00   
0,82   22,50    3,45   1,00    16,00    62,66 0,32   9,51
sdb             28,28  224,48     16,84      4,72     0,33 13,97   
1,16   5,86    6,80    3,88   1,00   609,62    21,54 0,46  11,61
sdc              8,60  518,88      0,14     22,76     0,08 16,30   
0,96   3,05   10,12   27,26  14,01    16,19    44,91 1,42  74,85

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,22    0,00    1,91    9,46    0,00   88,42

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              1,88  130,13      0,03      2,70     0,00 9,73   0,00   
6,96   15,23    0,58   0,10    16,00    21,23 0,45   5,98
sdb              1,45  147,78      0,02      3,18     0,00 12,62   
0,00   7,87   11,53    0,66   0,11    16,00    22,02 0,39   5,84
sdc             25,08  275,75      0,39      5,88     0,00 24,55   
0,00   8,18   47,78    6,50   2,97    16,00    21,83 3,18  95,79

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,14    0,00    0,73    9,28    0,00   89,85

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              2,38  186,52      0,20      3,33     1,33 4,20  35,87   
2,20    9,88    3,48   0,63    86,04    18,27 0,30   5,60
sdb              1,00  108,35      0,02      1,99     0,00 2,73   0,00   
2,46    8,92    3,78   0,41    18,00    18,85 0,27   2,98
sdc             13,47  294,68      0,21      5,32     0,00 6,92   0,00   
2,29   13,33   61,56  18,28    16,00    18,48 3,13  96,45




top:

top - 19:29:52 up  2:00,  2 users,  load average: 1,21, 1,53, 1,92
Tasks: 165 total,   1 running, 164 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us,  0,0 sy,  0,0 ni, 91,9 id,  8,1 wa,  0,0 hi, 0,0 si,  
0,0 st
MiB Mem :  32130,4 total,  15827,0 free,    409,4 used,  15893,9 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.  31244,2 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
   289 root       0 -20       0      0      0 I   6,7   0,0 0:00.73 
kworker/4:1H-kblockd
     1 root      20   0   22036  10140   7872 S   0,0   0,0 0:11.52 systemd
     2 root      20   0       0      0      0 S   0,0   0,0 0:00.01 
kthreadd
     3 root       0 -20       0      0      0 I   0,0   0,0 0:00.00 rcu_gp
     4 root       0 -20       0      0      0 I   0,0   0,0 0:00.00 
rcu_par_gp
     6 root       0 -20       0      0      0 I   0,0   0,0 0:00.00 
kworker/0:0H-kblockd
     8 root       0 -20       0      0      0 I   0,0   0,0 0:00.00 
mm_percpu_wq
     9 root      20   0       0      0      0 S   0,0   0,0 0:00.04 
ksoftirqd/0
    10 root      20   0       0      0      0 I   0,0   0,0 0:02.61 
rcu_sched
    11 root      20   0       0      0      0 I   0,0   0,0 0:00.00 rcu_bh
    12 root      rt   0       0      0      0 S   0,0   0,0 0:00.02 
migration/0
    14 root      20   0       0      0      0 S   0,0   0,0 0:00.00 cpuhp/0
    15 root      20   0       0      0      0 S   0,0   0,0 0:00.00 cpuhp/1
    16 root      rt   0       0      0      0 S   0,0   0,0 0:00.02 
migration/1
    17 root      20   0       0      0      0 S   0,0   0,0 0:00.07 
ksoftirqd/1
    19 root       0 -20       0      0      0 I   0,0   0,0 0:00.00 
kworker/1:0H-kblockd



iotop:

Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
     4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
     6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % 
[kworker/0:0H-kblockd]
     8 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    11 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
    12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    14 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/0]
    15 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/1]
    16 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
    17 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
    19 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % 
[kworker/1:0H-kblockd]
    20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/2]
    21 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
    22 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
    24 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % 
[kworker/2:0H-kblockd]
    25 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/3]
    26 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
    27 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
    29 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % 
[kworker/3:0H-kblockd]
    30 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/4]




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 18:37                 ` Leszek Dubiel
@ 2020-01-02 21:57                   ` Chris Murphy
  2020-01-02 22:39                     ` Leszek Dubiel
  2020-01-06 11:14                     ` Leszek Dubiel
  0 siblings, 2 replies; 32+ messages in thread
From: Chris Murphy @ 2020-01-02 21:57 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Btrfs BTRFS, Zygo Blaxell

On Thu, Jan 2, 2020 at 11:47 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
>
>  > It might be handy to give users a clue on snapshot delete, like add
>  > "use btrfs sub list -d to monitor deletion progress, or btrfs sub sync
>  > to wait for deletion to finish".
>
> After having cleaned up the old snapshots, and after "dev delete"
> completed, I added a new, fresh, empty disk
>
>
>                   btrfs dev add /dev/sda3 /
>
>
> and started to balance:
>
>
>                       btrfs balance start -dconvert=raid1 -mconvert=raid1 /
>
>
> It was slow (3-5 MB/sec), so canceled balance.

I'm not sure why it's slow and I don't think it should be this slow,
but I would say that in retrospect it would have been better to NOT
delete the device with a few bad sectors, and instead use `btrfs
replace` to do a 1:1 replacement of that particular drive.

When you delete a device, it has to rewrite everything onto the two
remaining devices. And now that you added another device, it has to
rewrite everything onto three devices. Two full balances. Whereas
'device replace' is optimized to copy block groups as-is from source
to destination drives. Only on a read-error will Btrfs use mirror
copies from other devices.
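
For reference, a sketch of that replace workflow (device names here are
examples only; /dev/sdX is the failing disk, /dev/sdY the new one):

     btrfs replace start -r /dev/sdX /dev/sdY /   # -r: read from the failing disk only
                                                  #     when no good mirror copy exists
     btrfs replace status /                       # progress as a percentage
     btrfs fi resize <devid>:max /                # if the new disk is larger, grow into it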


> Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s
> %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> sda              4,39  142,05      0,90      3,78     3,64 12,68
> 45,32   8,20    9,81    5,48   0,78   209,68    27,26 0,54   7,95
> sdb              4,66  155,25      0,97      4,03     4,52 13,11
> 49,27   7,78    9,25    4,68   0,73   213,20    26,59 0,49   7,89
> sdc              6,35  246,61      0,38      6,94     4,35 25,11
> 40,67   9,24   27,09   48,00  11,92    61,02    28,82 2,65  67,02

Almost no reads, all writes, but slow. And rather high write requests
per second, almost double for sdc. And sdc is near its max utilization,
so it might be near its IOPS limit?

~210 rareq-sz = 210KiB is the average size of the read request for sda and sdb

Default mkfs and default mount options? Or other and if so what other?

Many small files on this file system? Or possibly large files with a
lot of fragmentation?


> Device            r/s     w/s     rMB/s     wMB/s   rrqm/s wrqm/s
> %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> sda              2,38  186,52      0,20      3,33     1,33 4,20  35,87
> 2,20    9,88    3,48   0,63    86,04    18,27 0,30   5,60
> sdb              1,00  108,35      0,02      1,99     0,00 2,73   0,00
> 2,46    8,92    3,78   0,41    18,00    18,85 0,27   2,98
> sdc             13,47  294,68      0,21      5,32     0,00 6,92   0,00
> 2,29   13,33   61,56  18,28    16,00    18,48 3,13  96,45

And again, sdc is at max utilization, with ~300 write requests per
second which is at the high end for a fast drive for IOPS, if I'm not
mistaken. That's a lot of writes per second. The average write request
size is 18KiB.

So what's going on with the workload? Is this only a balance operation
or are there concurrent writes happening from some process?




> iotop:
>
> Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
> Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
>      1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
>      2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
>      3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
>      4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
>      6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 %
> [kworker/0:0H-kblockd]
>      8 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
>      9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
>     10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
>     11 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
>     12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
>     14 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/0]
>     15 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/1]
>     16 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
>     17 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
>     19 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 %
> [kworker/1:0H-kblockd]
>     20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/2]
>     21 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
>     22 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
>     24 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 %
> [kworker/2:0H-kblockd]
>     25 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/3]
>     26 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
>     27 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
>     29 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 %
> [kworker/3:0H-kblockd]
>     30 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/4]


iotop -d 5 -o might be more revealing; all zeros doesn't really make
sense. I see balance and scrub reported in iotop.




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 21:57                   ` Chris Murphy
@ 2020-01-02 22:39                     ` Leszek Dubiel
  2020-01-02 23:22                       ` Chris Murphy
  2020-01-06 11:14                     ` Leszek Dubiel
  1 sibling, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-02 22:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Zygo Blaxell


 > I'm not sure why it's slow and I don't think it should be this slow,
 > but I would say that in retrospect it would have been better to NOT
 > delete the device with a few bad sectors, and instead use `btrfs
 > replace` to do a 1:1 replacement of that particular drive.

Thank you. I will remember that.



 > When you delete a device, it has to rewrite everything onto the two
 > remaining device. And now that you added another device, it has to
 > rewrite everything onto three devices. Two full balances. Whereas
 > 'device replace' is optimized to copy block groups as-is from source
 > to destination drives. Only on a read-error will Btrfs use mirror
 > copies from other devices.

Thank you.





 > Almost no reads, all writes, but slow. And rather high write requests
 > per second, almost double for sdc. And sdc is near its max utilization,
 > so it might be near its IOPS limit?
 >
 > ~210 rareq-sz = 210KiB is the average size of the read request for 
sda and sdb
 >
 > Default mkfs and default mount options? Or other and if so what other?
 >
 > Many small files on this file system? Or possibly large files with a
 > lot of fragmentation?

Default mkfs and default mount options.

This system could have a few million (!) small files.
On reiserfs it takes about 40 minutes to do "find /".
Rsync runs for 6 hours to back up the data.





 >> Device            r/s     w/s     rMB/s     wMB/s rrqm/s wrqm/s
 >> %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
 >> sda              2,38  186,52      0,20      3,33 1,33 4,20  35,87
 >> 2,20    9,88    3,48   0,63    86,04    18,27 0,30   5,60
 >> sdb              1,00  108,35      0,02      1,99 0,00 2,73   0,00
 >> 2,46    8,92    3,78   0,41    18,00    18,85 0,27   2,98
 >> sdc             13,47  294,68      0,21      5,32 0,00 6,92   0,00
 >> 2,29   13,33   61,56  18,28    16,00    18,48 3,13  96,45
 >
 > And again, sdc is at max utilization, with ~300 write requests per
 > second which is at the high end for a fast drive for IOPS, if I'm not
 > mistaken. That's a lot of writes per second. The average write request
 > size is 18KiB.
 >
 > So what's going on with the workload? Is this only a balance operation
 > or are there concurrent writes happening from some process?

root@wawel:~# btrfs bala stat /
Balance on '/' is running
62 out of about 1231 chunks balanced (4224 considered),  95% left

root@wawel:~# btrfs scrub stat  /
scrub status for 44803366-3981-4ebb-853b-6c991380c8a6
     scrub started at Sat Dec 28 19:32:06 2019 and was aborted after 
03:07:57
     total bytes scrubbed: 3.33TiB with 0 errors

root@wawel:~# btrfs sub list -d /

root@wawel:~# btrfs sub list / | wc -l
27






 > iotop -d 5 -o might be more revealing; all zeros doesn't really make
 > sense. I see balance and scrub reported in iotop.


# iotop -d5

Total DISK READ:         0.00 B/s | Total DISK WRITE:        32.98 M/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:      26.19 M/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
  4596 be/4 root        0.00 B/s   31.78 M/s  0.00 % 86.16 % btrfs 
balance start -dconvert raid1 soft -mconvert raid1 soft /
30893 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.16 % 
[kworker/0:2-events]
29683 be/4 root        0.00 B/s  659.23 K/s  0.00 %  0.00 % 
[kworker/u16:4-btrfs-extent-refs]
29684 be/4 root        0.00 B/s  560.50 K/s  0.00 %  0.00 % 
[kworker/u16:8-btrfs-extent-refs]
30115 be/4 root        0.00 B/s    6.37 K/s  0.00 %  0.00 % 
[kworker/u16:2-btrfs-extent-refs]
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
     4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
     6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % 
[kworker/0:0H-kblockd]






# iotop -d30

Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
  4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs 
balance start -dconvert raid1 soft -mconvert raid1 soft /
29687 be/4 root        0.00 B/s  465.87 K/s  0.00 %  0.32 % 
[kworker/u16:10-edac-poller]
30893 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.13 % 
[kworker/0:2-events_freezable_power_]
30115 be/4 root        0.00 B/s 1373.65 K/s  0.00 %  0.08 % 
[kworker/u16:2-btrfs-endio-write]
26871 be/4 root        0.00 B/s  260.89 K/s  0.00 %  0.03 % 
[kworker/u16:7-btrfs-extent-refs]
23756 be/4 root        0.00 B/s   80.40 K/s  0.00 %  0.00 % 
[kworker/u16:1-btrfs-extent-refs]
29681 be/4 root        0.00 B/s  169.84 K/s  0.00 %  0.00 % 
[kworker/u16:0-btrfs-extent-refs]
29683 be/4 root        0.00 B/s  280.59 K/s  0.00 %  0.00 % 
[kworker/u16:4-btrfs-endio-write]
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]




PS. I have set Thunderbird not to wrap long lines. Hope my emails are 
more readable now.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 22:39                     ` Leszek Dubiel
@ 2020-01-02 23:22                       ` Chris Murphy
  2020-01-03  9:08                         ` Leszek Dubiel
                                           ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Chris Murphy @ 2020-01-02 23:22 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Chris Murphy, Btrfs BTRFS, Zygo Blaxell

On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote:

>  > Almost no reads, all writes, but slow. And rather high write request
>  > per second, almost double for sdc. And sdc is near it's max
>  > utilization so it might be ear to its iops limit?
>  >
>  > ~210 rareq-sz = 210KiB is the average size of the read request for
> sda and sdb
>  >
>  > Default mkfs and default mount options? Or other and if so what other?
>  >
>  > Many small files on this file system? Or possibly large files with a
>  > lot of fragmentation?
>
> Default mkfs and default mount options.
>
> This system could have a few million (!) of small files.
> On reiserfs it takes about 40 minutes, to do "find /".
> Rsync runs for 6 hours to backup data.

There is a mount option:  max_inline=<bytes> which the man page says
(default: min(2048, page size) )

I've never used it; in theory, then, the default max_inline size is 2KiB.
However, I have seen substantially larger inline extents than 2KiB
when using a nodesize larger than 16KiB at mkfs time.

I've wondered whether it makes any difference for the "many small
files" case to do more aggressive inlining of extents.

I've seen that with a 16KiB leaf size, small files that could be
inlined are often instead put into a data block group, taking up a
minimum 4KiB block size (on x86_64 anyway). I'm not sure why, but I
suspect there just isn't enough room in that leaf to always use inline
extents, and yet there is enough room to just reference it as a data
block group extent. When using a larger node size, a larger percentage
of small files ended up using inline extents. I'd expect this to be
quite a bit more efficient, because it eliminates a time-expensive
seek (on HDD anyway).

Another optimization is compress=zstd:1, which is the lowest
compression setting. That'll increase the chance a file can use an
inline extent, in particular with a larger nodesize.
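
A sketch of what testing that combination might look like; nodesize is
fixed at mkfs time, so it needs a freshly created filesystem, and the
device name, nodesize and max_inline values are just examples (zstd
levels also need a kernel new enough to accept them):

     mkfs.btrfs -n 64k /dev/sdX
     mount -o compress=zstd:1,max_inline=4096 /dev/sdX /mnt/test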

And still another optimization, at the expense of much more
complexity, is LVM cache with an SSD. You'd have to pick a suitable
policy for the workload, but I expect that if the iostat utilization
you see is often near maximum in normal operation, you'll see
improved performance. SSDs can handle way higher IOPS than HDDs. But a
lot of this optimization stuff is use-case specific. I'm not even sure
what your mean small-file size is.
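
A sketch of the lvmcache variant, which only applies if the btrfs
devices sit on LVM logical volumes; the volume group, LV and SSD names
are assumptions:

     pvcreate /dev/nvme0n1
     vgextend vg0 /dev/nvme0n1
     lvcreate --type cache-pool -L 100G -n fastcache vg0 /dev/nvme0n1
     lvconvert --type cache --cachepool vg0/fastcache \
               --cachemode writethrough vg0/data

With writethrough, writes still go straight to the HDD, so the cache
mainly accelerates the seek-heavy reads.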

> # iotop -d30
>
> Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
> Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
>   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs

Not so bad for many small-file reads and writes on an HDD. I've seen
this myself with a single spindle when doing small-file reads and
writes.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 23:22                       ` Chris Murphy
@ 2020-01-03  9:08                         ` Leszek Dubiel
  2020-01-03 19:15                           ` Chris Murphy
  2020-01-03 14:39                         ` Leszek Dubiel
  2020-01-04  5:38                         ` Zygo Blaxell
  2 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-03  9:08 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Zygo Blaxell



On 03.01.2020 at 00:22, Chris Murphy wrote:
 > On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote:
 >
 > > This system could have a few million (!) of small files.
 > > On reiserfs it takes about 40 minutes, to do "find /".
 > > Rsync runs for 6 hours to backup data.
 >
 >
 > There is a mount option:  max_inline=<bytes> which the man page says
 > (default: min(2048, page size) )
 >
 > I've never used it, so in theory the max_inline byte size is 2KiB.
 > However, I have seen substantially larger inline extents than 2KiB
 > when using a nodesize larger than 16KiB at mkfs time.
 >
 > I've wondered whether it makes any difference for the "many small
 > files" case to do more aggressive inlining of extents.
 >
 > I've seen with 16KiB leaf size, often small files that could be
 > inlined, are instead put into a data block group, taking up a minimum
 > 4KiB block size (on x86_64 anyway). I'm not sure why, but I suspect
 > there just isn't enough room in that leaf to always use inline
 > extents, and yet there is enough room to just reference it as a data
 > block group extent. When using a larger node size, a larger percentage
 > of small files ended up using inline extents. I'd expect this to be
 > quite a bit more efficient, because it eliminates a time expensive (on
 > HDD anyway) seek.

I will try that option when setting up new disks with BTRFS.
Then I'll report on the efficiency.





 > Another optimization, using compress=zstd:1, which is the lowest
 > compression setting. That'll increase the chance a file can use inline
 > extents, in particular with a larger nodesize.
 >
 > And still another optimization, at the expense of much more
 > complexity, is LVM cache with an SSD. You'd have to pick a suitable
 > policy for the workload, but I expect that if the iostat utilization
 > you see is often near maximum in normal operation, you'll see
 > improved performance. SSD's can handle way higher iops than HDD. But a
 > lot of this optimization stuff is use case specific. I'm not even sure
 > what your mean small file size is.



There are 11 million files:

root@gamma:/mnt/sdb1# find orion2 > listor2
root@gamma:/mnt/sdb1# ls -lt listor2
-rw-r--r-- 1 root root 988973729 sty  3 03:09 listor2
root@gamma:/mnt/sdb1# wc -l listor2
11329331 listor2


And df on reiserfs shows:

root@orion:~# df  -h -BM
Filesystem       1M-bl    Used     Avail  Use% Mounted on
/dev/md0        71522M  10353M   61169M  15% /
/dev/md1       905967M 731199M  174768M  81% /root

10353 + 731199 = 741552 MB,

so the average file size is about 741552 * 1000000 / 11000000 = 67413
bytes per file.
This estimate is not accurate, because df counts in blocks...

I will count more precisely with du --apparent-size.





 >> # iotop -d30
 >>
 >> Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
 >> Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
 >>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO> COMMAND
 >>   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs
 >
 > Not so bad for many small file reads and writes with HDD. I've see
 > this myself with single spindle when doing small file reads and
 > writes.


So small files are the slowdown in my case.
OK! Thank you for the expertise.



PS. This morning:

root@wawel:~# btrfs bala stat /
Balance on '/' is running
1227 out of about 1231 chunks balanced (5390 considered),   0% left

So during the night it balanced 600 GB + 600 GB = 1.2 TB of
data from the single profile to raid1 in about 12 hours. That is:

(600 + 600) GB * 1000 MB/GB / (12 hours * 3600 sec/hour)
       = (600 + 600) * 1000 / (12 × 3600)
             ≈ 27.8 MB/sec




root@wawel:~# btrfs dev usag /
/dev/sda2, ID: 2
    Device size:             5.45TiB
    Device slack:              0.00B
    Data,RAID1:              2.62TiB
    Metadata,RAID1:         22.00GiB
    Unallocated:             2.81TiB

/dev/sdb2, ID: 3
    Device size:             5.45TiB
    Device slack:              0.00B
    Data,RAID1:              2.62TiB
    Metadata,RAID1:         21.00GiB
    System,RAID1:           32.00MiB
    Unallocated:             2.81TiB

/dev/sdc3, ID: 4
    Device size:            10.90TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.24TiB
    Metadata,RAID1:         33.00GiB
    System,RAID1:           32.00MiB
    Unallocated:             5.62TiB





root@wawel:~# iostat 10  -x
Linux 4.19.0-6-amd64 (wawel)     03.01.2020     _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,00    0,00    0,00    0,00    0,00  100,00

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              0,00    0,00      0,00      0,00     0,00 0,00   0,00   
0,00    0,00    0,00   0,00     0,00     0,00 0,00   0,00
sdb              0,00    0,00      0,00      0,00     0,00 0,00   0,00   
0,00    0,00    0,00   0,00     0,00     0,00 0,00   0,00
sdc              0,00    0,00      0,00      0,00     0,00 0,00   0,00   
0,00    0,00    0,00   0,00     0,00     0,00 0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,04    0,00    0,08    0,00    0,00   99,89

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s wrqm/s  
%rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda              0,00    0,00      0,00      0,00     0,00 0,00   0,00   
0,00    0,00    0,00   0,00     0,00     0,00 0,00   0,00
sdb              0,00    0,00      0,00      0,00     0,00 0,00   0,00   
0,00    0,00    0,00   0,00     0,00     0,00 0,00   0,00
sdc              0,00    0,00      0,00      0,00     0,00 0,00   0,00   
0,00    0,00    0,00   0,00     0,00     0,00 0,00   0,00







^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 23:22                       ` Chris Murphy
  2020-01-03  9:08                         ` Leszek Dubiel
@ 2020-01-03 14:39                         ` Leszek Dubiel
  2020-01-03 19:02                           ` Chris Murphy
  2020-01-04  5:38                         ` Zygo Blaxell
  2 siblings, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-03 14:39 UTC (permalink / raw)
  To: Btrfs BTRFS


I have made some measurements in case someone has the same workload
profile on their server.




=============
=== BTRFS ===

This is the same server "wawel" that was discussed above regarding the
slow "btrfs delete" and "btrfs balance".

I must admit that BTRFS is much faster these days than ReiserFS.
See the ReiserFS numbers below.



** 51 minutes to list 11 million files:

root@wawel:/mnt/root/orion# time find > list_of_files
real    51m14,611s
user    0m30,487s
sys     2m23,520s

root@wawel:/mnt/root/orion# wc -l list_of_files
11439162 list_of_files



** 86 minutes to check the sizes of all files

root@wawel:/mnt/root/orion# time du -abx > disk_usage
real    86m42,328s
user    0m47,225s
sys     7m13,320s

root@wawel:/mnt/root/orion# wc -l disk_usage
11438206 disk_usage



** 45 minutes to read half a million files, 50 GB of data

root@wawel:/mnt/root/orion# egrep 'root/Omega/Domains/Zakupy' disk_usage | wc -l
476499

root@wawel:/mnt/root/orion# egrep 'root/Omega/Domains/Zakupy$' disk_usage
49154713130    ./root/Omega/Domains/Zakupy

root@wawel:/mnt/root/orion# time find root/Omega/Domains/Zakupy/ -type f -print0 | xargs -0r wc > word_count
real    45m11,398s
user    32m22,395s
sys     0m22,657s



** 21 minutes to copy 50 GB

root@wawel:/mnt/root/orion# time cp -a root/Omega/Domains/Zakupy/ tempcopy
real    21m36,030s
user    0m9,553s
sys     2m47,038s


** removal is very fast

root@wawel:/mnt/root/orion# time rm -r tempcopy
real    0m39,952s
user    0m0,359s
sys     0m9,985s



** number of files by given size

root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e '
    $Data::Dumper::Sortkeys = 1;
    while (<>) {
        chomp;
        my ($byt, $nam) = split /\t/, $_, -1;       # du -ab output: "<bytes>\t<name>"
        if (index("$las/", $nam) == 0) { $dir++; }  # entry is a directory just descended
        else {                                      # entry is a file: bucket by size
            $filtot++;
            for $p (1 .. 99) { if ($byt < 10 ** $p) { $fil{"num of files size <10^$p"}++; last; } }
        }
        $las = $nam;
    }
    print "\ndirectories: $dir\ntotal num of files: $filtot\n",
          "\nnumber of files grouped by size: \n", Dumper(\%fil)'

directories: 1314246
total num of files: 10123960

number of files grouped by size:
$VAR1 = {
           'num of files size <10^1' => 3325886,
           'num of files size <10^2' => 3709276,
           'num of files size <10^3' => 789852,
           'num of files size <10^4' => 1085927,
           'num of files size <10^5' => 650571,
           'num of files size <10^6' => 438717,
           'num of files size <10^7' => 116757,
           'num of files size <10^8' => 6638,
           'num of files size <10^9' => 323,
           'num of files size <10^10' => 13,
         };



** disk speed

root@wawel:/mnt/root/orion# hdparm -t /dev/sda
/dev/sda:
  Timing buffered disk reads: 680 MB in  3.01 seconds = 226.14 MB/sec





================
=== REISERFS ===

ReiserFS is much slower:

root@gamma:/mnt/sdb1/orion2# time find > list_of_files
real    111m49,529s
user    0m28,665s
sys     1m3,576s

root@gamma:/mnt/sdb1/orion2# wc -l list_of_files
11329332 list_of_files

root@gamma:/mnt/sdb1/orion2# time du -abx > disk_usage
real    198m43,813s
user    0m53,501s
sys     3m38,447s

root@gamma:/mnt/sdb1/orion2# hdparm -t /dev/sdb
/dev/sdb:
  Timing buffered disk reads: 534 MB in  3.00 seconds = 177.77 MB/sec



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-03 14:39                         ` Leszek Dubiel
@ 2020-01-03 19:02                           ` Chris Murphy
  2020-01-03 20:59                             ` Leszek Dubiel
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Murphy @ 2020-01-03 19:02 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Btrfs BTRFS

On Fri, Jan 3, 2020 at 7:39 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
> ** number of files by given size
>
> root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e
> '$Data::Dumper::Sortkeys = 1; while (<>) { chomp; my ($byt, $nam) =
> split /\t/, $_, -1; if (index("$las/", $nam) == 0) { $dir++; } else {
> $filtot++; for $p (1 .. 99) { if ($byt < 10 ** $p) { $fil{"num of files
> size <10^$p"}++; last; } } }; $las = $nam; }; print "\ndirectories:
> $dir\ntotal num of files: $filtot\n", "\nnumber of files grouped by
> size: \n", Dumper(\%fil) '
>
> directories: 1314246
> total num of files: 10123960
>
> number of files grouped by size:
> $VAR1 = {
>            'num of files size <10^1' => 3325886,
>            'num of files size <10^2' => 3709276,
>            'num of files size <10^3' => 789852,
>            'num of files size <10^4' => 1085927,
>            'num of files size <10^5' => 650571,
>            'num of files size <10^6' => 438717,
>            'num of files size <10^7' => 116757,
>            'num of files size <10^8' => 6638,
>            'num of files size <10^9' => 323
>            'num of files size <10^10' => 13,

Is that really ~7.8 million files at or less than 1KiB?? (totalling
the first three)

Compression may not do much with such small files, and also I'm not
sure which algorithm would do the best job. They all probably want a
lot more than 1KiB to become efficient.

But nodesize 64KiB might be a big deal...worth testing.




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-03  9:08                         ` Leszek Dubiel
@ 2020-01-03 19:15                           ` Chris Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Murphy @ 2020-01-03 19:15 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Chris Murphy, Btrfs BTRFS, Zygo Blaxell

On Fri, Jan 3, 2020 at 2:08 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
>  >> # iotop -d30
>  >>
>  >> Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
>  >> Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
>  >>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO> COMMAND
>  >>   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs
>  >
>  > Not so bad for many small file reads and writes with HDD. I've see
>  > this myself with single spindle when doing small file reads and
>  > writes.

It's not small files directly. It's the number of write requests per
second, resulting in high latency seeks. And the reason for the
seeking needs a second opinion, to be certain it's related to small
files.

I'm not really sure why there are hundreds of write requests per
second. Seems to me with thousands of small files, Btrfs can aggregate
them into a single sequential write (mostly sequential anyway) and do
the same for metadata writes; yes there is some back and forth seeking
since metadata and data block groups are in different physical
locations. But hundreds of times per second? Hmmm. I'm suspicious why.
It must be trying to read and write hundreds of small files *in
different locations* causing the seeks, and the ensuing latency.

The typical workaround for this these days is to add more disks or add
an SSD. If you add a fourth disk, you reduce your one bottleneck:


> root@wawel:~# btrfs dev usag /
> /dev/sda2, ID: 2
>     Device size:             5.45TiB
>     Device slack:              0.00B
>     Data,RAID1:              2.62TiB
>     Metadata,RAID1:         22.00GiB
>     Unallocated:             2.81TiB
>
> /dev/sdb2, ID: 3
>     Device size:             5.45TiB
>     Device slack:              0.00B
>     Data,RAID1:              2.62TiB
>     Metadata,RAID1:         21.00GiB
>     System,RAID1:           32.00MiB
>     Unallocated:             2.81TiB
>
> /dev/sdc3, ID: 4
>     Device size:            10.90TiB
>     Device slack:            3.50KiB
>     Data,RAID1:              5.24TiB
>     Metadata,RAID1:         33.00GiB
>     System,RAID1:           32.00MiB
>     Unallocated:             5.62TiB

OK, this is important. Two equal-size drives, and the third is much
larger. This means writes are going to be I/O bound to that single
large device, because it is always going to be written to. The reads
get spread out somewhat.

Again, maybe the everyday workload is the one to focus on, because it
is not such a big deal for a device replace to take overnight. Even so,
it would be good for everyone's use case if it turns out there is some
optimization possible to avoid hundreds of write requests per second
just because of small files.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-03 19:02                           ` Chris Murphy
@ 2020-01-03 20:59                             ` Leszek Dubiel
  0 siblings, 0 replies; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-03 20:59 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 03.01.2020 at 20:02, Chris Murphy wrote:
 > On Fri, Jan 3, 2020 at 7:39 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
 >>
 >> ** number of files by given size
 >>
 >> root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e
 >> '$Data::Dumper::Sortkeys = 1; while (<>) { chomp; my ($byt, $nam) =
 >> split /\t/, $_, -1; if (index("$las/", $nam) == 0) { $dir++; } else {
 >> $filtot++; for $p (1 .. 99) { if ($byt < 10 ** $p) { $fil{"num of files
 >> size <10^$p"}++; last; } } }; $las = $nam; }; print "\ndirectories:
 >> $dir\ntotal num of files: $filtot\n", "\nnumber of files grouped by
 >> size: \n", Dumper(\%fil) '
 >>
 >> directories: 1314246
 >> total num of files: 10123960
 >>
 >> number of files grouped by size:
 >> $VAR1 = {
 >>            'num of files size <10^1' => 3325886,
 >>            'num of files size <10^2' => 3709276,
 >>            'num of files size <10^3' => 789852,
 >>            'num of files size <10^4' => 1085927,
 >>            'num of files size <10^5' => 650571,
 >>            'num of files size <10^6' => 438717,
 >>            'num of files size <10^7' => 116757,
 >>            'num of files size <10^8' => 6638,
 >>            'num of files size <10^9' => 323
 >>            'num of files size <10^10' => 13,
 >
 > Is that really ~7.8 million files at or less than 1KiB?? (totalling
 > the first three)

Yes. This is the workflow management system in my company (a bathroom
mirror manufacturer).

The system was built in 2004. Back then ReiserFS was great because it
had "tail packing" and put small pieces of data together with metadata,
so disks could read many pieces of data in one read request. Other
filesystems were nowhere close to ReiserFS when it came to speed with
lots of small files. That's why I have been testing BTRFS for a few
years now.


 > Compression may not do much with such small files, and also I'm not
 > sure which algorithm would do the best job. They all probably want a
 > lot more than 1KiB to become efficient.
 >
 > But nodesize 64KiB might be a big deal...worth testing.

Yes -- I will have to read about nodesize.
Thank you for the hint.


Current data:

root@wawel:~# btrfs inspect dump-super /dev/sdb2
superblock: bytenr=65536, device=/dev/sdb2
---------------------------------------------------------
csum_type        0 (crc32c)
csum_size        4
csum            0x0bd7280d [match]
bytenr            65536
flags            0x1
             ( WRITTEN )
magic            _BHRfS_M [match]
fsid            44803366-3981-4ebb-853b-6c991380c8a6
metadata_uuid        44803366-3981-4ebb-853b-6c991380c8a6
label
generation        553943
root            17155128295424
sys_array_size        129
chunk_root_generation    553648
root_level        1
chunk_root        10136287444992
chunk_root_level    1
log_root        0
log_root_transid    0
log_root_level        0
total_bytes        23967879376896
bytes_used        5844982415360
sectorsize        4096
nodesize        16384 ---------------<<<
leafsize (deprecated)        16384
stripesize        4096
root_dir        6
num_devices        3
compat_flags        0x0
compat_ro_flags        0x0
incompat_flags        0x163
             ( MIXED_BACKREF |
               DEFAULT_SUBVOL |
               BIG_METADATA |
               EXTENDED_IREF |
               SKINNY_METADATA )
cache_generation    553943
uuid_tree_generation    594
dev_item.uuid        5f74e436-f8f9-43ba-95fc-44cdb2bc1838
dev_item.fsid        44803366-3981-4ebb-853b-6c991380c8a6 [match]
dev_item.type        0
dev_item.total_bytes    5992192409600
dev_item.bytes_used    2946381119488
dev_item.io_align    4096
dev_item.io_width    4096
dev_item.sector_size    4096
dev_item.devid        3
dev_item.dev_group    0
dev_item.seek_speed    0
dev_item.bandwidth    0
dev_item.generation    0






^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 23:22                       ` Chris Murphy
  2020-01-03  9:08                         ` Leszek Dubiel
  2020-01-03 14:39                         ` Leszek Dubiel
@ 2020-01-04  5:38                         ` Zygo Blaxell
  2020-01-07 18:44                           ` write amplification, was: " Chris Murphy
  2 siblings, 1 reply; 32+ messages in thread
From: Zygo Blaxell @ 2020-01-04  5:38 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Leszek Dubiel, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 4637 bytes --]

On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote:
> On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote:
> 
> >  > Almost no reads, all writes, but slow. And rather high write request
> >  > per second, almost double for sdc. And sdc is near it's max
> >  > utilization so it might be ear to its iops limit?
> >  >
> >  > ~210 rareq-sz = 210KiB is the average size of the read request for
> > sda and sdb
> >  >
> >  > Default mkfs and default mount options? Or other and if so what other?
> >  >
> >  > Many small files on this file system? Or possibly large files with a
> >  > lot of fragmentation?
> >
> > Default mkfs and default mount options.
> >
> > This system could have a few million (!) of small files.
> > On reiserfs it takes about 40 minutes, to do "find /".
> > Rsync runs for 6 hours to backup data.
> 
> There is a mount option:  max_inline=<bytes> which the man page says
> (default: min(2048, page size) )

It's half the page size per a commit from some years ago.  For compressed
size, it's the compressed data size (i.e. you can have a 4095-byte
inline file with max_inline=2048 due to the compression).
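
One way to check where a given small file actually ended up (a sketch;
the path is just an example):

     # extent flags containing "inline" mean the file data lives in the
     # metadata b-tree rather than in a separate data extent
     filefrag -v /srv/files/some-small-file.txt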

> I've never used it, so in theory the max_inline byte size is 2KiB.
> However, I have seen substantially larger inline extents than 2KiB
> when using a nodesize larger than 16KiB at mkfs time.
> 
> I've wondered whether it makes any difference for the "many small
> files" case to do more aggressive inlining of extents.
> 
> I've seen with 16KiB leaf size, often small files that could be
> inlined, are instead put into a data block group, taking up a minimum
> 4KiB block size (on x86_64 anyway). I'm not sure why, but I suspect
> there just isn't enough room in that leaf to always use inline
> extents, and yet there is enough room to just reference it as a data
> block group extent. When using a larger node size, a larger percentage
> of small files ended up using inline extents. I'd expect this to be
> quite a bit more efficient, because it eliminates a time expensive (on
> HDD anyway) seek.

Putting a lot of inline file data into metadata pages makes them less
dense, which is either good or bad depending on which bottleneck you're
currently hitting.  If you have snapshots there is an up-to-300x metadata
write amplification penalty to update extent item references every time
a shared metadata page is unshared.  Inline extents reduce the write
amplification.  On the other hand, if you are doing a lot of 'find'-style
tree sweeps, then inline extents will reduce their efficiency because more
pages will have to be read to scan the same number of dirents and inodes.

For workloads that reiserfs was good at, there's no reliable rule of
thumb to guess which is better--you have to try both, and measure results.

> Another optimization, using compress=zstd:1, which is the lowest
> compression setting. That'll increase the chance a file can use inline
> extents, in particular with a larger nodesize.
> 
> And still another optimization, at the expense of much more
> complexity, is LVM cache with an SSD. You'd have to pick a suitable
> policy for the workload, but I expect that if the iostat utilization
> you see is often near maximum in normal operation, you'll see
> improved performance. SSD's can handle way higher iops than HDD. But a
> lot of this optimization stuff is use case specific. I'm not even sure
> what your mean small file size is.

I've found an interesting result in cache configuration testing: btrfs's
writes with datacow seem to be very well optimized, to the point that
adding a writeback SSD cache between btrfs and a HDD makes btrfs commits
significantly slower.  A writeback cache adds latency to the write path
without removing many seeks--btrfs already does writes in big contiguous
bursts--so the extra latency makes the writeback cache slow compared to
writethrough.  A writethrough SSD cache helps with reads (which are very
seeky and benefit a lot from caching) without adding latency to writes,
and btrfs reads a _lot_ during commits.
 
> > # iotop -d30
> >
> > Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
> > Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
> >    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
> >   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs
> 
> Not so bad for many small file reads and writes with HDD. I've see
> this myself with single spindle when doing small file reads and
> writes.
> 
> 
> -- 
> Chris Murphy

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-02 21:57                   ` Chris Murphy
  2020-01-02 22:39                     ` Leszek Dubiel
@ 2020-01-06 11:14                     ` Leszek Dubiel
  2020-01-07  0:21                       ` Chris Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-06 11:14 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Chris Murphy, Zygo Blaxell



On 02.01.2020 at 22:57, Chris Murphy wrote:

 > but I would say that in retrospect it would have been better to NOT
 > delete the device with a few bad sectors, and instead use `btrfs
 > replace` to do a 1:1 replacement of that particular drive.


Tested "replace" on ahother server:

     root@zefir:~# btrfs replace start /dev/sde1 /dev/sdb3 /

and the speed was quite normal:

     1.49 TiB * ( 1024 * 1024 MiB/TiB ) / ( 4.5 hours * 3600 sec/hour )
         =     1.49 * ( 1024 * 1024 ) / ( 4.5 * 3600 )   =  96.44 MiB / sec


Questions:

1. It is a little bit confusing that the kernel reports sdc1 and sde1
on the same line: "BTRFS warning (device sdc1): i/o error ... on dev
/dev/sde1"....

2. Can I reset the error counters for /dev/sda1? There were errors due
to a malfunctioning SATA cable; the cable was replaced and the counters
don't grow any more. Btrfs withstood that failure with no data loss.
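
For question 2, a sketch of what should work with a reasonably recent
btrfs-progs (mount point assumed):

     # print the per-device counters and reset them to zero afterwards
     btrfs device stats -z /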



=======================================

root@zefir:~# tail /var/log/kern.log
Jan  6 00:19:37 zefir kernel: [33959.394053] BTRFS info (device sdc1): 
dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 started
Jan  6 00:50:32 zefir kernel: [35815.080061] INFO: NMI handler 
(perf_event_nmi_handler) took too long to run: 1.126 msecs
Jan  6 04:48:25 zefir kernel: [50087.254634] BTRFS info (device sdc1): 
dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 finished



root@zefir:~# btrfs dev usage /
/dev/sda1, ID: 5
    Device size:             1.80TiB
    Device slack:              0.00B
    Data,RAID1:              1.51TiB
    Metadata,RAID1:         20.00GiB
    System,RAID1:           32.00MiB
    Unallocated:           279.97GiB

/dev/sdb3, ID: 4
    Device size:             1.80TiB
    Device slack:            1.77GiB  <<<<------------
    Data,RAID1:              1.49TiB
    Metadata,RAID1:         42.00GiB
    Unallocated:           280.00GiB

/dev/sdc1, ID: 1
    Device size:             1.80TiB
    Device slack:              0.00B
    Data,RAID1:              1.50TiB
    Metadata,RAID1:         30.00GiB
    Unallocated:           278.00GiB

/dev/sdd1, ID: 3
    Device size:             1.80TiB
    Device slack:              0.00B
    Data,RAID1:              1.50TiB
    Metadata,RAID1:         36.00GiB
    Unallocated:           277.00GiB

/dev/sdf1, ID: 2
    Device size:             1.80TiB
    Device slack:              0.00B
    Data,RAID1:              1.49TiB
    Metadata,RAID1:         40.00GiB
    System,RAID1:           32.00MiB
    Unallocated:           279.97GiB




# reduce slack
root@zefir:~# btrfs fi resize 4:max /
Resize '/' of '4:max'
root@zefir:~# btrfs dev usage /
...
/dev/sdb3, ID: 4
    Device size:             1.80TiB
    Device slack:            3.50KiB <<<<<<<<<<<<<<<<<<<<
    Data,RAID1:              1.49TiB
    Metadata,RAID1:         42.00GiB
    Unallocated:           281.77GiB
...




root@zefir:~# btrfs dev stat /
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
[/dev/sdf1].write_io_errs    0
[/dev/sdf1].read_io_errs     0
[/dev/sdf1].flush_io_errs    0
[/dev/sdf1].corruption_errs  0
[/dev/sdf1].generation_errs  0
[/dev/sdd1].write_io_errs    0
[/dev/sdd1].read_io_errs     0
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sdb3].write_io_errs    0
[/dev/sdb3].read_io_errs     0
[/dev/sdb3].flush_io_errs    0
[/dev/sdb3].corruption_errs  0
[/dev/sdb3].generation_errs  0
[/dev/sda1].write_io_errs    10418
[/dev/sda1].read_io_errs     227
[/dev/sda1].flush_io_errs    117
[/dev/sda1].corruption_errs  77
[/dev/sda1].generation_errs  47




# error count doesn't change any more:
root@zefir:/mnt/root/zefir_snaps# egrep . */root/btrfs_dev_stat | egrep write_io_errs | egrep sda1 | egrep ' 23:'
2019-12-27 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2019-12-28 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2019-12-29 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2019-12-30 23:50:02 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2019-12-31 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-01 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-02 23:50:03 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-03 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-04 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-05 23:00:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-05 23:10:03 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-05 23:20:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-05 23:30:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-05 23:40:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418
2020-01-05 23:50:01 z 
harmonogramu/root/btrfs_dev_stat:[/dev/sda1].write_io_errs 10418



######### kernel log
root@zefir:~# cat /var/log/kern.log | egrep BTRFS
Jan  5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device 
sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical 
867246145536: metadata leaf (level 0) in tree 9109477097472
Jan  5 13:52:09 zefir kernel: [1291932.446095] BTRFS warning (device 
sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical 
867246145536: metadata leaf (level 0) in tree 9109477097472
Jan  5 13:52:09 zefir kernel: [1291932.446098] BTRFS error (device 
sdc1): bdev /dev/sde1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jan  5 13:52:09 zefir kernel: [1291932.689845] BTRFS error (device 
sdc1): fixed up error at logical 11658111352832 on dev /dev/sde1
Jan  5 14:54:30 zefir kernel: [    4.654734] BTRFS: device fsid 
dd38db97-fa4a-479b-bc1b-973f01a61a8a devid 1 transid 257 /dev/sdd3
Jan  5 14:54:30 zefir kernel: [    4.654855] BTRFS: device fsid 
0a75ba3d-0540-4370-bb04-22470e855caa devid 1 transid 241 /dev/sdf3
Jan  5 14:54:30 zefir kernel: [    4.655047] BTRFS: device fsid 
e2652ff5-6d5e-45de-912c-a662f92c10f0 devid 1 transid 280 /dev/sde3
Jan  5 14:54:30 zefir kernel: [    4.655200] BTRFS: device fsid 
8e960630-2b51-452b-b026-abc59dcf5f35 devid 1 transid 61 /dev/sda3
Jan  5 14:54:30 zefir kernel: [    4.661180] BTRFS: device fsid 
c100cf56-f210-42d7-a953-bbe1974a28f0 devid 1 transid 292 /dev/sdc3
Jan  5 14:54:30 zefir kernel: [    4.661331] BTRFS: device fsid 
9d6e641c-ec71-411e-9312-f1f67a70913f devid 3 transid 2500779 /dev/sdd1
Jan  5 14:54:30 zefir kernel: [    4.661450] BTRFS: device fsid 
9d6e641c-ec71-411e-9312-f1f67a70913f devid 2 transid 2500779 /dev/sdf1
Jan  5 14:54:30 zefir kernel: [    4.663399] BTRFS: device fsid 
9d6e641c-ec71-411e-9312-f1f67a70913f devid 4 transid 2500779 /dev/sde1
Jan  5 14:54:30 zefir kernel: [    4.663547] BTRFS: device fsid 
9d6e641c-ec71-411e-9312-f1f67a70913f devid 5 transid 2500779 /dev/sda1
Jan  5 14:54:30 zefir kernel: [    4.663779] BTRFS: device fsid 
9d6e641c-ec71-411e-9312-f1f67a70913f devid 1 transid 2500779 /dev/sdc1
Jan  5 14:54:30 zefir kernel: [    4.695640] BTRFS info (device sdc1): 
disk space caching is enabled
Jan  5 14:54:30 zefir kernel: [    4.695642] BTRFS info (device sdc1): 
has skinny extents
Jan  5 14:54:30 zefir kernel: [    5.363215] BTRFS info (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jan  5 14:54:30 zefir kernel: [    5.363219] BTRFS info (device sdc1): 
bdev /dev/sda1 errs: wr 10418, rd 227, flush 117, corrupt 77, gen 47
Jan  5 14:54:30 zefir kernel: [   37.646828] BTRFS info (device sdc1): 
disk space caching is enabled
Jan  5 19:55:25 zefir kernel: [18107.366949] BTRFS warning (device 
sdc1): i/o error at logical 8728914731008 on dev /dev/sde1, physical 
1395531751424, root 119218, inode 63539433, offset 138190848, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:25 zefir kernel: [18107.366952] BTRFS warning (device 
sdc1): i/o error at logical 8728912101376 on dev /dev/sde1, physical 
1395529121792, root 119218, inode 63539433, offset 135561216, length 
4096, links 1 (path: var/mail/bem1)
------------------------------------------------
------ 8< cut many similar errors ------------------
------------------------------------------------
Jan  5 19:55:25 zefir kernel: [18107.368411] BTRFS warning (device 
sdc1): i/o error at logical 8728912101376 on dev /dev/sde1, physical 
1395529121792, root 119122, inode 63539433, offset 135561216, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:25 zefir kernel: [18107.368413] BTRFS warning (device 
sdc1): i/o error at logical 8728914731008 on dev /dev/sde1, physical 
1395531751424, root 119122, inode 63539433, offset 138190848, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:25 zefir kernel: [18107.368441] BTRFS warning (device 
sdc1): i/o error at logical 8728914731008 on dev /dev/sde1, physical 
1395531751424, root 119120, inode 63539433, offset 138190848, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:25 zefir kernel: [18107.368443] BTRFS warning (device 
sdc1): i/o error at logical 8728912101376 on dev /dev/sde1, physical 
1395529121792, root 119120, inode 63539433, offset 135561216, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:25 zefir kernel: [18107.446304] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Jan  5 19:55:25 zefir kernel: [18107.446306] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Jan  5 19:55:25 zefir kernel: [18107.446450] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Jan  5 19:55:25 zefir kernel: [18107.473619] BTRFS error (device sdc1): 
fixed up error at logical 8728912101376 on dev /dev/sde1
Jan  5 19:55:29 zefir kernel: [18111.775171] BTRFS error (device sdc1): 
fixed up error at logical 8728914731008 on dev /dev/sde1
Jan  5 19:55:29 zefir kernel: [18111.775336] BTRFS error (device sdc1): 
fixed up error at logical 8728913416192 on dev /dev/sde1
Jan  5 19:55:29 zefir kernel: [18111.778932] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Jan  5 19:55:29 zefir kernel: [18111.779367] BTRFS error (device sdc1): 
fixed up error at logical 8728912105472 on dev /dev/sde1
Jan  5 19:55:34 zefir kernel: [18116.660449] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 119218, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.660476] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 119216, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.660502] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 119214, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.660528] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 119212, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.660554] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 119210, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.660580] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 119208, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
------------------------------------------------
------ 8< cut many similar errors ------------------
------------------------------------------------
Jan  5 19:55:34 zefir kernel: [18116.662869] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 118872, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.662894] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 118860, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.662919] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 118848, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.663077] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 118836, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.663104] BTRFS warning (device 
sdc1): i/o error at logical 8728914735104 on dev /dev/sde1, physical 
1395531755520, root 118824, inode 63539433, offset 138194944, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:34 zefir kernel: [18116.663863] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Jan  5 19:55:38 zefir kernel: [18120.678251] BTRFS error (device sdc1): 
fixed up error at logical 8728914735104 on dev /dev/sde1
Jan  5 19:55:38 zefir kernel: [18120.681960] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Jan  5 19:55:42 zefir kernel: [18124.968600] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 119218, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.968627] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 119216, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
------------------------------------------------
------ 8< cut many similar errors ------------------
------------------------------------------------
Jan  5 19:55:42 zefir kernel: [18124.968961] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 119190, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.968986] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 119188, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.969010] BTRFS error (device sdc1): 
fixed up error at logical 8728912109568 on dev /dev/sde1
Jan  5 19:55:42 zefir kernel: [18124.969012] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 119186, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.969038] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 119184, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
------------------------------------------------
------ 8< cut many similar errors ------------------
------------------------------------------------
Jan  5 19:55:42 zefir kernel: [18124.971065] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 118872, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.971091] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 118860, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.971117] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 118848, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.971143] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 118836, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.971169] BTRFS warning (device 
sdc1): i/o error at logical 8728913420288 on dev /dev/sde1, physical 
1395530440704, root 118824, inode 63539433, offset 136880128, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:42 zefir kernel: [18124.971948] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Jan  5 19:55:42 zefir kernel: [18125.004219] BTRFS error (device sdc1): 
fixed up error at logical 8728913420288 on dev /dev/sde1
Jan  5 19:55:47 zefir kernel: [18129.529678] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Jan  5 19:55:51 zefir kernel: [18133.498294] BTRFS error (device sdc1): 
fixed up error at logical 8728914739200 on dev /dev/sde1
Jan  5 19:55:51 zefir kernel: [18133.498938] BTRFS warning (device 
sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 
1395530444800, root 119218, inode 63539433, offset 136884224, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:51 zefir kernel: [18133.498966] BTRFS warning (device 
sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 
1395530444800, root 119216, inode 63539433, offset 136884224, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:51 zefir kernel: [18133.498992] BTRFS warning (device 
sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 
1395530444800, root 119214, inode 63539433, offset 136884224, length 
4096, links 1 (path: var/mail/bem1)
------------------------------------------------
------ 8< cut many similar errors ------------------
------------------------------------------------
Jan  5 19:55:51 zefir kernel: [18133.501578] BTRFS warning (device 
sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 
1395530444800, root 118848, inode 63539433, offset 136884224, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:51 zefir kernel: [18133.501604] BTRFS warning (device 
sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 
1395530444800, root 118836, inode 63539433, offset 136884224, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:51 zefir kernel: [18133.501629] BTRFS warning (device 
sdc1): i/o error at logical 8728913424384 on dev /dev/sde1, physical 
1395530444800, root 118824, inode 63539433, offset 136884224, length 
4096, links 1 (path: var/mail/bem1)
Jan  5 19:55:51 zefir kernel: [18133.502390] BTRFS error (device sdc1): 
bdev /dev/sde1 errs: wr 0, rd 11, flush 0, corrupt 0, gen 0
Jan  5 19:55:51 zefir kernel: [18133.525951] BTRFS error (device sdc1): 
fixed up error at logical 8728913424384 on dev /dev/sde1
Jan  6 00:19:37 zefir kernel: [33959.394053] BTRFS info (device sdc1): 
dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 started
Jan  6 04:48:25 zefir kernel: [50087.254634] BTRFS info (device sdc1): 
dev_replace from /dev/sde1 (devid 4) to /dev/sdb3 finished



########## before replace (/dev/sde1 got errors):
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
[/dev/sdf1].write_io_errs    0
[/dev/sdf1].read_io_errs     0
[/dev/sdf1].flush_io_errs    0
[/dev/sdf1].corruption_errs  0
[/dev/sdf1].generation_errs  0
[/dev/sdd1].write_io_errs    0
[/dev/sdd1].read_io_errs     0
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sde1].write_io_errs    0
[/dev/sde1].read_io_errs     11
[/dev/sde1].flush_io_errs    0
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0
[/dev/sda1].write_io_errs    10418
[/dev/sda1].read_io_errs     227
[/dev/sda1].flush_io_errs    117
[/dev/sda1].corruption_errs  77
[/dev/sda1].generation_errs  47


root@zefir:/# smartctl -a /dev/sde | egrep 'Pending|Offline_Unc|Error 
Count'
197 Current_Pending_Sector  0x0012   100   100   000    Old_age 
Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age 
Offline      -       0
ATA Error Count: 16 (device log contains only the most recent five errors)




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-06 11:14                     ` Leszek Dubiel
@ 2020-01-07  0:21                       ` Chris Murphy
  2020-01-07  7:09                         ` Leszek Dubiel
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Murphy @ 2020-01-07  0:21 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Btrfs BTRFS

On Mon, Jan 6, 2020 at 4:14 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>
>
>
> W dniu 02.01.2020 o 22:57, Chris Murphy pisze:
>
>  > but I would say that in retrospect it would have been better to NOT
>  > delete the device with a few bad sectors, and instead use `btrfs
>  > replace` to do a 1:1 replacement of that particular drive.
>
>
> Tested "replace" on another server:
>
>      root@zefir:~# btrfs replace start /dev/sde1 /dev/sdb3 /
>
> and speed was quite normal:
>
>      1.49 TiB * ( 1024 * 1024 MiB/TiB ) / ( 4.5 hours * 3600 sec/hour )
>          =     1.49 * ( 1024 * 1024 ) / ( 4.5 * 3600 )   =  96.44 MiB / sec
>
>
> Questions:
>
> 1. It is a little bit confusing that the kernel reports sdc1 and sde1 on the
> same line: "BTRFS warning (device sdc1): i/o error ... on dev
> /dev/sde1"....

Can you provide the entire line? It's probably already confusing, but 
the ellipsis makes it more confusing.


>
> # reduce slack
> root@zefir:~# btrfs fi resize 4:max /
> Resize '/' of '4:max'
> root@zefir:~# btrfs dev usage /
> ...
> /dev/sdb3, ID: 4
>     Device size:             1.80TiB
>     Device slack:            3.50KiB <<<<<<<<<<<<<<<<<<<<

Maybe the partition isn't aligned on a 4KiB boundary? *shrug*

But yeah one gotcha with 'btrfs replace' is that the replacement must
be equal to or bigger than the drive being replaced; and once
complete, the file system is not resized to fully utilize the
replacement drive. That's intentional because by default you may want
allocations to have the same balance as with the original device. If
you resize to max, Btrfs will favor allocations to the drive with the
most free space.


> Jan  5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device
> sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical
> 867246145536: metadata leaf (level 0) in tree 9109477097472


Ahh yeah, I see what you mean. I think that's confusing also. The error 
is on sde1. But I guess sdc1 is reported first because it's the device 
the kernel considers mounted; there's not really a good facility (or 
maybe there is in the newer VFS mount code, not sure) for showing two 
devices mounted on a single mount point.



> [/dev/sda1].write_io_errs    10418
> [/dev/sda1].read_io_errs     227
> [/dev/sda1].flush_io_errs    117
> [/dev/sda1].corruption_errs  77
> [/dev/sda1].generation_errs  47

This isn't good either. I'd keep an eye on that. Read errors can be 
fixed up if there's a good copy: Btrfs will use the good copy and 
overwrite the bad sector, *if* the SCT ERC duration is lower than the 
SCSI command timer. But write and flush errors are bad. You need to 
find out what that's about.
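
For reference, the usual check and workaround look something like this 
(device names here are placeholders, adjust for your drives):

     smartctl -l scterc /dev/sdX                  # show current SCT ERC setting, if supported
     smartctl -l scterc,70,70 /dev/sdX            # set read/write ERC to 7.0 seconds
     cat /sys/block/sdX/device/timeout            # kernel SCSI command timer, 30 s by default
     echo 180 > /sys/block/sdX/device/timeout     # fallback if the drive has no SCT ERC support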



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-07  0:21                       ` Chris Murphy
@ 2020-01-07  7:09                         ` Leszek Dubiel
  0 siblings, 0 replies; 32+ messages in thread
From: Leszek Dubiel @ 2020-01-07  7:09 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


 >> # reduce slack
 >> root@zefir:~# btrfs fi resize 4:max /
 >> Resize '/' of '4:max'
 >> root@zefir:~# btrfs dev usage /
 >> ...
 >> /dev/sdb3, ID: 4
 >>     Device size:             1.80TiB
 >>     Device slack:            3.50KiB <<<<<<<<<<<<<<<<<<<<
 >
 > Maybe the partition isn't aligned on a 4KiB boundary? *shrug*

Yes... this is an old partition table.




 > But yeah one gotcha with 'btrfs replace' is that the replacement must
 > be equal to or bigger than the drive being replaced; and once
 > complete, the file system is not resized to fully utilize the
 > replacement drive.

 From my "user view" it is OK. After the device replace finished, there 
was slack = 1.7GiB.
Then I resized to max, and slack went down to 3.5KiB.



 > That's intentional because by default you may want
 > allocations to have the same balance as with the original device. If
 > you resize to max, Btrfs will favor allocations to the drive with the
 > most free space.

That's perfect.





 >> Jan  5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device
 >> sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical
 >> 867246145536: metadata leaf (level 0) in tree 9109477097472
 >
 > Ahh yeah I see what you mean. I think that's confusing also. The error
 > is on sde1. But I guess why sdc1 is reported first is probably to do
 > with the device the kernel considers mounted, there's not really a
 > good facility (or maybe it's in the newer VFS mount code, not sure)
 > for showing two devices mounted on a single mount point.

Uhhh... you're right -- sdc1 is shown as the mounted filesystem. Thank you.
I didn't see this detail at first; I thought the information about sdc1 
came from BTRFS and not from the kernel mount.

root@zefir:~# mount | egrep sdc
/dev/sdc1 on / type btrfs 
(rw,relatime,space_cache,subvolid=83043,subvol=/zefir)

root@zefir:~# cat /etc/fstab | egrep /
UUID=9d6e641c-ec71-411e-9312-f1f67a70913f  /          btrfs 
defaults,subvol=/zefir  0   0

root@zefir:~# blkid | egrep 0913f
/dev/sdb3: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" 
UUID_SUB="e9eb3d18-2d87-479a-808d-74d61903196c" TYPE="btrfs" 
PARTUUID="c1befd42-e38c-be48-b8df-4301fa1d3b07"
/dev/sda1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" 
UUID_SUB="8bcae5cb-b3a5-4fd8-9284-602203f2a43e" TYPE="btrfs"
/dev/sdc1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" 
UUID_SUB="fb7055ef-6ae7-48e0-8f09-a315fc20f399" TYPE="btrfs"
/dev/sdf1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" 
UUID_SUB="5e38a881-9b21-4a35-a1bb-889095b32254" TYPE="btrfs"
/dev/sdd1: UUID="9d6e641c-ec71-411e-9312-f1f67a70913f" 
UUID_SUB="1d776ab6-3ef2-4fe1-8026-78b6d87f85c1" TYPE="btrfs"





 >> [/dev/sda1].write_io_errs    10418
 >> [/dev/sda1].read_io_errs     227
 >> [/dev/sda1].flush_io_errs    117
 >> [/dev/sda1].corruption_errs  77
 >> [/dev/sda1].generation_errs  47
 >
 > This isn't good either. I'd keep an eye on that. read errors can be
 > fixed up if there's a good copy, Btrfs will use the good copy and
 > overwrite the bad sector, *if* SCT ERC is lower duration than SCSI
 > command timer. But write and flush errors are bad. You need to find
 > out what that's about.

This was due to a bad cable.
The cable was replaced and SCT ERC workarounds were installed.

I would like to reset the counters to ZERO... if that is possible (?).
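
(For what it's worth, btrfs-progs documents a reset option for "btrfs 
device stats"; assuming the version installed here supports it, something 
like this should print the counters and zero them:)

     btrfs device stats -z /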



^ permalink raw reply	[flat|nested] 32+ messages in thread

* write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-04  5:38                         ` Zygo Blaxell
@ 2020-01-07 18:44                           ` Chris Murphy
  2020-01-07 19:26                             ` Holger Hoffstätte
  2020-01-07 23:32                             ` Zygo Blaxell
  0 siblings, 2 replies; 32+ messages in thread
From: Chris Murphy @ 2020-01-07 18:44 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Btrfs BTRFS

On Fri, Jan 3, 2020 at 10:38 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote:

> > I've seen with 16KiB leaf size, often small files that could be
> > inlined, are instead put into a data block group, taking up a minimum
> > 4KiB block size (on x86_64 anyway). I'm not sure why, but I suspect
> > there just isn't enough room in that leaf to always use inline
> > extents, and yet there is enough room to just reference it as a data
> > block group extent. When using a larger node size, a larger percentage
> > of small files ended up using inline extents. I'd expect this to be
> > quite a bit more efficient, because it eliminates a time expensive (on
> > HDD anyway) seek.
>
> Putting a lot of inline file data into metadata pages makes them less
> dense, which is either good or bad depending on which bottleneck you're
> currently hitting.  If you have snapshots there is an up-to-300x metadata
> write amplification penalty to update extent item references every time
> a shared metadata page is unshared.  Inline extents reduce the write
> amplification.  On the other hand, if you are doing a lot of 'find'-style
> tree sweeps, then inline extents will reduce their efficiency because more
> pages will have to be read to scan the same number of dirents and inodes.

Egads! Soo... total tangent. I'll change the subject.

I have had multiple flash drive failures while using Btrfs: all
Samsung, several SD Cards, and so far two USB sticks. They all fail in 
essentially the same way: the media itself becomes read only. USB:
writes succeed but they do not persist. Write data to the media, and
there is no error. Read that same sector back, old data is there. SD
Card: writes fail with a call trace and diagnostic info unique to the
sd card kernel code, and everything just goes belly up. This happens
inside of 6 months of rather casual use as rootfs. And BTW Samsung
always replaces the media under warranty without complaint.

It's not a scientific sample. Could be the host device, which is the
same in each case. Could be a bug in the firmware. I have nothing to
go on really.

But I wonder if this is due to write amplification that's just not 
anticipated by the manufacturers? Is there any way to test for this or 
estimate the amount of amplification? This class of media doesn't 
report LBAs written, so I have little useful information to go on about 
the cause. The relevance here is that I really like the idea of Btrfs 
as a rootfs for things like IoT because of compression, ostensibly 
there are ssd optimizations, and always-on checksumming to catch what 
can often be questionable media: USB sticks, SD Cards, eMMC, etc. But 
not if the write amplification has a good chance of killing people's 
hardware (I have no proof of this, but now I wonder, as I read your 
email).
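
The best I can think of is counting host-side writes at the block layer 
and comparing them with what the application actually wrote; that only 
gives a lower bound (the NAND-level amplification is known only to the 
controller), and the device name below is just a guess:

     # field 7 of /sys/block/<dev>/stat is sectors written, in 512-byte units
     awk '{ printf "%.2f GiB written since boot\n", $7 * 512 / 1024 / 1024 / 1024 }' /sys/block/mmcblk0/stat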

I'm aware of write amplification, I just didn't realize it could be
this massive. Is it 300x just by having snapshots at all? Or does it
get worse with each additional snapshot? And is it multiplicative or
exponentially worse?

In the most prolific snapshotting case, I had two subvolumes, each
with 20 snapshots (at most). I used default ssd mount option for the
sdcards, most recently ssd_spread with the usb sticks. And now nossd
with the most recent USB stick I just started to use.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-07 18:44                           ` write amplification, was: " Chris Murphy
@ 2020-01-07 19:26                             ` Holger Hoffstätte
  2020-01-07 23:32                             ` Zygo Blaxell
  1 sibling, 0 replies; 32+ messages in thread
From: Holger Hoffstätte @ 2020-01-07 19:26 UTC (permalink / raw)
  To: Chris Murphy, Zygo Blaxell, Btrfs BTRFS

On 1/7/20 7:44 PM, Chris Murphy wrote:
> Egads! Soo... total tangent. I'll change the subject.

Time for science!

"Analyzing IO Amplification in Linux File Systems"
https://arxiv.org/abs/1707.08514

Ironically Btrfs does better in a related paper:

"Filesystem Aging: It’s more Usage than Fullness"
https://www.usenix.org/conference/hotstorage19/presentation/conway

enjoy,
Holger

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-07 18:44                           ` write amplification, was: " Chris Murphy
  2020-01-07 19:26                             ` Holger Hoffstätte
@ 2020-01-07 23:32                             ` Zygo Blaxell
  2020-01-07 23:53                               ` Chris Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Zygo Blaxell @ 2020-01-07 23:32 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 8957 bytes --]

On Tue, Jan 07, 2020 at 11:44:43AM -0700, Chris Murphy wrote:
> On Fri, Jan 3, 2020 at 10:38 PM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > On Thu, Jan 02, 2020 at 04:22:37PM -0700, Chris Murphy wrote:
> 
> > > I've seen with 16KiB leaf size, often small files that could be
> > > inlined, are instead put into a data block group, taking up a minimum
> > > 4KiB block size (on x86_64 anyway). I'm not sure why, but I suspect
> > > there just isn't enough room in that leaf to always use inline
> > > extents, and yet there is enough room to just reference it as a data
> > > block group extent. When using a larger node size, a larger percentage
> > > of small files ended up using inline extents. I'd expect this to be
> > > quite a bit more efficient, because it eliminates a time expensive (on
> > > HDD anyway) seek.
> >
> > Putting a lot of inline file data into metadata pages makes them less
> > dense, which is either good or bad depending on which bottleneck you're
> > currently hitting.  If you have snapshots there is an up-to-300x metadata
> > write amplification penalty to update extent item references every time
> > a shared metadata page is unshared.  Inline extents reduce the write
> > amplification.  On the other hand, if you are doing a lot of 'find'-style
> > tree sweeps, then inline extents will reduce their efficiency because more
> > pages will have to be read to scan the same number of dirents and inodes.
> 
> Egads! Soo... total tangent. I'll change the subject.
> 
> I have had multiple flash drive failures while using Btrfs: all
> Samsung, several SD Cards, and so far two USB sticks. They all fail in
> essentially the same way: the media itself becomes read only. USB:
> writes succeed but they do not persist. Write data to the media, and
> there is no error. Read that same sector back, old data is there. SD
> Card: writes fail with a call trace and diagnostic info unique to the
> sd card kernel code, and everything just goes belly up. This happens
> inside of 6 months of rather casual use as rootfs. And BTW Samsung
> always replaces the media under warranty without complaint.
> 
> It's not a scientific sample. Could be the host device, which is the
> same in each case. Could be a bug in the firmware. I have nothing to
> go on really.

It seems to be normal behavior for USB sticks and SD cards.  I've also
had USB sticks degrade (bit errors) simply from sitting unused on a shelf
for six months.  Some low-end SATA SSDs (like $20/TB drives from Amazon)
are giant USB sticks with a SATA interface, and will fail the same way.

SD card vendors are starting to notice, and there are now SD card options
with higher endurance ratings.  Still "putting this card in a dashcam
voids the warranty" in most cases.

ext2 and msdos both make USB sticks last longer, but they have obvious
other problems.  From my fleet of raspberry pis, I find that SD cards
last longer on btrfs than ext4 with comparable write loads, but they 
both still lead very short lives, and the biggest life expectancy
improvement (up to a couple of years) comes from eliminating local
writes entirely.
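
A minimal sketch of the "eliminate local writes" idea; the mount points 
and sizes below are generic examples, not my exact setup:

     # keep the chattiest paths in RAM; ship logs elsewhere or accept losing them on reboot
     mount -t tmpfs -o size=64m,noatime,mode=0755 tmpfs /var/log
     mount -t tmpfs -o size=64m,noatime tmpfs /var/tmp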

> But I wonder if this is due to write amplification that's just not
> anticipated by the manufacturers? Is there any way to test for this or
> estimate the amount of amplification? This class of media doesn't
> report LBA's written, so I'm at quite a lack of useful information to
> know what the cause is. The relevance here though is, I really like
> the idea of Btrfs used as a rootfs for things like IoT because of
> compression, ostensibly there are ssd optimizations, and always on
> checksumming to catch what often can be questionable media: like USB
> sticks, SD Cards, eMMC, etc. But not if the write amplification has a
> good chance of killing people's hardware (I have no proof of this but
> now I wonder, as I read your email).
> 
> I'm aware of write amplification, I just didn't realize it could be
> this massive. Is it 300x just by having snapshots at all? Or does it
> get worse with each additional snapshot? And is it multiplicative or
> exponentially worse?

A 16K subvol metadata page can hold ~300 extent references.  Each extent
reference is bidirectional--there is a reference from the subvol metadata
page to the extent data item, and a backref from the extent data item
to the reference.  If a new reference is created via clone or dedupe or
partially overwriting an extent in the middle, then the extent item's
reference count is increased, and new backrefs are added to the extent
item's page.
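
(Back-of-envelope, just to show where that number comes from; the 
per-reference size is implied, not an exact on-disk figure:)

     echo $(( 16384 / 300 ))    # ~54 bytes of leaf space per reference, headers included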

When a snapshot is created, all the metadata pages except the root become
shared.  Each referenced extent data item is not changed at this time, as
there is only one metadata page containing references to each extent data
item.  The metadata page carrying the extent reference item has multiple
owners which are ancestor nodes in all of the snapshot subvol trees.

The backref walking code starts from an extent data item, and follows
references back to subvol metadata pages.  If the subvol metadata
pages are also shared, then the walking code follows those back to the
subvol roots.  The true reference count for an extent is a combination
of the direct references (subvol leaf page to extent data) and indirect
references (subvol root or node page to subvol leaf page).

When a snapshot metadata page is modified, a new metadata page is created
with mostly the same content, give or take the items that are added
or modified.  This inserts ~300 new extent data backref items into the
extent tree because they are now owned by both the old and new versions of
the metadata page.  It is as if the files located in the subvol metadata
page were silently reflinked from the old to new subvol, but only in
the specific areas listed on the single modified metadata page.

In the worst case, all ~300 extent data items are stored on separate
extent tree pages (i.e. you have really bad fragmentation on a big
filesystem, and all of the extents in a file are in different places on
disk).  In this case, to modify a single page of shared subvol metadata,
we must also update up to 300 extent tree pages.  This is where the
300x write multiply comes from.  It's not really the worst case--each 
of those 300 page updates has its own multiplication (e.g. the extent
tree page may be overfilled and split, doubling one write).  If you end
up freeing pages all over the filesystem, there will be free space
cache/tree modifications as well.

In the best case, the metadata page isn't shared (i.e. it was CoWed or
it's a brand new metadata page).  In this case there's no need to update
backref pointers or reference counts for unmodified items, as they will
be deleted during the same transaction that creates a copy of them.

Real cases fall between 1x and 300x.  The first time you modify a metadata
page in a snapshotted subvol, you must also update ~300 extent data item
backrefs (unless the subvol is so small it contains fewer than 300 items).
This repeats for every shared page as it is unshared.  Over time, shared
pages are replaced with unshared pages, and performance within the subvol
returns to normal levels.  If a new snapshot is created for the subvol,
we start over at 300x item updates since every metadata page in the
subvol is now shared again.

The write multiplication contribution from snapshots quickly drops to 1x,
but it's worst after a new snapshot is created (for either the old or
new subvol).  On a big filesystem it can take minutes to create a new
directory, rename, delete, or hardlink files for the first time after
a snapshot is created, as even the most trivial metadata update becomes
an update of tens of thousands of randomly scattered extent tree pages.

Snapshots and 'cp -ax --reflink=always a b' are comparably expensive
if you modify more than 0.3% of the metadata and the modifications are
equally distributed across the subvol.  In the end, you'll have completely
unshared metadata and touched every page of the extent tree either way,
but the timing will be different (e.g. mkdirs and renames will be blamed
for the low performance, when the root cause is really the snapshot for
backups that happened 12 hours earlier).

> In the most prolific snapshotting case, I had two subvolumes, each
> with 20 snapshots (at most). I used default ssd mount option for the
> sdcards, most recently ssd_spread with the usb sticks. And now nossd
> with the most recent USB stick I just started to use.

The number of snapshots doesn't really matter:  you get the up-to-300x
write multiple from writing to a subvol that has shared metadata pages,
which happens when you have just one snapshot.  It doesn't matter if
you have 1 snapshot or 10000 (at least, not for _this_ reason).

> -- 
> Chris Murphy
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-07 23:32                             ` Zygo Blaxell
@ 2020-01-07 23:53                               ` Chris Murphy
  2020-01-08  1:41                                 ` Zygo Blaxell
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Murphy @ 2020-01-07 23:53 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Jan 7, 2020 at 4:32 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> It seems to be normal behavior for USB sticks and SD cards.  I've also
> had USB sticks degrade (bit errors) simply from sitting unused on a shelf
> for six months.  Some low-end SATA SSDs (like $20/TB drives from Amazon)
> are giant USB sticks with a SATA interface, and will fail the same way.
>
> SD card vendors are starting to notice, and there are now SD card options
> with higher endurance ratings.  Still "putting this card in a dashcam
> voids the warranty" in most cases.
>
> ext2 and msdos both make USB sticks last longer, but they have obvious
> other problems.  From my fleet of raspberry pis, I find that SD cards
> last longer on btrfs than ext4 with comparable write loads, but they
> are still both lead very short lives, and the biggest life expectancy
> improvement (up to a couple of years) comes from eliminating local
> writes entirely.

It's long been an accepted fact in professional photography circles
that CF/SD card corruption is due to crappy firmware in cameras. Ergo,
format (FAT32/exFAT) only with that camera's firmware, don't delete
files using the camera's interface, suck off all the files (back them
up too) then format the card in that particular camera, etc.

I've wondered for a while now, if all of that was flash manufacturer
buck passing.


>
> > In the most prolific snapshotting case, I had two subvolumes, each
> > with 20 snapshots (at most). I used default ssd mount option for the
> > sdcards, most recently ssd_spread with the usb sticks. And now nossd
> > with the most recent USB stick I just started to use.
>
> The number of snapshots doesn't really matter:  you get the up-to-300x
> write multiple from writing to a subvol that has shared metadata pages,
> which happens when you have just one snapshot.  It doesn't matter if
> you have 1 snapshot or 10000 (at least, not for _this_ reason).

Gotcha. So I wonder if the cheap consumer USB/SD card use case 
(Raspberry Pi and such, rootfs or general purpose use) should use a 
4KiB node instead of the default 16KiB? I read on HN about someone 
using much more expensive industrial SD cards, and this problem has 
entirely vanished for them (Pi use case). I've looked around and they 
are a lot smaller and more expensive, but for a Pi rootfs it's pretty 
sane really: USD$56 for a 4G card that won't die every 6 months? They 
are slower too. *shrug*
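
Node size is a mkfs-time choice, so testing it means reformatting; a 
hedged sketch (the device name is a placeholder, and I have no data yet 
on whether it actually helps):

     mkfs.btrfs --nodesize 4096 /dev/sdX1    # minimum node size; must be >= sector size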


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-07 23:53                               ` Chris Murphy
@ 2020-01-08  1:41                                 ` Zygo Blaxell
  2020-01-08  2:54                                   ` Chris Murphy
  0 siblings, 1 reply; 32+ messages in thread
From: Zygo Blaxell @ 2020-01-08  1:41 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 4281 bytes --]

On Tue, Jan 07, 2020 at 04:53:13PM -0700, Chris Murphy wrote:
> On Tue, Jan 7, 2020 at 4:32 PM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > It seems to be normal behavior for USB sticks and SD cards.  I've also
> > had USB sticks degrade (bit errors) simply from sitting unused on a shelf
> > for six months.  Some low-end SATA SSDs (like $20/TB drives from Amazon)
> > are giant USB sticks with a SATA interface, and will fail the same way.
> >
> > SD card vendors are starting to notice, and there are now SD card options
> > with higher endurance ratings.  Still "putting this card in a dashcam
> > voids the warranty" in most cases.
> >
> > ext2 and msdos both make USB sticks last longer, but they have obvious
> > other problems.  From my fleet of raspberry pis, I find that SD cards
> > last longer on btrfs than ext4 with comparable write loads, but they
> both still lead very short lives, and the biggest life expectancy
> > improvement (up to a couple of years) comes from eliminating local
> > writes entirely.
> 
> It's long been an accepted fact in professional photography circles
> that CF/SD card corruption is due to crappy firmware in cameras. Ergo,
> format (FAT32/exFAT) only with that camera's firmware, don't delete
> files using the camera's interface, suck off all the files (back them
> up too) then format the card in that particular camera, etc.

Well, those are all good ideas.  One should never assume firmware
is bug-free or that any individual storage device is durable or
interoperable.

I have encountered embedded firmware developers who write their own VFAT
implementation from scratch "because [they] don't want to debug someone
else's", as if exhaustive bug compatibility was somehow not inherent to
correct operation of a VFAT filesystem.

> I've wondered for a while now, if all of that was flash manufacturer
> buck passing.

Consumer SD cards have been ruthlessly optimized for decades, mostly
for cost.  It will take a while for the consumer-facing part of the
industry to dig itself out of that hole.  In the meantime, we can expect
silent data corruption, bad wear leveling, and power failure corruption
to be part of the default feature set.

> > > In the most prolific snapshotting case, I had two subvolumes, each
> > > with 20 snapshots (at most). I used default ssd mount option for the
> > > sdcards, most recently ssd_spread with the usb sticks. And now nossd
> > > with the most recent USB stick I just started to use.
> >
> > The number of snapshots doesn't really matter:  you get the up-to-300x
> > write multiple from writing to a subvol that has shared metadata pages,
> > which happens when you have just one snapshot.  It doesn't matter if
> > you have 1 snapshot or 10000 (at least, not for _this_ reason).
> 
> Gotcha. So I wonder if the cheap consumer USB/SD card use case,
> Raspberry Pi and such, rootfs or general purpose use, should use a
> 4KiB node instead of default 16KiB? I read on HN someone using much
> much more expensive industrial SD cards and this problem  has entirely
> vanished for them (Pi use case). I've looked around and they are a lot
> smaller and more expensive but for a Pi rootfs it's pretty sane
> really, USD$56 for a 4G card that won't die every 6 months? They are
> slower too. *shrug*

I run btrfs with dup data and dup metadata on the Pis, with minimized
write workloads (i.e. I turn off all the local log files, sending the
data to other machines or putting it on tmpfs with periodic uploads,
and I use compression to reduce write volume).  I don't use snapshots on
these devices--they do get backups, but they are fine with plain rsync,
given the minimized write workloads.  I haven't tried changing filesystem
parameters like node size.  Dup data doesn't help the SD card last any
longer, but it does keep the device operating long enough to build and
deploy a new SD card.
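
A minimal sketch of that kind of setup; the device name, mount point and 
compression choice are assumptions, not my exact commands:

     mkfs.btrfs -d dup -m dup /dev/mmcblk0p2
     mount -o compress=zstd,noatime /dev/mmcblk0p2 /mnt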

Samsung is making SD cards with 10-year warranties and a 300 TBW rating 
(equivalently, it is specified in units of "hours of HD video").  They 
are USD$25 for 128GB, 100MB/s read, 90MB/s write.  No idea what they're 
like in the field; I start test deployments next week...

 
> 
> -- 
> Chris Murphy
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: write amplification, was: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
  2020-01-08  1:41                                 ` Zygo Blaxell
@ 2020-01-08  2:54                                   ` Chris Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Murphy @ 2020-01-08  2:54 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Jan 7, 2020 at 6:41 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> Consumer SD cards have been ruthlessly optimized for decades, mostly
> for cost.  It will take a while for the consumer-facing part of the
> industry to dig itself out of that hole.  In the meantime, we can expect
> silent data corruption, bad wear leveling, and power failure corruption
> to be part of the default feature set.

I cackled out loud at this.


> I run btrfs with dup data and dup metadata on the Pis, with minimized
> write workloads (i.e. I turn off all the local log files, sending the
> data to other machines or putting it on tmpfs with periodic uploads,
> and I use compression to reduce write volume).  I don't use snapshots on
> these devices--they do get backups, but they are fine with plain rsync,
> given the minimized write workloads.  I haven't tried changing filesystem
> parameters like node size.  Dup data doesn't help the SD card last any
> longer, but it does keep the device operating long enough to build and
> deploy a new SD card.

I use zstd:1, space_cache v2, and dsingle mdup. Curiously I've not
seen any read errors on these. They just stop writing. I can mount and
read from them fine, it's just that writes fail (silently on the USB sticks, 
kinda hilarious: yep, yep, I'm writing, no problem, yep, oh you want
back what you just wrote, yeah no you get yesterday's data).


> Samsung is making SD cards with 10-year warranties and a 300 TBW rating
> (equivalent, it is specified in units of "hours of HD video").  They are
> USD$25 for 128GB, 100MB/s read 90MB/s write.  No idea what they're like
> in the field, I start test deployments next week...

Yeah, the Samsung SD cards I have are also 10-year, but I have no idea 
what the TBW is and they don't report writes. EVO Plus, U3, is 
supposed to do 30MB/s writes, but no matter the file system I get 
20MB/s out of it. I get a statistically significant extra ~350KB/s with 
Btrfs. Reads are around 80-90MB/s. Of course effective reads/writes 
are higher with compression. The USB sticks, I think, carry a few 
years' warranty.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-01-08  2:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel
2019-12-26  5:08 ` Qu Wenruo
2019-12-26 13:17   ` Leszek Dubiel
2019-12-26 13:44     ` Remi Gauvin
2019-12-26 14:05       ` Leszek Dubiel
2019-12-26 14:21         ` Remi Gauvin
2019-12-26 15:42           ` Leszek Dubiel
2019-12-26 22:40         ` Chris Murphy
2019-12-26 22:58           ` Leszek Dubiel
2019-12-28 17:04             ` Leszek Dubiel
2019-12-28 20:23               ` Zygo Blaxell
2020-01-02 18:37                 ` Leszek Dubiel
2020-01-02 21:57                   ` Chris Murphy
2020-01-02 22:39                     ` Leszek Dubiel
2020-01-02 23:22                       ` Chris Murphy
2020-01-03  9:08                         ` Leszek Dubiel
2020-01-03 19:15                           ` Chris Murphy
2020-01-03 14:39                         ` Leszek Dubiel
2020-01-03 19:02                           ` Chris Murphy
2020-01-03 20:59                             ` Leszek Dubiel
2020-01-04  5:38                         ` Zygo Blaxell
2020-01-07 18:44                           ` write amplification, was: " Chris Murphy
2020-01-07 19:26                             ` Holger Hoffstätte
2020-01-07 23:32                             ` Zygo Blaxell
2020-01-07 23:53                               ` Chris Murphy
2020-01-08  1:41                                 ` Zygo Blaxell
2020-01-08  2:54                                   ` Chris Murphy
2020-01-06 11:14                     ` Leszek Dubiel
2020-01-07  0:21                       ` Chris Murphy
2020-01-07  7:09                         ` Leszek Dubiel
2019-12-26 22:15 ` Chris Murphy
2019-12-26 22:48   ` Leszek Dubiel
