All of lore.kernel.org
 help / color / mirror / Atom feed
* btrfs differential receive has become excrutiatingly slow on one machine
@ 2014-09-08  1:51 Marc MERLIN
  2014-09-08 21:49 ` Filipe David Manana
  0 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2014-09-08  1:51 UTC (permalink / raw)
  To: linux-btrfs, fdmanana

Hi Filipe and others,

TL;DR:
- btrfs send of a 1.5GB filesystem takes 4mn to replicate that
  filesystem on the target machine.
- btrfs send -p between 2 snapshots happens in 1 second
- copying that diff to the other machine takes 8 seconds
- applying that diff with btrfs receive takes 82mn (again
  when resyncing the entire filesystem from scratch only takes 4mn)

Details:
Kernel 3.16.2 on both sides, although the problem started earlier, maybe
after 3.14.

btrfs send/receive works great on my laptop from one drive to another,
but it's unbearable when I run it to my disk server (which was on 3.14
up to today due to the bugs in 3.15 and 3.16.x<2).

Sending an entire snapshot (non differential mode) is fast too:
> legolas:/mnt/btrfs_pool1# time btrfs send root_ggm_daily_ro.20140907_13* | ssh gargamel "time btrfs receive /mnt/btrfs_pool2"
> At subvol root_ggm_daily_ro.20140907_13:16:58
> At subvol root_ggm_daily_ro.20140907_13:16:58
> 
> real	4m41.392s
> user	0m1.380s
> sys	0m9.504s
> 
> real	4m41.685s
> user	0m4.580s
> sys	0m16.844s

Sending an incremental 154MB diff takes almost 82mn for btrfs receive to process.


btrfs send differential is fast:
legolas:/mnt/btrfs_pool1# time btrfs send -p /mnt/btrfs_pool1/root_ggm_daily_ro.20140906_10:25:25 root_ggm_daily_ro.20140907_14:55:40 > /mnt/btrfs_pool2/diff
At subvol root_ggm_daily_ro.20140907_14:55:40

real	0m1.464s
user	0m0.004s
sys	0m1.172s

However, btrfs receive of that diff is slow as molasses.

The diff is smallish:
legolas:/mnt/btrfs_pool1# du -sh /mnt/btrfs_pool2/diff
154M	/mnt/btrfs_pool2/diff

The subvolume is small too:
legolas:/mnt/btrfs_pool1# du -sh root
1.5G	root

Copying that diff file over ssh takes 8 seconds, and then doing the
receive manually is where wait is going on

gargamel:/mnt/btrfs_pool2/backup/debian64/legolas# time btrfs receive -f /tmp/diff .  
At snapshot root_ggm_daily_ro.20140907_14:55:40


Strace shows that it hangs for portion of or multiple seconds on read from FH 3
btrfs   4467 root    3r   REG   0,27 160962420 13136056 /tmp/diff

/tmp/diff is on a different filesystem and different drives than the
btrfs FS the restore is happening on.

Here is a snapshot of strace showing how it hangs for multiple seconds at the read
step that reads from the diff file:
     0.000114 read(3, "h\0\0\0\21\0\305\341\244\266", 10) = 10
     0.000059 read(3, "\17\0X\0lib/modules/3.16.2-amd64-i91"..., 104) = 104
     0.000061 truncate("/mnt/btrfs_pool2/backup/debian64/legolas/root_ggm_daily_ro.20140907_14:55:40/lib/modules/3.16.2-amd64-i915-preempt-20140714/kernel/arch/x86/crypto/blowfish-x86_64.ko", 20304) = 0
>>>  91.915793 read(3, "t\0\0\0\23\0\v\370\222\244", 10) = 10
     0.000066 read(3, "\17\0X\0lib/modules/3.16.2-amd64-i91"..., 116) = 116
     0.000061 lchown("/mnt/btrfs_pool2/backup/debian64/legolas/root_ggm_daily_ro.20140907_14:55:40/lib/modules/3.16.2-amd64-i915-preempt-20140714/kernel/arch/x86/crypto/blowfish-x86_64.ko", 0, 0) = 0
     0.000125 read(3, "h\0\0\0\22\0Xa\r\271", 10) = 10
     0.000050 read(3, "\17\0X\0lib/modules/3.16.2-amd64-i91"..., 104) = 104
     0.000055 chmod("/mnt/btrfs_pool2/backup/debian64/legolas/root_ggm_daily_ro.20140907_14:55:40/lib/modules/3.16.2-amd64-i915-preempt-20140714/kernel/arch/x86/crypto/blowfish-x86_64.ko", 0644) = 0

See how that was 91 seconds up there, it's thankfully usually less.

While strace shows me the wait is on read, I'm going to assume that really the hang is 
when btrfs receive is sending the bytes to the kernel to write on disk.

The target filesystem looks like this:
Label: 'dshelf2'  uuid: d4a51178-c1e6-4219-95ab-5c5864695bfd 
        Total devices 1 FS bytes used 3.53TiB 
        devid    1 size 7.28TiB used 3.66TiB path /dev/mapper/dshelf2 

Data, single: total=3.50TiB, used=3.47TiB
System, DUP: total=8.00MiB, used=416.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=77.50GiB, used=62.31GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=8.64MiB

So it's not critically misbalanced since it's only 50% used, correct?

For good measure, I ran:
btrfs balance start -dusage=10 /mnt/btrfs_pool2 
ran and exited with:
Done, had to relocate 5 out of 3746 chunks
but that did not help the restore speed at all.


btrfs restore -v -v -v shows it's doing work, but slow enough that I can easily read the text going by:
truncate sbin/swaplabel size=14568
chown sbin/swaplabel - uid=0, gid=0
chmod sbin/swaplabel - mode=0755
utimes sbin/swaplabel
mkfile o85871-279575-0
rename o85871-279575-0 -> sbin/raw
utimes sbin
truncate sbin/raw size=10336
chown sbin/raw - uid=0, gid=0
chmod sbin/raw - mode=0755
utimes sbin/raw
mkfile o85872-279575-0
rename o85872-279575-0 -> sbin/fsck.cramfs  
utimes sbin
truncate sbin/fsck.cramfs size=14544
chown sbin/fsck.cramfs - uid=0, gid=0
chmod sbin/fsck.cramfs - mode=0755
utimes sbin/fsck.cramfs
mkfile o85873-279575-0
rename o85873-279575-0 -> sbin/mkfs.cramfs
utimes sbin
truncate sbin/mkfs.cramfs size=31120
chown sbin/mkfs.cramfs - uid=0, gid=0
chmod sbin/mkfs.cramfs - mode=0755
utimes sbin/mkfs.cramfs
mkfile o85874-279575-0
rename o85874-279575-0 -> sbin/findfs
utimes sbin
truncate sbin/findfs size=6208
chown sbin/findfs - uid=0, gid=0
chmod sbin/findfs - mode=0755
utimes sbin/findfs
mkfile o85875-279575-0
rename o85875-279575-0 -> sbin/wipefs
utimes sbin

Here is a full run:
gargamel:/mnt/btrfs_pool2/backup/debian64/legolas# time btrfs receive -v -v -v -f /tmp/diff .
At snapshot root_ggm_daily_ro.20140907_14:55:40
receiving snapshot root_ggm_daily_ro.20140907_14:55:40 uuid=8bfc0ac8-6436-614d-8a56-7f7894f743b8, ctransid=281821 parent_uuid=075e6ff5-9b87-874f-991c-b453a0f5d4c0, parent_ctransid=278891
utimes tmp
utimes bin
unlink bin/dmesg
utimes bin
unlink bin/lsblk
utimes bin
unlink bin/more
utimes bin
unlink bin/tailf
(...)
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=8bfc0ac8-6436-614d-8a56-7f7894f743b8, stransid=281821

real    82m55.222s
user    0m0.236s
sys     0m1.816s


Any idea what's going on?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2014-09-08  1:51 btrfs differential receive has become excrutiatingly slow on one machine Marc MERLIN
@ 2014-09-08 21:49 ` Filipe David Manana
  2014-09-15  0:18   ` Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2014-09-08 21:49 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, Sep 8, 2014 at 2:51 AM, Marc MERLIN <marc@merlins.org> wrote:
> Hi Filipe and others,
>
> TL;DR:
> - btrfs send of a 1.5GB filesystem takes 4mn to replicate that
>   filesystem on the target machine.
> - btrfs send -p between 2 snapshots happens in 1 second
> - copying that diff to the other machine takes 8 seconds
> - applying that diff with btrfs receive takes 82mn (again
>   when resyncing the entire filesystem from scratch only takes 4mn)
>
> Details:
> Kernel 3.16.2 on both sides, although the problem started earlier, maybe
> after 3.14.
>
> btrfs send/receive works great on my laptop from one drive to another,
> but it's unbearable when I run it to my disk server (which was on 3.14
> up to today due to the bugs in 3.15 and 3.16.x<2).
>
> Sending an entire snapshot (non differential mode) is fast too:
>> legolas:/mnt/btrfs_pool1# time btrfs send root_ggm_daily_ro.20140907_13* | ssh gargamel "time btrfs receive /mnt/btrfs_pool2"
>> At subvol root_ggm_daily_ro.20140907_13:16:58
>> At subvol root_ggm_daily_ro.20140907_13:16:58
>>
>> real  4m41.392s
>> user  0m1.380s
>> sys   0m9.504s
>>
>> real  4m41.685s
>> user  0m4.580s
>> sys   0m16.844s
>
> Sending an incremental 154MB diff takes almost 82mn for btrfs receive to process.
>
>
> btrfs send differential is fast:
> legolas:/mnt/btrfs_pool1# time btrfs send -p /mnt/btrfs_pool1/root_ggm_daily_ro.20140906_10:25:25 root_ggm_daily_ro.20140907_14:55:40 > /mnt/btrfs_pool2/diff
> At subvol root_ggm_daily_ro.20140907_14:55:40
>
> real    0m1.464s
> user    0m0.004s
> sys     0m1.172s
>
> However, btrfs receive of that diff is slow as molasses.
>
> The diff is smallish:
> legolas:/mnt/btrfs_pool1# du -sh /mnt/btrfs_pool2/diff
> 154M    /mnt/btrfs_pool2/diff
>
> The subvolume is small too:
> legolas:/mnt/btrfs_pool1# du -sh root
> 1.5G    root
>
> Copying that diff file over ssh takes 8 seconds, and then doing the
> receive manually is where wait is going on
>
> gargamel:/mnt/btrfs_pool2/backup/debian64/legolas# time btrfs receive -f /tmp/diff .
> At snapshot root_ggm_daily_ro.20140907_14:55:40
>
>
> Strace shows that it hangs for portion of or multiple seconds on read from FH 3
> btrfs   4467 root    3r   REG   0,27 160962420 13136056 /tmp/diff
>
> /tmp/diff is on a different filesystem and different drives than the
> btrfs FS the restore is happening on.
>
> Here is a snapshot of strace showing how it hangs for multiple seconds at the read
> step that reads from the diff file:
>      0.000114 read(3, "h\0\0\0\21\0\305\341\244\266", 10) = 10
>      0.000059 read(3, "\17\0X\0lib/modules/3.16.2-amd64-i91"..., 104) = 104
>      0.000061 truncate("/mnt/btrfs_pool2/backup/debian64/legolas/root_ggm_daily_ro.20140907_14:55:40/lib/modules/3.16.2-amd64-i915-preempt-20140714/kernel/arch/x86/crypto/blowfish-x86_64.ko", 20304) = 0
>>>>  91.915793 read(3, "t\0\0\0\23\0\v\370\222\244", 10) = 10
>      0.000066 read(3, "\17\0X\0lib/modules/3.16.2-amd64-i91"..., 116) = 116
>      0.000061 lchown("/mnt/btrfs_pool2/backup/debian64/legolas/root_ggm_daily_ro.20140907_14:55:40/lib/modules/3.16.2-amd64-i915-preempt-20140714/kernel/arch/x86/crypto/blowfish-x86_64.ko", 0, 0) = 0
>      0.000125 read(3, "h\0\0\0\22\0Xa\r\271", 10) = 10
>      0.000050 read(3, "\17\0X\0lib/modules/3.16.2-amd64-i91"..., 104) = 104
>      0.000055 chmod("/mnt/btrfs_pool2/backup/debian64/legolas/root_ggm_daily_ro.20140907_14:55:40/lib/modules/3.16.2-amd64-i915-preempt-20140714/kernel/arch/x86/crypto/blowfish-x86_64.ko", 0644) = 0
>
> See how that was 91 seconds up there, it's thankfully usually less.

Hi Marc,

Does the sum of all reads from the stream file (fd 3) gets anywhere
close to the total btrfs receive time? (or even more than 50%)
Can you paste somewhere the full output of strace (with -T option)?

thanks

>
> While strace shows me the wait is on read, I'm going to assume that really the hang is
> when btrfs receive is sending the bytes to the kernel to write on disk.
>
> The target filesystem looks like this:
> Label: 'dshelf2'  uuid: d4a51178-c1e6-4219-95ab-5c5864695bfd
>         Total devices 1 FS bytes used 3.53TiB
>         devid    1 size 7.28TiB used 3.66TiB path /dev/mapper/dshelf2
>
> Data, single: total=3.50TiB, used=3.47TiB
> System, DUP: total=8.00MiB, used=416.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=77.50GiB, used=62.31GiB
> Metadata, single: total=8.00MiB, used=0.00
> unknown, single: total=512.00MiB, used=8.64MiB
>
> So it's not critically misbalanced since it's only 50% used, correct?
>
> For good measure, I ran:
> btrfs balance start -dusage=10 /mnt/btrfs_pool2
> ran and exited with:
> Done, had to relocate 5 out of 3746 chunks
> but that did not help the restore speed at all.
>
>
> btrfs restore -v -v -v shows it's doing work, but slow enough that I can easily read the text going by:
> truncate sbin/swaplabel size=14568
> chown sbin/swaplabel - uid=0, gid=0
> chmod sbin/swaplabel - mode=0755
> utimes sbin/swaplabel
> mkfile o85871-279575-0
> rename o85871-279575-0 -> sbin/raw
> utimes sbin
> truncate sbin/raw size=10336
> chown sbin/raw - uid=0, gid=0
> chmod sbin/raw - mode=0755
> utimes sbin/raw
> mkfile o85872-279575-0
> rename o85872-279575-0 -> sbin/fsck.cramfs
> utimes sbin
> truncate sbin/fsck.cramfs size=14544
> chown sbin/fsck.cramfs - uid=0, gid=0
> chmod sbin/fsck.cramfs - mode=0755
> utimes sbin/fsck.cramfs
> mkfile o85873-279575-0
> rename o85873-279575-0 -> sbin/mkfs.cramfs
> utimes sbin
> truncate sbin/mkfs.cramfs size=31120
> chown sbin/mkfs.cramfs - uid=0, gid=0
> chmod sbin/mkfs.cramfs - mode=0755
> utimes sbin/mkfs.cramfs
> mkfile o85874-279575-0
> rename o85874-279575-0 -> sbin/findfs
> utimes sbin
> truncate sbin/findfs size=6208
> chown sbin/findfs - uid=0, gid=0
> chmod sbin/findfs - mode=0755
> utimes sbin/findfs
> mkfile o85875-279575-0
> rename o85875-279575-0 -> sbin/wipefs
> utimes sbin
>
> Here is a full run:
> gargamel:/mnt/btrfs_pool2/backup/debian64/legolas# time btrfs receive -v -v -v -f /tmp/diff .
> At snapshot root_ggm_daily_ro.20140907_14:55:40
> receiving snapshot root_ggm_daily_ro.20140907_14:55:40 uuid=8bfc0ac8-6436-614d-8a56-7f7894f743b8, ctransid=281821 parent_uuid=075e6ff5-9b87-874f-991c-b453a0f5d4c0, parent_ctransid=278891
> utimes tmp
> utimes bin
> unlink bin/dmesg
> utimes bin
> unlink bin/lsblk
> utimes bin
> unlink bin/more
> utimes bin
> unlink bin/tailf
> (...)
> BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=8bfc0ac8-6436-614d-8a56-7f7894f743b8, stransid=281821
>
> real    82m55.222s
> user    0m0.236s
> sys     0m1.816s
>
>
> Any idea what's going on?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2014-09-08 21:49 ` Filipe David Manana
@ 2014-09-15  0:18   ` Marc MERLIN
  2014-09-15 17:57     ` Marc MERLIN
  2015-05-11 21:44     ` btrfs differential receive has become excrutiatingly slow on one machine Marc MERLIN
  0 siblings, 2 replies; 11+ messages in thread
From: Marc MERLIN @ 2014-09-15  0:18 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, Sep 08, 2014 at 10:49:01PM +0100, Filipe David Manana wrote:
> Hi Marc,
> 
> Does the sum of all reads from the stream file (fd 3) gets anywhere
> close to the total btrfs receive time? (or even more than 50%)
> Can you paste somewhere the full output of strace (with -T option)?

Sorry for the lack of answer, I lost the snapshot I used for that mail,
so it was not possible to do again easily.
Because my backups were so hopelessly behind, I did a full resync of
/var, i.e. not a differential send (300GB or so). The copy went at about
25GB/h, which wasn't bad at all since was over wifi (took about 14H).

I'm still waiting for the next incremental backup to run.
Once it's done, I'll try to run a smaller one under strace so I don't
have a ridiculously long log to give you.

Is there a reasonable way to know if btrfs receive is indeed hanging on
kernel code, and whether it's on receive code or on code that tries to
writes the files on the filesystem in the new snapshot?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2014-09-15  0:18   ` Marc MERLIN
@ 2014-09-15 17:57     ` Marc MERLIN
  2014-09-16 23:57       ` btrfs differential receive has become excrutiatingly slow with COW files Marc MERLIN
  2015-05-11 21:44     ` btrfs differential receive has become excrutiatingly slow on one machine Marc MERLIN
  1 sibling, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2014-09-15 17:57 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Sun, Sep 14, 2014 at 05:18:36PM -0700, Marc MERLIN wrote:
> I'm still waiting for the next incremental backup to run.
> Once it's done, I'll try to run a smaller one under strace so I don't
> have a ridiculously long log to give you.
> 
> Is there a reasonable way to know if btrfs receive is indeed hanging on
> kernel code, and whether it's on receive code or on code that tries to
> writes the files on the filesystem in the new snapshot?

Ok, so my incremental sync of /var which I had _just_ fully copied over,
took over 12H and then died with

ABORT: btrfs send /mnt/btrfs_pool1/var_ggm_daily_ro.20140913_14:50:24 var_ggm_daily_ro.20140914_19:16:48 | ssh gargamel.merlins.org btrfs receive /mnt/btrfs_pool2/backup/debian64/legolas/ failed
Write failed: Broken pipe

I'm not sure that my ssh connection went down because it was at home on
reliable wireless and I had other connections that were fine during that
time, but you never know.

Unfortunately, due to my stupidity, the strace logging failed, but it looks 
like I got about 12H worth of the following below. Interestingly that directory
is nocow (VM images).
Does btrfs send get unhappy with NOCOW?

Thanks,
Marc

read(0, "\17\0(\0local/nobck/VirtualBox VMs/W"..., 49212) = 15511 <0.000023>
read(0, "\336\350A\340\2@\377p\20\377u\360\3\36T\245\241\25\307\2033\314E`\20,f\20\25C$4"..., 33701) = 32768 <0.117252>
read(0, "\210:\214:\220:\224:\0\230:\234:\240:\244:\0\250:\254:\260:\264:\0\270:\304:\310"..., 933) = 933 <0.369642>
pwrite(5, "`\273\0}\300\271\f\0\0\0003\0\300\363\253\213\3163\300\211\0E\350\211U\274\213\331\213\0E\274"..., 49152, 32698048512) = 49152 <0.000086>
read(0, "<\260\0\0\17\0\224\22\261\v", 10) = 10 <0.000013>
read(0, "\17\0(\0local/nobck/VirtualBox VMs/W"..., 45116) = 31825 <0.000024>
read(0, "\325\260\3 \270\3T\270\3\340y\7\201\1\\\350\16\262\3\241\0\261\3\207\260\0038\271\0\354Z\347"..., 13291) = 13291 <0.228846>
pwrite(5, "\363\273\0\0\0\0\350\30(\353\377\0\213F\4\211E\340\213N@\0209\t\350\10\227\0x\330\0\213"..., 45056, 32698114048) = 45056 <0.000060>
read(0, "<\260\0\0\17\0(-\275\365", 10) = 10 <0.000044>
read(0, "\17\0(\0local/nobck/VirtualBox VMs/W"..., 45116) = 19467 <0.000021>
read(0, "\3\341\23P\25\241\3\360\240\3\211b\10\205\311u\0>\213G\10\203x(\0\4t5\0\1\213@"..., 25649) = 25649 <0.176543>
pwrite(5, "\226\272\10\377j\1\1\20\213\323\213\316\0\350\310N\352\377\211GH\0\351\37\1\0\0\213N(\0009"..., 45056, 32698179584) = 45056 <0.000071>
read(0, "<`\0\0\17\0\tmn,", 10)         = 10 <0.000043>
read(0, "\17\0(\0local/nobck/VirtualBox VMs/W"..., 24636) = 7109 <0.000019>
read(0, "\377\307A\10 \243\377\240\7\300m\352^\200\2\2140\3P0\2\204\1\245\3\1\320\31\314\314D\30"..., 17527) = 17527 <0.113324>
pwrite(5, "P\272\0\n\351\377\213G\4\213@\20 \211E\360\2\200\10\211E\0\3543\366\203}\360\0~H\\"..., 24576, 32698245120) = 24576 <0.000055>
read(0, "<\200\0\0\17\0\346\vV\270", 10) = 10 <0.000030>
read(0, "\17\0(\0local/nobck/VirtualBox VMs/W"..., 32828) = 15231 <0.000017>
read(0, "\350\24\202b\243(\206@\n\213E\350H\377p\10\301\26\213\310\240\26'\10\261\377\377\5f\213~T"..., 17597) = 17597 <0.165656>
pwrite(5, "\200\320u\327\205\366~u\251%D\v\372\240+\310\213\201#\36\350\320\213E\350\213!%\360`\234\300"..., 32768, 32699318272) = 32768 <0.000057>
read(0, "<@\0\0\17\0\221\256ob", 10)    = 10 <0.000015>
read(0, "\17\0(\0local/nobck/VirtualBox VMs/W"..., 16444) = 15161 <0.000015>
read(0, "\210V<4a\273\10\23\372\240\227o\20\332\267\30S\375\2628oy\271\370yj\211ryr\335\177"..., 1283) = 1283 <0.190593>
pwrite(5, "\342\377\303\320vr\21\364zs|N\302\36\262\20N\203\224\212R\315\222S)\231E\255\344\265X\211"..., 16384, 32699351040) = 16384 <0.000050>
read(0, "<\300\0\0\17\0:\205\337\336", 10) = 10 <0.000015>



-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow with COW files
  2014-09-15 17:57     ` Marc MERLIN
@ 2014-09-16 23:57       ` Marc MERLIN
  2014-09-17 15:00         ` NOCOW on VM images causes extreme btrfs slowdowns, memory leaks, and deadlocks Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2014-09-16 23:57 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, Sep 15, 2014 at 10:57:47AM -0700, Marc MERLIN wrote:
> Unfortunately, due to my stupidity, the strace logging failed, but it looks 
> like I got about 12H worth of the following below. Interestingly that directory
> is nocow (VM images).
> Does btrfs send get unhappy with NOCOW?

So I ran another differential send/receive last night and it took another 10 or 12H
again.
Strace shows most of the time was spent on my virtualbox images which
did not change at all.

The diff itself wasn't big and only took 2mn to generate:
legolas:/mnt/btrfs_pool1# time btrfs send -p var_ggm_daily_ro.20140913_14:50:24 var_ggm_daily_ro.20140915_22:57:33 > /tmp/diff
At subvol var_ggm_daily_ro.20140915_22:57:33
real	2m14.365s
user	0m0.400s
sys	1m10.192s
legolas:/mnt/btrfs_pool1# du -sh /tmp/diff
11G	/tmp/diff

I have a filtered log showing any system call that took more than 1 sec,
that list is small:
http://marc.merlins.org/tmp/btrfs_receive.log

Most of the time is apparently just death by a thousand cuts of many
many system calls spent around receiving my virtual images that didn't
change.

Here's the full strace log if you wish
http://marc.merlins.org/tmp/btrfs_receive.log.xz

What do you think?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NOCOW on VM images causes extreme btrfs slowdowns, memory leaks, and deadlocks
  2014-09-16 23:57       ` btrfs differential receive has become excrutiatingly slow with COW files Marc MERLIN
@ 2014-09-17 15:00         ` Marc MERLIN
  2014-09-17 17:13           ` Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2014-09-17 15:00 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

kernel: 3.16.2

On Tue, Sep 16, 2014 at 04:57:42PM -0700, Marc MERLIN wrote:
> I have a filtered log showing any system call that took more than 1 sec,
> that list is small:
> http://marc.merlins.org/tmp/btrfs_receive.log
> 
> Most of the time is apparently just death by a thousand cuts of many
> many system calls spent around receiving my virtual images that didn't
> change.
> 
> Here's the full strace log if you wish
> http://marc.merlins.org/tmp/btrfs_receive.log.xz

Ok, so while debugging this further, I found out that my VM images were
not NOCOW anymore (they used to be, but this must have been lost during a
restore).

Problems:
filefrag on my vbox file took all of my RAM and swap (32GB) and killed my
machine without being able to finish.

Moving the dir to +C and copying the vbox image from backup (having deleted
the fragmented one) took much longer to start than it should have
(destination had a filesize of 0 for a long time), but finished overnight.

The next morning (now), I see multiple of my CPUs deadlocked and a kworker
at the top of the list:
INFO: task kworker/u16:6:21880 blocked for more than 120 seconds.
      Tainted: G           O  3.16.2-amd64-i915-preempt-20140714 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6   D 0000000000000000     0 21880      2 0x00000080
Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2)
 ffff88012f87b9d0 0000000000000046 ffff88012f87b9a0 ffff88012f87bfd8
 ffff8800139c0490 00000000000140c0 ffff88041d2140c0 ffff8800139c0490
 ffff88012f87ba70 0000000000000002 ffffffff81106441 ffff88012f87b9e0
Call Trace:
 [<ffffffff81106441>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
 [<ffffffff8110644f>] sleep_on_page+0xe/0x12
 [<ffffffff8163adbb>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff8110650a>] __lock_page+0x69/0x6b
 [<ffffffff81087ba4>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff8124aead>] lock_page+0x1e/0x21
 [<ffffffff8124ecb9>] extent_write_cache_pages.isra.16.constprop.31+0x10e/0x2c3
 [<ffffffff8124f2a2>] extent_writepages+0x4b/0x5c
 [<ffffffff81238e7c>] ? btrfs_submit_direct+0x3f9/0x3f9
 [<ffffffff81079658>] ? preempt_count_add+0x78/0x8d
 [<ffffffff81237568>] btrfs_writepages+0x28/0x2a
 [<ffffffff81110efe>] do_writepages+0x1e/0x2c
 [<ffffffff811814db>] __writeback_single_inode+0x7d/0x238
 [<ffffffff81182213>] writeback_sb_inodes+0x1eb/0x339
 [<ffffffff811823d5>] __writeback_inodes_wb+0x74/0xb7
 [<ffffffff81182550>] wb_writeback+0x138/0x293
 [<ffffffff81182b88>] bdi_writeback_workfn+0x19a/0x329
 [<ffffffff81068bf7>] process_one_work+0x195/0x2d2
 [<ffffffff81068fd8>] worker_thread+0x275/0x352
 [<ffffffff81068d63>] ? process_scheduled_works+0x2f/0x2f
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61

Hung tasks (sysrq-w) are here: 
http://marc.merlins.org/tmp/btrfs_hang-3.16.2.txt

I'm going to purge that fragmented vbox image from all my snapshots and reboot,
but clearly there are things that are going wrong.

Filipe, sorry for the initial bad problem report. While I can't exactly see
how it's related, it looks like btrfs receive of a heavily fragmented files
can take 12h or more.
It may not be that important to fix compared to the main problem heavy fragmentation
causes to btrfs still

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NOCOW on VM images causes extreme btrfs slowdowns, memory leaks, and deadlocks
  2014-09-17 15:00         ` NOCOW on VM images causes extreme btrfs slowdowns, memory leaks, and deadlocks Marc MERLIN
@ 2014-09-17 17:13           ` Marc MERLIN
  0 siblings, 0 replies; 11+ messages in thread
From: Marc MERLIN @ 2014-09-17 17:13 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, Sep 17, 2014 at 08:00:02AM -0700, Marc MERLIN wrote:
> kernel: 3.16.2

Grumble, I messed up the subject line.
I meant that COW on virtual disk images causes the problems described in the
previous Email.
Obviously I already know that COW on disk images wasn't ideal, but the
effects in 3.16 are still fairly severe.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2014-09-15  0:18   ` Marc MERLIN
  2014-09-15 17:57     ` Marc MERLIN
@ 2015-05-11 21:44     ` Marc MERLIN
  2015-05-13 11:35       ` Filipe David Manana
  1 sibling, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2015-05-11 21:44 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Sun, Sep 14, 2014 at 05:18:36PM -0700, Marc MERLIN wrote:
> On Mon, Sep 08, 2014 at 10:49:01PM +0100, Filipe David Manana wrote:
> > Hi Marc,
> > 
> > Does the sum of all reads from the stream file (fd 3) gets anywhere
> > close to the total btrfs receive time? (or even more than 50%)
> > Can you paste somewhere the full output of strace (with -T option)?
> 
> Sorry for the lack of answer, I lost the snapshot I used for that mail,
> so it was not possible to do again easily.
> Because my backups were so hopelessly behind, I did a full resync of
> /var, i.e. not a differential send (300GB or so). The copy went at about
> 25GB/h, which wasn't bad at all since was over wifi (took about 14H).

Sigh, now that I'm resyncing my laptop I just rebuilt after the btrfs
crash, to my server (both running 3.19.5+), full btrfs sends (i.e. not
incremental), are taking ages.

I'm seeing less than 100GB/day on my home network when my tcp
connections over wifi easily get 50MB/s

Right now I'm seeing the equivalent of aout 1MB/s, or 50 times less than
what my network connection can do.

Last time I tried to strace btrfs send, it killed the process with
SIGPIPE and I lost a full day of sync and had to start over :(

It's a broad question, but how can I diagnose btrfs send being so slow
without taking the risk of killing my connection?
(if there is no good answer on this one, I can try another sync later
with -vvv and strace if you'd like)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2015-05-11 21:44     ` btrfs differential receive has become excrutiatingly slow on one machine Marc MERLIN
@ 2015-05-13 11:35       ` Filipe David Manana
  2015-06-17 17:58         ` Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2015-05-13 11:35 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, May 11, 2015 at 10:44 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Sun, Sep 14, 2014 at 05:18:36PM -0700, Marc MERLIN wrote:
>> On Mon, Sep 08, 2014 at 10:49:01PM +0100, Filipe David Manana wrote:
>> > Hi Marc,
>> >
>> > Does the sum of all reads from the stream file (fd 3) gets anywhere
>> > close to the total btrfs receive time? (or even more than 50%)
>> > Can you paste somewhere the full output of strace (with -T option)?
>>
>> Sorry for the lack of answer, I lost the snapshot I used for that mail,
>> so it was not possible to do again easily.
>> Because my backups were so hopelessly behind, I did a full resync of
>> /var, i.e. not a differential send (300GB or so). The copy went at about
>> 25GB/h, which wasn't bad at all since was over wifi (took about 14H).
>
> Sigh, now that I'm resyncing my laptop I just rebuilt after the btrfs
> crash, to my server (both running 3.19.5+), full btrfs sends (i.e. not
> incremental), are taking ages.
>
> I'm seeing less than 100GB/day on my home network when my tcp
> connections over wifi easily get 50MB/s
>
> Right now I'm seeing the equivalent of aout 1MB/s, or 50 times less than
> what my network connection can do.
>
> Last time I tried to strace btrfs send, it killed the process with
> SIGPIPE and I lost a full day of sync and had to start over :(
>
> It's a broad question, but how can I diagnose btrfs send being so slow
> without taking the risk of killing my connection?
> (if there is no good answer on this one, I can try another sync later
> with -vvv and strace if you'd like)

Try to see if it's a problem at the sending side or at the receiving
side. Redirect send's output to a file, see how much it takes. Then
run receive with that file as input and see how long it takes.
You can also use 'perf record -ag' while doing both, it might give
some useful information.

>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2015-05-13 11:35       ` Filipe David Manana
@ 2015-06-17 17:58         ` Marc MERLIN
  2015-06-17 21:54           ` Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2015-06-17 17:58 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, May 13, 2015 at 12:35:20PM +0100, Filipe David Manana wrote:
> > It's a broad question, but how can I diagnose btrfs send being so slow
> > without taking the risk of killing my connection?
> > (if there is no good answer on this one, I can try another sync later
> > with -vvv and strace if you'd like)
> 
> Try to see if it's a problem at the sending side or at the receiving
> side. Redirect send's output to a file, see how much it takes. Then
> run receive with that file as input and see how long it takes.
> You can also use 'perf record -ag' while doing both, it might give
> some useful information.

Hi Filipe, sorry this took a while, I've been away and then had my
server hardware die, but things are back up and I'm now trying to sync a
100GB btrfs diff from my laptop to my server.

I've confirmed that btrfs send is fast (I sent it to a file)
Then scp of the diff is fast too (45mn for 100GB over wireless)
But the restore is slow. I see files going by quickly, and then hangs and restarts.

You requested strace -T in the past. I'm showing an exerpt of system calls that take
more than 1 second.

When I see this, I get worried:
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/lists2/new/1432663866_0.19916.legolas,U=427356,FMD5=7e806062200fb6d33546530d24aac86c:2,", 21043) = 0 <19.335333>

Or this:
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/media/tuners/mt2266.mod.dwo") = 0 <28.298224>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/INBOX/cur/1432061846_0.2789.legolas,U=381014,FMD5=7e33429f656f1e6e9d79b29c3f82c57e:2,") = 0 <45.084068>

19 seconds for a truncate or 28 or 45 seconds for an unlink cannot be right of course.

It's btrfs over dmcrypt over swraid5 (5 drives). Filesystem is only half full:

gargamel:~# btrfs fi show /mnt/btrfs_pool2/
Label: 'dshelf2'  uuid: d4a51178-c1e6-4219-95ab-5c5864695bfd
        Total devices 1 FS bytes used 4.34TiB
        devid    1 size 7.28TiB used 4.43TiB path /dev/mapper/dshelf2

gargamel:~# btrfs fi df /mnt/btrfs_pool2/
Data, single: total=4.28TiB, used=4.28TiB
System, DUP: total=8.00MiB, used=496.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=77.50GiB, used=68.05GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=192.00KiB


More system calls:

gargamel:~# grep ' <[1-9]'  /mnt/btrfs_pool2/strace
ioctl(3, BTRFS_IOC_SNAP_CREATE_V2, 0x7ffca6be3a40) = 0 <11.430413>
link("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/tmp01/dsc05964.jpg", "/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/albums/Outings/Dinners/Misc/20150228_Alexander_Patisserie.jpg") = 0 <1.572029>
) = 93 <1.195424>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Storage/ext/beknehfpfkghjoafdifaflglpjkojoco/def/Cookies", 7168) = 0 <53.618366>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/iio/dac/.ad7303.ko.cmd") = 0 <64.146415>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/iio/dac/.mcp4725.ko.cmd") = 0 <10.403782>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/iio/dac/ad5360.mod.dwo") = 0 <2.180297>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/iio/dac/ad5360.mod.o") = 0 <1.278088>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/md/persistent-data/built-in.o") = 0 <3.706420>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/media/tuners/mt2266.mod.dwo") = 0 <28.298224>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/media/usb/dvb-usb-v2/.tmp_af9035.dwo") = 0 <7.626091>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/net/wimax/i2400m/i2400m.ko") = 0 <25.749138>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-unstable/Default/Cache/f_002221") = 0 <1.751792>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/INBOX/cur/1432061343_0.31033.legolas,U=381013,FMD5=7e33429f656f1e6e9d79b29c3f82c57e:2,") = 0 <10.108744>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/INBOX/cur/1432061846_0.2789.legolas,U=381014,FMD5=7e33429f656f1e6e9d79b29c3f82c57e:2,") = 0 <45.084068>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-unstable/Default/Application Cache/Cache/f_000406") = 0 <1.018996>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-unstable/Default/IndexedDB/https_docs.google.com_0.indexeddb.leveldb/004187.ldb") = 0 <1.421004>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Cache/f_000113") = 0 <3.521448>
utimensat(AT_FDCWD, "/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Cache", {{1432074658, 417493132}, {1434520921, 352677949}}, AT_SYMLINK_NOFOLLOW) = 0 <7.411364>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Cache/f_000114") = 0 <1.069991>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Application Cache/Cache/f_0058af") = 0 <11.624418>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Application Cache/Cache/f_0058b0") = 0 <13.887990>
unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/.config/google-chrome-beta/Profile 1/Application Cache/Cache/f_0058bf") = 0 <1.050436>
utimensat(AT_FDCWD, "/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/o2323882-50676-0/204_20150519_WLW_PAO.jpg", {{1432186112, 0}, {1432186112, 0}}, AT_SYMLINK_NOFOLLOW) = 0 <1.487349>
link("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/tmp06/dsc00007.jpg", "/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/PrivPix/albums/Home/Misc/Barranca/850_Garage_Door.jpg") = 0 <1.296418>
read(0, "\17\0%\0www/Pix/new/Div1pix/orig/dsc"..., 49209) = 49209 <3.409544>
read(0, "\17\0%\0www/Pix/new/Div1pix/orig/dsc"..., 49209) = 49209 <1.708734>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div1pix/orig/dsc00117.jpg", 4043020) = 0 <30.348609>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div1pix/orig/dsc00118.jpg", 3271147) = 0 <19.919724>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div1pix/orig/dsc00119.jpg", 4152259) = 0 <5.785387>
pwrite(5, "\314\362\221O\202\302\10@4\16Ikv\243\365fnV\251\317\243  <4(\34s\251LB\227"..., 49152, 10485760) = 49152 <0.000023>
pwrite(5, "_\255\250\351n\323\211 <2PH\23\250  ;\255f9\30x3.\301\307,\262T\3%>"..., 49152, 16924672) = 49152 <0.000023>
pwrite(5, "vD_'\241\3051\251,\5\2\250E\32\252~\256\323\30\2\341\303 <2\233H\21202\221v"..., 49152, 5865472) = 49152 <0.000031>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00157"..., 49204) = 49204 <3.668907>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00157"..., 49204) = 49204 <4.023088>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00175.ARW", 20588032) = 0 <11.129091>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00176"..., 49204) = 49204 <8.645482>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00189"..., 49204) = 49204 <1.369523>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00189.ARW", 20804096) = 0 <2.916934>
pwrite(5, "\237\225+\255&#\201x4\17\20\244\360\220 <5\205(\255\210r\331l4\227\4%\301\350\214:"..., 49152, 12582912) = 49152 <0.000020>
pwrite(5, "\252\207\273\213\205\0X|D\215H\4\200x\210r\364\336\366\353d\3408(*\214\203#Q <2"..., 49152, 17072128) = 49152 <0.000037>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00211"..., 49204) = 49204 <6.216209>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00211"..., 49204) = 49204 <2.990708>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00211.ARW", 21403136) = 0 <3.533557>
pwrite(5, "D\263\26\3G\0051\201|>!\224\254\325:\255\254\222S\213@\201` <4\213C\243B\321<"..., 49152, 13549568) = 49152 <0.000027>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00225.ARW", 20851200) = 0 <8.745869>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00226.ARW", 20775936) = 0 <4.497084>
pwrite(5, "\250\36\353\217eb <4\237\217\352\346+:\24186g\3\241\20\241T.\240\225\255\250\244F\243"..., 49152, 11255808) = 49152 <0.000025>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00250"..., 49204) = 49204 <1.323551>
read(0, "\17\0 \0www/Pix/new/Div2pix/DSC00250"..., 49204) = 49204 <8.653995>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00250.ARW", 21162496) = 0 <4.062709>
pwrite(5, "\270#\327\v\304\342\200\230T\22\33\v,U\372\252\5\316\351\vG\22 <2\"\225N\352mZ\255"..., 49152, 11829248) = 49152 <0.000019>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00277.ARW", 21304320) = 0 <24.996196>
pwrite(5, "\3422\224\27\302D\320 <2\235\217g\303\252\244W\266lG\241@\261tT\252_\316W\203\202\267"..., 49152, 18595840) = 49152 <0.000021>
pwrite(5, "z\23Y8\t\346\344\t\f*\222Q$r\0214FCW\24\210\303\303 <6\35\323'#\360P"..., 49152, 3817472) = 49152 <0.000023>
pwrite(5, ":BO\230LG\241\231\10\10\211\300B0 <2d\26(\357\303\231=\2\212B#p\0IT"..., 49152, 11059200) = 49152 <0.000033>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00313.ARW", 20855808) = 0 <10.785386>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00314.ARW", 20806656) = 0 <1.424402>
pwrite(5, "\0274\327\303\200qP$\22\224\305$\324\212\311\206\306R\222\223\306\341\340 <6\234\314f\23\232\341"..., 49152, 868352) = 49152 <0.000020>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00345.ARW", 20960256) = 0 <7.405145>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/www/Pix/new/Div2pix/DSC00346.ARW", 20952064) = 0 <17.107006>
pwrite(5, "\26\2\17\n\6a\00080$\24K\342p \34*}\340\265`\20( <1!m\3020\24\22"..., 32768, 7831552) = 32768 <0.000028>
rename("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/o2241724-43287-0", "/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/lists2/new/1432663866_0.19916.legolas,U=427356,FMD5=7e806062200fb6d33546530d24aac86c:2,") = 0 <6.434152>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/lists2/new/1432663866_0.19916.legolas,U=427356,FMD5=7e806062200fb6d33546530d24aac86c:2,", 21043) = 0 <19.335333>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/lists2/new/1432663868_0.19916.legolas,U=427355,FMD5=7e806062200fb6d33546530d24aac86c:2,", 688135) = 0 <6.613715>
truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/lists2/new/1432665782_0.19916.legolas,U=427359,FMD5=7e806062200fb6d33546530d24aac86c:2,", 10795) = 0 <1.009631>

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs differential receive has become excrutiatingly slow on one machine
  2015-06-17 17:58         ` Marc MERLIN
@ 2015-06-17 21:54           ` Marc MERLIN
  0 siblings, 0 replies; 11+ messages in thread
From: Marc MERLIN @ 2015-06-17 21:54 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, Jun 17, 2015 at 10:58:05AM -0700, Marc MERLIN wrote:
> You requested strace -T in the past. I'm showing an exerpt of system calls that take
> more than 1 second.
> 
> When I see this, I get worried:
> truncate("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/lists2/new/1432663866_0.19916.legolas,U=427356,FMD5=7e806062200fb6d33546530d24aac86c:2,", 21043) = 0 <19.335333>
> 
> Or this:
> unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/src/linux-3.19.8-amd64-i915-volpreempt-s20150421/drivers/media/tuners/mt2266.mod.dwo") = 0 <28.298224>
> unlink("/mnt/btrfs_pool2/backup/debian64/legolas/varchange_ggm_daily_ro.20150616_23:06:10/merlin-change/Maildir.google/INBOX/cur/1432061846_0.2789.legolas,U=381014,FMD5=7e33429f656f1e6e9d79b29c3f82c57e:2,") = 0 <45.084068>
> 
> 19 seconds for a truncate or 28 or 45 seconds for an unlink cannot be right of course.

Interesting. The restore only took 2.5h.
It's still too long but not as bad as I thought.

But now I think I understand what's going on, because of the frequent
pauses of a few seconds to 30s or more, this totally destroys the tcp
flow, causing the sender to stop, and re-start sending slowly, ramp up
the speed, only to be stopped again.

No wonder that given that it can take 12h or more when I have send to
receive talk over the network.

So now the question is why the receive pauses for so long, and pseudo-randomly.

Is there anything I can provide on that filesystem that would help?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-06-17 21:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-08  1:51 btrfs differential receive has become excrutiatingly slow on one machine Marc MERLIN
2014-09-08 21:49 ` Filipe David Manana
2014-09-15  0:18   ` Marc MERLIN
2014-09-15 17:57     ` Marc MERLIN
2014-09-16 23:57       ` btrfs differential receive has become excrutiatingly slow with COW files Marc MERLIN
2014-09-17 15:00         ` NOCOW on VM images causes extreme btrfs slowdowns, memory leaks, and deadlocks Marc MERLIN
2014-09-17 17:13           ` Marc MERLIN
2015-05-11 21:44     ` btrfs differential receive has become excrutiatingly slow on one machine Marc MERLIN
2015-05-13 11:35       ` Filipe David Manana
2015-06-17 17:58         ` Marc MERLIN
2015-06-17 21:54           ` Marc MERLIN

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.