* 5.6, slow send/receive, < 1MB/s
@ 2020-05-04 7:43 Chris Murphy
2020-05-04 7:47 ` Nikolay Borisov
2020-05-04 8:03 ` Chris Murphy
0 siblings, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2020-05-04 7:43 UTC (permalink / raw)
To: Btrfs BTRFS
btrfs-progs-5.6-1.fc32.x86_64
kernel-5.6.8-300.fc32.x86_64
I have one snapshot of a subvolume being sent from one Btrfs
filesystem to another. Both are single device. The HDD scrub
performance is ~90MB/s. The send/receive is averaging less than 2MB/s.
Upon attaching strace to the send PID, it dies.
# btrfs send /mnt/first/scratch/gits.20200503/ | btrfs receive /mnt/aviat
ERROR: command length 1718776930 too big for buffer 65536
#
$ sudo strace -p 5628
strace: Process 5628 attached
ioctl(4, BTRFS_IOC_SEND, {send_fd=6, clone_sources_count=0,
clone_sources=NULL, parent
_root=0, flags=0}) = ?
+++ killed by SIGPIPE +++
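For context, the SIGPIPE itself is unsurprising: any writer into a pipe whose read end has gone away is killed that way. A minimal bash sketch, unrelated to btrfs:

```shell
# The writer into a pipe whose reader has exited is killed by SIGPIPE,
# which the shell reports as exit status 128 + 13 = 141.
yes | head -n 1 > /dev/null
echo "writer exit status: ${PIPESTATUS[0]}"
```

So when btrfs receive dies (or is disturbed), the send side follows.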
During the send/receive, top reports both of them using 90%+ CPU.
rsync averages 65MB/s.
Another subvolume, also one snapshot, has long periods of time (1/2
hour) where the transfer rate is never above 500K/s, but both the
btrfs send and btrfs receive PIDs are pinned at 99%+ CPU.
perf top during high btrfs command CPU consumption and low IO throughput
  34.01%  [kernel]  [k] mutex_spin_on_owner
  28.35%  [kernel]  [k] mutex_unlock
   8.05%  [kernel]  [k] mwait_idle_with_hints.constprop.
   5.10%  [kernel]  [k] mutex_lock
perf top during moderate btrfs command CPU consumption and high IO throughput
  15.16%  [kernel]  [k] fuse_perform_write
  10.27%  [kernel]  [k] mwait_idle_with_hints.constprop.
   7.97%  [kernel]  [k] mutex_unlock
   7.35%  [kernel]  [k] mutex_spin_on_owner
   5.75%  btrfs     [.] __crc32c_le
I suspect heavy fragmentation in some files when performance is slow,
and that btrfs send is somehow more adversely affected by it than cp
or rsync.
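The fragmentation guess can be checked with filefrag (a sketch; the path is the snapshot from the send command above, the size threshold is arbitrary):

```shell
# Print extent counts for larger files in the snapshot; heavily
# fragmented files show up with high counts at the top of the list.
# filefrag output is "<file>: <N> extents found", so sort on field 2.
find /mnt/first/scratch/gits.20200503 -type f -size +1M \
    -exec filefrag {} + 2>/dev/null | sort -t: -k2 -rn | head -n 20
```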
--
Chris Murphy
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 7:43 5.6, slow send/receive, < 1MB/s Chris Murphy
@ 2020-05-04 7:47 ` Nikolay Borisov
2020-05-04 8:03 ` Chris Murphy
1 sibling, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2020-05-04 7:47 UTC (permalink / raw)
To: Chris Murphy, Btrfs BTRFS
On 4.05.20 г. 10:43 ч., Chris Murphy wrote:
> btrfs-progs-5.6-1.fc32.x86_64
> kernel-5.6.8-300.fc32.x86_64
>
> I have one snapshot of a subvolume being sent from one Btrfs
> filesystem to another. Both are single device. The HDD scrub
> performance is ~90MB/s. The send/receive is averaging less than 2MB/s.
> Upon attaching strace to the send PID, it dies.
>
> # btrfs send /mnt/first/scratch/gits.20200503/ | btrfs receive /mnt/aviat
> ERROR: command length 1718776930 too big for buffer 65536
> #
>
> $ sudo strace -p 5628
> strace: Process 5628 attached
> ioctl(4, BTRFS_IOC_SEND, {send_fd=6, clone_sources_count=0,
> clone_sources=NULL, parent
> _root=0, flags=0}) = ?
> +++ killed by SIGPIPE +++
>
> During the send/receive, top reports both of them using 90%+ CPU.
>
> rsync averages 65MB/s.
>
> Another subvolume, also one snapshot, has long periods of time (1/2
> hour) where the transfer rate is never above 500K/s, but both the
> btrfs send and btrfs receive PIDs are pinned at 99%+ CPU.
>
> perf top during high btrfs command CPU consumption and low IO throughput
>
>   34.01%  [kernel]  [k] mutex_spin_on_owner
>   28.35%  [kernel]  [k] mutex_unlock
>    8.05%  [kernel]  [k] mwait_idle_with_hints.constprop.
>    5.10%  [kernel]  [k] mutex_lock
That doesn't say which lock would be the contended one. Could you
perhaps run with perf top -g -U and see what the call chain is?
>
> perf top during moderate btrfs command CPU consumption and high IO throughput
>
>   15.16%  [kernel]  [k] fuse_perform_write
>   10.27%  [kernel]  [k] mwait_idle_with_hints.constprop.
>    7.97%  [kernel]  [k] mutex_unlock
>    7.35%  [kernel]  [k] mutex_spin_on_owner
>    5.75%  btrfs     [.] __crc32c_le
>
>
> I suspect heavy fragmentation in some files when performance is slow,
> and that btrfs send is somehow more adversely affected by it than cp
> or rsync.
>
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 7:43 5.6, slow send/receive, < 1MB/s Chris Murphy
2020-05-04 7:47 ` Nikolay Borisov
@ 2020-05-04 8:03 ` Chris Murphy
2020-05-04 8:12 ` Chris Murphy
2020-05-04 8:13 ` Nikolay Borisov
1 sibling, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2020-05-04 8:03 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
receive (rw,noatime,seclabel,compress-force=zstd:5,space_cache=v2,subvolid=5,subvol=/)
send (rw,noatime,seclabel,compress=zstd:1,nossd,notreelog,space_cache=v2,subvolid=5,subvol=/)
Both are on dm-crypt.
perf top -g -U consumes about 85% CPU according to top, and every time
I run it, the btrfs send performance *increases*. When I cancel this
perf top command, it returns to the slower performance. Curious.
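To take the observer effect out of the picture, the same data can be captured non-interactively and reported offline (a sketch; the pid lookup assumes the oldest process named btrfs is the send):

```shell
# Record kernel+user call graphs for the send process for 30 seconds,
# then dump a text report that can be pasted into a mail instead of
# screenshotting the live TUI.
perf record -g -p "$(pgrep -ox btrfs)" -- sleep 30
perf report --stdio --no-children > perf-send.txt
```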
Samples
Children Self Shared Object Symbol
+ 40.56% 0.09% [kernel] [k] entry_SYSCALL_64_after_hwframe
+ 40.30% 0.07% [kernel] [k] do_syscall_64
+ 27.88% 0.00% [kernel] [k] ret_from_fork
+ 27.88% 0.00% [kernel] [k] kthread
+ 27.20% 0.02% [kernel] [k] process_one_work
+ 25.69% 0.01% [kernel] [k] worker_thread
+ 22.05% 0.01% [kernel] [k] btrfs_work_helper
+ 19.01% 0.01% [kernel] [k] __x64_sys_splice
+ 18.98% 2.02% [kernel] [k] do_splice
+ 14.70% 0.00% [kernel] [k] async_cow_start
+ 14.70% 0.00% [kernel] [k] compress_file_range
+ 14.40% 0.00% [kernel] [k] temp_notif.12+0x4
+ 13.20% 0.00% [kernel] [k] ZSTD_compressContinue_internal
+ 13.14% 0.00% [kernel] [k] garp_protos+0x35
+ 12.99% 0.01% [kernel] [k] nft_request_module
+ 12.95% 0.01% [kernel] [k] ZSTD_compressStream
+ 11.95% 0.01% [kernel] [k] ksys_read
+ 11.87% 0.02% [kernel] [k] vfs_read
+ 11.76% 11.45% [kernel] [k] fuse_perform_write
+ 11.26% 0.02% [kernel] [k] do_idle
+ 10.80% 0.01% [kernel] [k] cpuidle_enter
+ 10.79% 0.01% [kernel] [k] cpuidle_enter_state
+ 10.54% 10.54% [kernel] [k] mwait_idle_with_hints.constprop.0
+ 10.33% 0.00% libc-2.31.so [.] splice
+ 7.93% 7.59% [kernel] [k] mutex_unlock
+ 7.38% 0.01% [kernel] [k] new_sync_read
+ 7.36% 0.10% [kernel] [k] pipe_read
+ 6.46% 0.07% [kernel] [k] __mutex_lock.constprop.0
+ 6.34% 6.23% [kernel] [k] mutex_spin_on_owner
+ 6.20% 0.00% [kernel] [k] secondary_startup_64
+ 6.20% 0.00% [kernel] [k] cpu_startup_entry
+ 6.05% 3.89% [kernel] [k] mutex_lock
+ 5.78% 0.00% [kernel] [k] intel_idle
+ 5.53% 0.00% [kernel] [k] iwl_mvm_mac_conf_tx
+ 5.43% 0.00% [kernel] [k] run_one_async_start
+ 5.43% 0.00% [kernel] [k] btrfs_submit_bio_start
+ 5.31% 0.06% [kernel] [k] blake2b_update
+ 5.11% 1.39% [kernel] [k] pipe_double_lock
+ 4.28% 0.01% [kernel] [k] proc_reg_read
+ 4.27% 0.09% [kernel] [k] seq_read
For a higher level overview, try: perf top --sort comm,dso
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 8:03 ` Chris Murphy
@ 2020-05-04 8:12 ` Chris Murphy
2020-05-04 8:13 ` Nikolay Borisov
1 sibling, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2020-05-04 8:12 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
So far it's consistently improving the performance by 3-6x just by
running perf top -g -U.
--
Chris Murphy
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 8:03 ` Chris Murphy
2020-05-04 8:12 ` Chris Murphy
@ 2020-05-04 8:13 ` Nikolay Borisov
2020-05-04 8:25 ` Chris Murphy
1 sibling, 1 reply; 14+ messages in thread
From: Nikolay Borisov @ 2020-05-04 8:13 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On 4.05.20 г. 11:03 ч., Chris Murphy wrote:
> receive (rw,noatime,seclabel,compress-force=zstd:5,space_cache=v2,subvolid=5,subvol=/)
> send (rw,noatime,seclabel,compress=zstd:1,nossd,notreelog,space_cache=v2,subvolid=5,subvol=/)
>
> Both are on dm-crypt.
>
> perf top -g -U consumes about 85% CPU according to top, and every time
> I run it, the btrfs send performance *increases*. When I cancel this
> perf top command, it returns to the slower performance. Curious.
>
Well, this still doesn't show the stack traces; the + sign means you
can expand that entry (with the 'e' key). But looking at this I don't
see any particular lock contention, just compression-related
functions.
> Samples
> Children Self Shared Object Symbol
> + 40.56% 0.09% [kernel] [k] entry_SYSCALL_64_after_hwframe
> + 40.30% 0.07% [kernel] [k] do_syscall_64
> + 27.88% 0.00% [kernel] [k] ret_from_fork
> + 27.88% 0.00% [kernel] [k] kthread
> + 27.20% 0.02% [kernel] [k] process_one_work
> + 25.69% 0.01% [kernel] [k] worker_thread
> + 22.05% 0.01% [kernel] [k] btrfs_work_helper
> + 19.01% 0.01% [kernel] [k] __x64_sys_splice
> + 18.98% 2.02% [kernel] [k] do_splice
> + 14.70% 0.00% [kernel] [k] async_cow_start
> + 14.70% 0.00% [kernel] [k] compress_file_range
> + 14.40% 0.00% [kernel] [k] temp_notif.12+0x4
> + 13.20% 0.00% [kernel] [k] ZSTD_compressContinue_internal
> + 13.14% 0.00% [kernel] [k] garp_protos+0x35
> + 12.99% 0.01% [kernel] [k] nft_request_module
> + 12.95% 0.01% [kernel] [k] ZSTD_compressStream
> + 11.95% 0.01% [kernel] [k] ksys_read
> + 11.87% 0.02% [kernel] [k] vfs_read
> + 11.76% 11.45% [kernel] [k] fuse_perform_write
> + 11.26% 0.02% [kernel] [k] do_idle
> + 10.80% 0.01% [kernel] [k] cpuidle_enter
> + 10.79% 0.01% [kernel] [k] cpuidle_enter_state
> + 10.54% 10.54% [kernel] [k] mwait_idle_with_hints.constprop.0
> + 10.33% 0.00% libc-2.31.so [.] splice
> + 7.93% 7.59% [kernel] [k] mutex_unlock
> + 7.38% 0.01% [kernel] [k] new_sync_read
> + 7.36% 0.10% [kernel] [k] pipe_read
> + 6.46% 0.07% [kernel] [k] __mutex_lock.constprop.0
> + 6.34% 6.23% [kernel] [k] mutex_spin_on_owner
> + 6.20% 0.00% [kernel] [k] secondary_startup_64
> + 6.20% 0.00% [kernel] [k] cpu_startup_entry
> + 6.05% 3.89% [kernel] [k] mutex_lock
> + 5.78% 0.00% [kernel] [k] intel_idle
> + 5.53% 0.00% [kernel] [k] iwl_mvm_mac_conf_tx
> + 5.43% 0.00% [kernel] [k] run_one_async_start
> + 5.43% 0.00% [kernel] [k] btrfs_submit_bio_start
> + 5.31% 0.06% [kernel] [k] blake2b_update
> + 5.11% 1.39% [kernel] [k] pipe_double_lock
> + 4.28% 0.01% [kernel] [k] proc_reg_read
> + 4.27% 0.09% [kernel] [k] seq_read
>
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 8:13 ` Nikolay Borisov
@ 2020-05-04 8:25 ` Chris Murphy
2020-05-04 8:43 ` Chris Murphy
0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-04 8:25 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: Chris Murphy, Btrfs BTRFS
On Mon, May 4, 2020 at 2:13 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 4.05.20 г. 11:03 ч., Chris Murphy wrote:
> > receive (rw,noatime,seclabel,compress-force=zstd:5,space_cache=v2,subvolid=5,subvol=/)
> > send (rw,noatime,seclabel,compress=zstd:1,nossd,notreelog,space_cache=v2,subvolid=5,subvol=/)
> >
> > Both are on dm-crypt.
> >
> > perf top -g -U consumes about 85% CPU according to top, and every time
> > I run it, the btrfs send performance *increases*. When I cancel this
> > perf top command, it returns to the slower performance. Curious.
> >
>
> Well, this still doesn't show the stack traces; the + sign means you
> can expand that entry (with the 'e' key). But looking at this I don't
> see any particular lock contention, just compression-related
> functions.
I'm not sure which ones...
Samples
Children Self Shared Object Symbol
-   62.58%     0.10%  [kernel]       [k] entry_SYSCALL_64_after_hwframe
   - 62.47% entry_SYSCALL_64_after_hwframe
      - 62.17% do_syscall_64
         - 23.84% ksys_read
            - 23.62% vfs_read
               - 14.79% proc_reg_read
                  - seq_read
                     - 7.07% s_show
                        - seq_printf
                           - vsnprintf
                                1.87% format_decode
                                1.49% number
                                0.84% string
                                0.68% memcpy_erms
                     - 6.23% s_next
                        - update_iter
                             4.49% module_get_kallsym
                             1.41% kallsyms_expand_symbol.constprop.0
                     - 0.79% s_start
                        - update_iter
                             0.57% module_get_kallsym
               - 8.38% new_sync_read
                  - 8.35% pipe_read
                     - 6.46% __mutex_lock.constprop.0
                          6.33% mutex_spin_on_owner
                     - 0.86% copy_page_to_iter
                        - 0.78% copyout
                             0.77% copy_user_enhanced_fast_string
         - 17.96% __x64_sys_splice
            - 17.92% do_splice
                 7.80% mutex_unlock
               - 4.55% pipe_double_lock
                  - 2.88% mutex_lock
                       0.95% _cond_resched
               - 2.61% mutex_lock
                    0.82% _cond_resched
                 0.52% pipe_unlock
         - 9.80% __x64_sys_ioctl
            - ksys_ioctl
               - 9.79% rpc_populate.constprop.0
--
Chris Murphy
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 8:25 ` Chris Murphy
@ 2020-05-04 8:43 ` Chris Murphy
2020-05-06 5:48 ` Chris Murphy
0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-04 8:43 UTC (permalink / raw)
To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS
Samples
Children Self Shared Objec Symbol
-   97.03%     0.01%  [kernel]      [k] entry_SYSCALL_64_after_hwframe
   - 52.00% entry_SYSCALL_64_after_hwframe
      - 51.95% do_syscall_64
         - 44.40% __x64_sys_splice
            - 65.95% do_splice
               - 31.93% mutex_unlock
               - 16.82% pipe_double_lock
                  - 10.75% mutex_lock
                     - 3.90% _cond_resched
                          1.52% rcu_all_qs
                  - 3.49% __mutex_lock.constprop.0
                       3.19% mutex_spin_on_owner
               - 9.99% mutex_lock
                  - 3.30% _cond_resched
                       1.07% rcu_all_qs
                    2.16% pipe_unlock
               - 1.27% __mutex_lock.constprop.0
                    1.25% mutex_spin_on_owner
         - 11.49% __x64_sys_ioctl
              ksys_ioctl
              rpc_populate.constprop.0
              _btrfs_ioctl_send
            - btrfs_ioctl_send
               - 12.87% process_extent
                  - 13.00% send_extent_data
                     - 4.72% send_cmd
                        - 4.64% kernel_write
                           - 4.73% vfs_write
                              - 4.52% new_sync_write
                                 - 4.83% pipe_write
                                    - 1.89% copy_page_from_iter
                                       - 3.13% _copy_from_iter
                                            2.93% memcpy_erms
                                    - 1.29% __alloc_pages_nodemask
                                         1.05% get_page_from_freelist
                                    - 0.72% schedule
                                         0.77% __schedule
                     - 2.72% __do_page_cache_readahead
                        - 2.07% read_pages.isra.0
                           - 2.05% extent_readpages
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-04 8:43 ` Chris Murphy
@ 2020-05-06 5:48 ` Chris Murphy
2020-05-06 21:04 ` Chris Murphy
0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-06 5:48 UTC (permalink / raw)
To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS
btrfs-progs-5.6-1.fc32.x86_64
kernel-5.6.8-300.fc32.x86_64
I've created a new file system, filled it with ~600G, set the seed
flag, and I'm creating a sprout by adding a 2nd device and removing
the 1st. This used to go at device speeds. Today it's running at
~35MB/s where I expect 95-120MB/s. I don't know whether it's related to
send/receive performance.
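For reference, the sprout procedure is roughly this (device names and mount point are placeholders):

```shell
# Turn an existing filesystem into a seed, then migrate its data to a
# new device by adding the new one and removing the seed device.
umount /mnt/seed
btrfstune -S 1 /dev/sdX1                 # set the seed flag; fs becomes read-only
mount /dev/sdX1 /mnt/seed                # a seed mounts ro
btrfs device add /dev/sdY1 /mnt/seed     # add the sprout device
mount -o remount,rw /mnt/seed
btrfs device remove /dev/sdX1 /mnt/seed  # relocates all block groups to the sprout
```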
The remove process with 'perf top -g -U' is only using 7% CPU, unlike
the send/receive cases.
Samples
Children Self Shared Objec Symbol
-   82.39%     0.04%  [kernel]      [k] relocate_block_group
   - 17.88% relocate_block_group
      - 7.78% prepare_to_relocate
         - 7.21% btrfs_commit_transaction
            - 7.51% btrfs_run_delayed_refs
               - __btrfs_run_delayed_refs
                  - 3.19% __tracepoint_kvm_mmu_spte_requested
                     - 3.15% pipapo_get.constprop.0
                        - 3.57% __add_to_free_space_tree
                             3.61% btrfs_del_items
                             1.37% btrfs_insert_empty_items
                  - 1.41% __tracepoint_kvm_entry
                       0.92% __tracepoint_kvm_apic_accept_irq
      - 6.41% relocate_data_extent
         - 7.52% relocate_file_extent_cluster
            - 2.35% __do_page_cache_readahead
               - 1.70% read_pages.isra.0
                  - 2.03% extent_readpages
                       1.32% __do_readpage
                       0.93% __alloc_pages_nodemask
            - 1.67% clear_extent_bit
                 1.92% __clear_extent_bit
            - 1.40% lock_extent_bits
                 1.59% __set_extent_bit
              1.19% btrfs_delalloc_reserve_metadata
              1.15% __set_page_dirty_nobuffers
      - 2.23% prepare_to_merge
         - 1.84% btrfs_commit_transaction
            - 2.15% btrfs_run_delayed_refs
               - __btrfs_run_delayed_refs
                    1.47% __btrfs_inc_extent_ref.isra.0
+ 38.75% 0.47% [kernel] [k] __btrfs_run_delayed_refs
+ 32.98% 0.21% [kernel] [k] relocate_file_extent_cluster
+ 29.88% 0.00% [kernel] [k] relocate_data_extent
+ 21.63% 0.00% libc-2.31.so [.] __GI___ioctl
+ 21.46% 0.00% [kernel] [k] entry_SYSCALL_64_after_hwframe
+ 21.46% 0.00% [kernel] [k] do_syscall_64
+ 21.46% 0.00% [kernel] [k] __x64_sys_ioctl
+ 21.46% 0.00% [kernel] [k] ksys_ioctl
+ 21.46% 0.00% [kernel] [k] rpc_info_open
+ 21.46% 0.00% [kernel] [k] btrfs_rm_device
Also, running perf doesn't improve performance here, which it does
rather remarkably for send/receive.
--
Chris Murphy
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-06 5:48 ` Chris Murphy
@ 2020-05-06 21:04 ` Chris Murphy
2020-05-06 21:07 ` Chris Murphy
0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-06 21:04 UTC (permalink / raw)
To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS
On Tue, May 5, 2020 at 11:48 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> btrfs-progs-5.6-1.fc32.x86_64
> kernel-5.6.8-300.fc32.x86_64
>
> I've created a new file system, filled it with ~600G, set the seed
> flag, and I'm creating a sprout by adding a 2nd device and removing
> the 1st. This used to go at device speeds. Today it's running at
> ~35MB/s where I expect 95-120MB/s. I don't know whether it's related to
> send/receive performance.
Same brand new file system, and a second new file system, doing
send/receive with a single snapshot (no other snapshots or subvolumes
related to it) and I'm seeing this same abysmal performance. ~5MB/s
send. The instant I do 'perf top -g -U' on the send process, the send
performance increases 10x to over 50MB/s. As soon as I quit perf top,
abysmal performance. Both of these drives scrub at ~100MB/s. I'm not
sure what's going on or how to get more information, but it's a
regression.
I'll attach a few perf top text files to the next email.
--
Chris Murphy
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-06 21:04 ` Chris Murphy
@ 2020-05-06 21:07 ` Chris Murphy
2020-05-23 16:11 ` Jorge Bastos
0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-06 21:07 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]
On Wed, May 6, 2020 at 3:04 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Tue, May 5, 2020 at 11:48 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > btrfs-progs-5.6-1.fc32.x86_64
> > kernel-5.6.8-300.fc32.x86_64
> >
> > I've created a new file system, filled it with ~600G, set the seed
> > flag, and I'm creating a sprout by adding a 2nd device and removing
> > the 1st. This used to go at device speeds. Today it's running at
> > ~35MB/s where I expect 95-120MB/s. I don't know whether it's related to
> > send/receive performance.
>
> Same brand new file system, and a second new file system, doing
> send/receive with a single snapshot (no other snapshots or subvolumes
> related to it) and I'm seeing this same abysmal performance. ~5MB/s
> send. The instant I do 'perf top -g -U' on the send process, the send
> performance increases 10x to over 50MB/s. As soon as I quit perf top,
> abysmal performance. Both of these drives scrub at ~100MB/s. I'm not
> sure what's going on or how to get more information, but it's a
> regression.
>
> I'll attach a few perf top text files to the next email.
[chris@fnuc ~]$ sudo perf top -g -U -p 263438 -d 10
The pid is for the 'btrfs send' process. Both btrfs send and receive
processes are using nearly 100% CPU, which itself just seems
excessive.
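For numbers sharper than top's, per-process CPU can be sampled over time (assuming sysstat's pidstat is installed):

```shell
# One-second CPU samples, ten rounds, for every process named btrfs,
# i.e. both the send and the receive side of the pipeline.
pidstat -u -p "$(pgrep -d, -x btrfs)" 1 10
```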
--
Chris Murphy
[-- Attachment #2: perftopsend1c.txt --]
[-- Type: text/plain, Size: 3377 bytes --]
Samples
Children Self Shared Object Symbol
+ 96.31% 0.02% [kernel] [k] entry_SYSCALL_64_after_hwframe
+ 96.19% 0.00% [kernel] [k] do_syscall_64
+ 72.48% 0.00% libc-2.31.so [.] splice
+ 71.99% 0.01% [kernel] [k] __x64_sys_splice
+ 71.93% 6.54% [kernel] [k] do_splice
+ 34.57% 33.41% [kernel] [k] mutex_unlock
+ 27.52% 0.00% libc-2.31.so [.] __GI___ioctl
+ 24.20% 0.00% [kernel] [k] __x64_sys_ioctl
+ 24.20% 0.00% [kernel] [k] ksys_ioctl
+ 24.20% 0.00% [kernel] [k] rpc_populate.constprop.0
+ 24.20% 0.00% [kernel] [k] _btrfs_ioctl_send
+ 24.20% 0.03% [kernel] [k] btrfs_ioctl_send
+ 23.96% 0.03% [kernel] [k] process_extent
+ 21.72% 3.73% [kernel] [k] send_extent_data
+ 19.06% 12.52% [kernel] [k] mutex_lock
+ 17.62% 3.83% [kernel] [k] pipe_double_lock
+ 8.65% 0.06% [kernel] [k] send_cmd
+ 8.52% 0.02% [kernel] [k] kernel_write
+ 8.49% 0.05% [kernel] [k] vfs_write
+ 8.12% 0.07% [kernel] [k] new_sync_write
+ 8.05% 0.28% [kernel] [k] pipe_write
+ 6.50% 3.82% [kernel] [k] _cond_resched
+ 5.70% 0.27% [kernel] [k] __mutex_lock.constprop.0
+ 5.39% 0.02% [kernel] [k] __do_page_cache_readahead
+ 5.34% 5.24% [kernel] [k] mutex_spin_on_owner
+ 5.15% 5.00% [kernel] [k] memcpy_erms
+ 4.13% 0.00% [kernel] [k] read_pages.isra.0
+ 4.00% 0.05% [kernel] [k] extent_readpages
+ 3.40% 0.11% [kernel] [k] copy_page_from_iter
+ 3.32% 0.12% [kernel] [k] _copy_from_iter
+ 2.97% 0.24% [kernel] [k] __alloc_pages_nodemask
+ 2.79% 2.13% [kernel] [k] rcu_all_qs
+ 2.50% 1.92% [kernel] [k] pipe_unlock
+ 2.31% 1.07% [kernel] [k] get_page_from_freelist
+ 1.95% 0.02% [kernel] [k] tlv_put
+ 1.90% 0.23% [kernel] [k] __do_readpage
+ 1.78% 0.10% [kernel] [k] __schedule
+ 1.77% 0.00% [kernel] [k] iterate_extent_inodes
+ 1.74% 0.03% [kernel] [k] schedule
+ 1.48% 0.01% [kernel] [k] apic_timer_interrupt
[-- Attachment #3: perftopsend3c.txt --]
[-- Type: text/plain, Size: 3373 bytes --]
Samples
Children Self Shared Objec Symbol
   - 1.64% tlv_put
        1.62% memcpy_erms
   - 0.75% iterate_extent_inodes
+ 96.66% 0.01% [kernel] [k] do_syscall_64
+ 74.70% 0.01% [kernel] [k] __x64_sys_splice
-   74.64%     7.46%  [kernel]      [k] do_splice
   - 52.97% do_splice
        32.61% mutex_unlock
      - 14.18% pipe_double_lock
         - 9.97% mutex_lock
            - 3.59% _cond_resched
                 1.30% rcu_all_qs
         - 3.62% __mutex_lock.constprop.0
              3.45% mutex_spin_on_owner
      - 9.01% mutex_lock
         - 3.12% _cond_resched
              1.15% rcu_all_qs
         - 1.59% __mutex_lock.constprop.0
              1.56% mutex_spin_on_owner
           1.51% pipe_unlock
   - 7.46% splice
        entry_SYSCALL_64_after_hwframe
        do_syscall_64
        __x64_sys_splice
      - do_splice
-   33.71%    32.42%  [kernel]      [k] mutex_unlock
   - 32.39% splice
        entry_SYSCALL_64_after_hwframe
        do_syscall_64
        __x64_sys_splice
      - do_splice
           31.67% mutex_unlock
        0.95% mutex_unlock
+ 25.11% 0.00% libc-2.31.so [.] splice
+ 21.95% 0.02% [kernel] [k] btrfs_ioctl_send
+ 21.74% 0.03% [kernel] [k] process_extent
+ 21.28% 13.95% [kernel] [k] mutex_lock
+ 19.60% 3.29% [kernel] [k] send_extent_data
+ 19.29% 4.38% [kernel] [k] pipe_double_lock
+ 9.27% 0.00% libc-2.31.so [.] __GI___ioctl
[-- Attachment #4: perftopsend2c.txt --]
[-- Type: text/plain, Size: 3373 bytes --]
Samples
Children Self Shared Objec Symbol
-   96.85%     0.02%  [kernel]      [k] entry_SYSCALL_64_after_hwframe
   - 30.54% entry_SYSCALL_64_after_hwframe
      - 30.53% do_syscall_64
         - 29.04% __x64_sys_splice
            - 60.72% do_splice
                 32.47% mutex_unlock
               - 14.37% pipe_double_lock
                  - 10.21% mutex_lock
                     - 3.72% _cond_resched
                          1.33% rcu_all_qs
                  - 3.39% __mutex_lock.constprop.0
                       3.20% mutex_spin_on_owner
               - 9.35% mutex_lock
                  - 3.25% _cond_resched
                       1.16% rcu_all_qs
                    2.14% pipe_unlock
               - 1.58% __mutex_lock.constprop.0
                    1.56% mutex_spin_on_owner
         - 7.32% __x64_sys_ioctl
              ksys_ioctl
              rpc_populate.constprop.0
              _btrfs_ioctl_send
            - btrfs_ioctl_send
               - 9.39% process_extent
                  - 8.81% send_extent_data
                     - 2.69% send_cmd
                        - 2.64% kernel_write
                           - 2.71% vfs_write
                              - 2.68% new_sync_write
                                 - 3.14% pipe_write
                                    - 1.18% copy_page_from_iter
                                       - 2.88% _copy_from_iter
                                            2.72% memcpy_erms
                                    - 0.93% __alloc_pages_nodemask
                                         0.98% get_page_from_freelist
                  - 1.75% __do_page_cache_readahead
                     - 1.33% read_pages.isra.0
                        - 1.49% extent_readpages
                           - 0.90% __do_readpage
                              - 0.66% __alloc_pages_nodemask
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-06 21:07 ` Chris Murphy
@ 2020-05-23 16:11 ` Jorge Bastos
2020-05-23 16:42 ` A L
0 siblings, 1 reply; 14+ messages in thread
From: Jorge Bastos @ 2020-05-23 16:11 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
Hi there,
I'm also having much lower than normal send/receive performance since
upgrading to kernel 5.6 (currently kernel 5.6.13, btrfs-progs 5.6).
There's also very high CPU utilization during it; is there anything
that can be done for now?
Regards,
Jorge Bastos
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-23 16:11 ` Jorge Bastos
@ 2020-05-23 16:42 ` A L
2020-06-06 16:16 ` Jorge Bastos
0 siblings, 1 reply; 14+ messages in thread
From: A L @ 2020-05-23 16:42 UTC (permalink / raw)
To: Btrfs BTRFS
---- From: Jorge Bastos <jorge.mrbastos@gmail.com> -- Sent: 2020-05-23 - 18:11 ----
> Hi there,
>
> I'm also having much lower than normal send/receive performance since
> upgrading to kernel 5.6 (currently kernel 5.6.13, btrfs-progs 5.6).
> There's also very high CPU utilization during it; is there anything
> that can be done for now?
>
> Regards,
> Jorge Bastos
One possibility is to add "mbuffer" to the pipeline. It is asynchronous, unlike "pv". This helps quite a bit on spinning HDDs.
# btrfs send /some/subvol | mbuffer | btrfs receive /other/destination
PS. There was possibly a bug in mbuffer <= 20150412, so don't use those old versions.
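Concretely, something like this (buffer size is a guess; tune it to available RAM):

```shell
# A large in-memory buffer decouples send from receive, so a stall on
# one side of the pipe doesn't immediately throttle the other.
btrfs send /some/subvol | mbuffer -m 1G -s 128k | btrfs receive /other/destination
```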
* Re: 5.6, slow send/receive, < 1MB/s
2020-05-23 16:42 ` A L
@ 2020-06-06 16:16 ` Jorge Bastos
2020-06-06 16:24 ` Jorge Bastos
0 siblings, 1 reply; 14+ messages in thread
From: Jorge Bastos @ 2020-06-06 16:16 UTC (permalink / raw)
To: A L; +Cc: Btrfs BTRFS
Hi there,
Happy to report my issue is not related to btrfs. I remembered reading
this thread, and since I noticed performance issues after upgrading to
kernel 5.6 I wrongly assumed it was related. My issue is caused by pv:
for some reason it sometimes starts using a lot of CPU and the
send/receive barely crawls along, but not always, i.e. the same
send/receive can go at 250MB/s one time and less than 10MB/s the next.
Anyway, sorry to waste everyone's time.
Regards,
Jorge Bastos
* Re: 5.6, slow send/receive, < 1MB/s
2020-06-06 16:16 ` Jorge Bastos
@ 2020-06-06 16:24 ` Jorge Bastos
0 siblings, 0 replies; 14+ messages in thread
From: Jorge Bastos @ 2020-06-06 16:24 UTC (permalink / raw)
To: A L; +Cc: Btrfs BTRFS
Well, forget that: it's not pv, it's doing it again without it. I'll
need to do more testing.
On Sat, Jun 6, 2020 at 5:16 PM Jorge Bastos <jorge.mrbastos@gmail.com> wrote:
>
> Hi there,
>
> Happy to report my issue is not related to btrfs. I remembered reading
> this thread, and since I noticed performance issues after upgrading to
> kernel 5.6 I wrongly assumed it was related. My issue is caused by pv:
> for some reason it sometimes starts using a lot of CPU and the
> send/receive barely crawls along, but not always, i.e. the same
> send/receive can go at 250MB/s one time and less than 10MB/s the next.
> Anyway, sorry to waste everyone's time.
>
> Regards,
> Jorge Bastos