linux-btrfs.vger.kernel.org archive mirror
* 5.6, slow send/receive, < 1MB/s
@ 2020-05-04  7:43 Chris Murphy
  2020-05-04  7:47 ` Nikolay Borisov
  2020-05-04  8:03 ` Chris Murphy
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2020-05-04  7:43 UTC (permalink / raw)
  To: Btrfs BTRFS

btrfs-progs-5.6-1.fc32.x86_64
kernel-5.6.8-300.fc32.x86_64

I have one snapshot of a subvolume being sent from one Btrfs
filesystem to another. Both are single device. The HDD scrub
performance is ~90MB/s. The send/receive is averaging less than 2MB/s.
Upon attaching strace to the send PID, it dies.

# btrfs send /mnt/first/scratch/gits.20200503/ | btrfs receive /mnt/aviat
ERROR: command length 1718776930 too big for buffer 65536
#

$ sudo strace -p 5628
strace: Process 5628 attached
ioctl(4, BTRFS_IOC_SEND, {send_fd=6, clone_sources_count=0, clone_sources=NULL, parent_root=0, flags=0}) = ?
+++ killed by SIGPIPE +++
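
For what it's worth, that bogus command length looks like ASCII. This is
only a guess on my part, but it decodes to the start of the send stream
magic, as if receive fell out of sync and read "btrf" as a command header:

$ printf '%x\n' 1718776930    # the decimal value from the receive error
66727462

i.e. the bytes 0x62 0x74 0x72 0x66 ("btrf") read back as a little-endian u32.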

During the send/receive, top reports both of them using 90%+ CPU.

rsync averages ~65MB/s.

Another subvolume, also one snapshot, has long periods of time (1/2
hour) where the transfer rate is never above 500K/s, but both the
btrfs send and btrfs receive PIDs are pinned at 99%+ CPU.

perf top during high btrfs command CPU consumption and low IO throughput

  34.01%  [kernel]                                [k] mutex_spin_on_owner
  28.35%  [kernel]                                [k] mutex_unlock
   8.05%  [kernel]                                [k] mwait_idle_with_hints.constprop.
   5.10%  [kernel]                                [k] mutex_lock

perf top during moderate btrfs command CPU consumption and high IO throughput

 15.16%  [kernel]                                [k] fuse_perform_write
 10.27%  [kernel]                                [k] mwait_idle_with_hints.constprop.
  7.97%  [kernel]                                [k] mutex_unlock
  7.35%  [kernel]                                [k] mutex_spin_on_owner
  5.75%  btrfs                                   [.] __crc32c_le


I suspect heavy fragmentation of some files when performance is slow,
and that btrfs send is somehow more adversely affected by it than cp or
rsync.
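
If it is fragmentation, something like this is a rough way to check (the
size threshold is arbitrary, and on compressed files filefrag counts each
~128K compressed extent separately, so the numbers overstate real
fragmentation):

$ sudo find /mnt/first/scratch/gits.20200503 -type f -size +100M \
    -exec filefrag {} + | sort -t: -k2 -n | tail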

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  7:43 5.6, slow send/receive, < 1MB/s Chris Murphy
@ 2020-05-04  7:47 ` Nikolay Borisov
  2020-05-04  8:03 ` Chris Murphy
  1 sibling, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2020-05-04  7:47 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS



On 4.05.20 г. 10:43 ч., Chris Murphy wrote:
> btrfs-progs-5.6-1.fc32.x86_64
> kernel-5.6.8-300.fc32.x86_64
> 
> I have one snapshot of a subvolume being sent from one Btrfs
> filesystem to another. Both are single device. The HDD scrub
> performance is ~90MB/s. The send/receive is averaging less than 2MB/s.
> Upon attaching strace to the send PID, it dies.
> 
> # btrfs send /mnt/first/scratch/gits.20200503/ | btrfs receive /mnt/aviat
> ERROR: command length 1718776930 too big for buffer 65536
> #
> 
> $ sudo strace -p 5628
> strace: Process 5628 attached
> ioctl(4, BTRFS_IOC_SEND, {send_fd=6, clone_sources_count=0, clone_sources=NULL, parent_root=0, flags=0}) = ?
> +++ killed by SIGPIPE +++
> 
> During the send/receive, top reports both of them using 90%+ CPU.
> 
> rsync averages ~65MB/s.
> 
> Another subvolume, also one snapshot, has long periods of time (1/2
> hour) where the transfer rate is never above 500K/s, but both the
> btrfs send and btrfs receive PIDs are pinned at 99%+ CPU.
> 
> perf top during high btrfs command CPU consumption and low IO throughput
> 
>   34.01%  [kernel]                                [k] mutex_spin_on_owner
>   28.35%  [kernel]                                [k] mutex_unlock
>    8.05%  [kernel]                                [k] mwait_idle_with_hints.constprop.
>    5.10%  [kernel]                                [k] mutex_lock

That doesn't say which lock would be the contended one. Could you
perhaps run with perf top -g -U and see what the call chain is?
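
Something like the following (pid and duration are placeholders) should
capture the call chains to a text dump that can be pasted:

# perf record -g -p <send_pid> -- sleep 30
# perf report --stdio --no-children | head -n 60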

> 
> perf top during moderate btrfs command CPU consumption and high IO throughput
> 
>  15.16%  [kernel]                                [k] fuse_perform_write
>  10.27%  [kernel]                                [k] mwait_idle_with_hints.constprop.
>   7.97%  [kernel]                                [k] mutex_unlock
>   7.35%  [kernel]                                [k] mutex_spin_on_owner
>   5.75%  btrfs                                   [.] __crc32c_le
> 
> 
> I suspect heavy fragmentation of some files when performance is slow,
> and that btrfs send is somehow more adversely affected by it than cp or
> rsync.
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  7:43 5.6, slow send/receive, < 1MB/s Chris Murphy
  2020-05-04  7:47 ` Nikolay Borisov
@ 2020-05-04  8:03 ` Chris Murphy
  2020-05-04  8:12   ` Chris Murphy
  2020-05-04  8:13   ` Nikolay Borisov
  1 sibling, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2020-05-04  8:03 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

receive (rw,noatime,seclabel,compress-force=zstd:5,space_cache=v2,subvolid=5,subvol=/)
send    (rw,noatime,seclabel,compress=zstd:1,nossd,notreelog,space_cache=v2,subvolid=5,subvol=/)

Both are on dm-crypt.

perf top -g -U consumes about 85% CPU according to top, and every time
I run it, the btrfs send performance *increases*. When I cancel this
perf top command, it returns to the slower performance. Curious.

Samples
  Children      Self  Shared Object        Symbol
+   40.56%     0.09%  [kernel]             [k] entry_SYSCALL_64_after_hwframe
+   40.30%     0.07%  [kernel]             [k] do_syscall_64
+   27.88%     0.00%  [kernel]             [k] ret_from_fork
+   27.88%     0.00%  [kernel]             [k] kthread
+   27.20%     0.02%  [kernel]             [k] process_one_work
+   25.69%     0.01%  [kernel]             [k] worker_thread
+   22.05%     0.01%  [kernel]             [k] btrfs_work_helper
+   19.01%     0.01%  [kernel]             [k] __x64_sys_splice
+   18.98%     2.02%  [kernel]             [k] do_splice
+   14.70%     0.00%  [kernel]             [k] async_cow_start
+   14.70%     0.00%  [kernel]             [k] compress_file_range
+   14.40%     0.00%  [kernel]             [k] temp_notif.12+0x4
+   13.20%     0.00%  [kernel]             [k] ZSTD_compressContinue_internal
+   13.14%     0.00%  [kernel]             [k] garp_protos+0x35
+   12.99%     0.01%  [kernel]             [k] nft_request_module
+   12.95%     0.01%  [kernel]             [k] ZSTD_compressStream
+   11.95%     0.01%  [kernel]             [k] ksys_read
+   11.87%     0.02%  [kernel]             [k] vfs_read
+   11.76%    11.45%  [kernel]             [k] fuse_perform_write
+   11.26%     0.02%  [kernel]             [k] do_idle
+   10.80%     0.01%  [kernel]             [k] cpuidle_enter
+   10.79%     0.01%  [kernel]             [k] cpuidle_enter_state
+   10.54%    10.54%  [kernel]             [k] mwait_idle_with_hints.constprop.0
+   10.33%     0.00%  libc-2.31.so         [.] splice
+    7.93%     7.59%  [kernel]             [k] mutex_unlock
+    7.38%     0.01%  [kernel]             [k] new_sync_read
+    7.36%     0.10%  [kernel]             [k] pipe_read
+    6.46%     0.07%  [kernel]             [k] __mutex_lock.constprop.0
+    6.34%     6.23%  [kernel]             [k] mutex_spin_on_owner
+    6.20%     0.00%  [kernel]             [k] secondary_startup_64
+    6.20%     0.00%  [kernel]             [k] cpu_startup_entry
+    6.05%     3.89%  [kernel]             [k] mutex_lock
+    5.78%     0.00%  [kernel]             [k] intel_idle
+    5.53%     0.00%  [kernel]             [k] iwl_mvm_mac_conf_tx
+    5.43%     0.00%  [kernel]             [k] run_one_async_start
+    5.43%     0.00%  [kernel]             [k] btrfs_submit_bio_start
+    5.31%     0.06%  [kernel]             [k] blake2b_update
+    5.11%     1.39%  [kernel]             [k] pipe_double_lock
+    4.28%     0.01%  [kernel]             [k] proc_reg_read
+    4.27%     0.09%  [kernel]             [k] seq_read
For a higher level overview, try: perf top --sort comm,dso

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  8:03 ` Chris Murphy
@ 2020-05-04  8:12   ` Chris Murphy
  2020-05-04  8:13   ` Nikolay Borisov
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2020-05-04  8:12 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

So far, just running perf top -g -U consistently improves the
performance by 3-6x.
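
One guess, purely a hypothesis on my part: perf's own load may be keeping
the CPUs at higher clocks / out of deep idle states. Pinning the cpufreq
governor to performance for a test run would be one way to check that:

$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
$ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor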


---
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  8:03 ` Chris Murphy
  2020-05-04  8:12   ` Chris Murphy
@ 2020-05-04  8:13   ` Nikolay Borisov
  2020-05-04  8:25     ` Chris Murphy
  1 sibling, 1 reply; 14+ messages in thread
From: Nikolay Borisov @ 2020-05-04  8:13 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS



On 4.05.20 г. 11:03 ч., Chris Murphy wrote:
> receive (rw,noatime,seclabel,compress-force=zstd:5,space_cache=v2,subvolid=5,subvol=/)
> send    (rw,noatime,seclabel,compress=zstd:1,nossd,notreelog,space_cache=v2,subvolid=5,subvol=/)
> 
> Both are on dm-crypt.
> 
> perf top -g -U consumes about 85% CPU according to top, and every time
> I run it, the btrfs send performance *increases*. When I cancel this
> perf top command, it returns to the slower performance. Curious.
> 

Well, this still doesn't show the stack traces; after all, the + sign
means you can expand that entry (with the 'e' key). But looking at this
I don't see any particular lock contention, just compression-related
functions.

> Samples
>   Children      Self  Shared Object        Symbol
> +   40.56%     0.09%  [kernel]             [k] entry_SYSCALL_64_after_hwframe
> +   40.30%     0.07%  [kernel]             [k] do_syscall_64
> +   27.88%     0.00%  [kernel]             [k] ret_from_fork
> +   27.88%     0.00%  [kernel]             [k] kthread
> +   27.20%     0.02%  [kernel]             [k] process_one_work
> +   25.69%     0.01%  [kernel]             [k] worker_thread
> +   22.05%     0.01%  [kernel]             [k] btrfs_work_helper
> +   19.01%     0.01%  [kernel]             [k] __x64_sys_splice
> +   18.98%     2.02%  [kernel]             [k] do_splice
> +   14.70%     0.00%  [kernel]             [k] async_cow_start
> +   14.70%     0.00%  [kernel]             [k] compress_file_range
> +   14.40%     0.00%  [kernel]             [k] temp_notif.12+0x4
> +   13.20%     0.00%  [kernel]             [k] ZSTD_compressContinue_internal
> +   13.14%     0.00%  [kernel]             [k] garp_protos+0x35
> +   12.99%     0.01%  [kernel]             [k] nft_request_module
> +   12.95%     0.01%  [kernel]             [k] ZSTD_compressStream
> +   11.95%     0.01%  [kernel]             [k] ksys_read
> +   11.87%     0.02%  [kernel]             [k] vfs_read
> +   11.76%    11.45%  [kernel]             [k] fuse_perform_write
> +   11.26%     0.02%  [kernel]             [k] do_idle
> +   10.80%     0.01%  [kernel]             [k] cpuidle_enter
> +   10.79%     0.01%  [kernel]             [k] cpuidle_enter_state
> +   10.54%    10.54%  [kernel]             [k] mwait_idle_with_hints.constprop.0
> +   10.33%     0.00%  libc-2.31.so         [.] splice
> +    7.93%     7.59%  [kernel]             [k] mutex_unlock
> +    7.38%     0.01%  [kernel]             [k] new_sync_read
> +    7.36%     0.10%  [kernel]             [k] pipe_read
> +    6.46%     0.07%  [kernel]             [k] __mutex_lock.constprop.0
> +    6.34%     6.23%  [kernel]             [k] mutex_spin_on_owner
> +    6.20%     0.00%  [kernel]             [k] secondary_startup_64
> +    6.20%     0.00%  [kernel]             [k] cpu_startup_entry
> +    6.05%     3.89%  [kernel]             [k] mutex_lock
> +    5.78%     0.00%  [kernel]             [k] intel_idle
> +    5.53%     0.00%  [kernel]             [k] iwl_mvm_mac_conf_tx
> +    5.43%     0.00%  [kernel]             [k] run_one_async_start
> +    5.43%     0.00%  [kernel]             [k] btrfs_submit_bio_start
> +    5.31%     0.06%  [kernel]             [k] blake2b_update
> +    5.11%     1.39%  [kernel]             [k] pipe_double_lock
> +    4.28%     0.01%  [kernel]             [k] proc_reg_read
> +    4.27%     0.09%  [kernel]             [k] seq_read
> For a higher level overview, try: perf top --sort comm,dso
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  8:13   ` Nikolay Borisov
@ 2020-05-04  8:25     ` Chris Murphy
  2020-05-04  8:43       ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-04  8:25 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, May 4, 2020 at 2:13 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 4.05.20 г. 11:03 ч., Chris Murphy wrote:
> > receive (rw,noatime,seclabel,compress-force=zstd:5,space_cache=v2,subvolid=5,subvol=/)
> > send    (rw,noatime,seclabel,compress=zstd:1,nossd,notreelog,space_cache=v2,subvolid=5,subvol=/)
> >
> > Both are on dm-crypt.
> >
> > perf top -g -U consumes about 85% CPU according to top, and every time
> > I run it, the btrfs send performance *increases*. When I cancel this
> > perf top command, it returns to the slower performance. Curious.
> >
>
> Well, this still doesn't show the stack traces; after all, the + sign
> means you can expand that entry (with the 'e' key). But looking at this
> I don't see any particular lock contention, just compression-related
> functions.

I'm not sure which ones...

Samples
  Children      Self  Shared Object       Symbol
-   62.58%     0.10%  [kernel]            [k] entry_SYSCALL_64_after_hwframe
   - 62.47% entry_SYSCALL_64_after_hwframe
      - 62.17% do_syscall_64
         - 23.84% ksys_read
            - 23.62% vfs_read
               - 14.79% proc_reg_read
                  - seq_read
                     - 7.07% s_show
                        - seq_printf
                           - vsnprintf
                                1.87% format_decode
                                1.49% number
                                0.84% string
                                0.68% memcpy_erms
                     - 6.23% s_next
                        - update_iter
                             4.49% module_get_kallsym
                             1.41% kallsyms_expand_symbol.constprop.0
                     - 0.79% s_start
                        - update_iter
                             0.57% module_get_kallsym
               - 8.38% new_sync_read
                  - 8.35% pipe_read
                     - 6.46% __mutex_lock.constprop.0
                          6.33% mutex_spin_on_owner
                     - 0.86% copy_page_to_iter
                        - 0.78% copyout
                             0.77% copy_user_enhanced_fast_string
         - 17.96% __x64_sys_splice
            - 17.92% do_splice
                 7.80% mutex_unlock
               - 4.55% pipe_double_lock
                  - 2.88% mutex_lock
                       0.95% _cond_resched
               - 2.61% mutex_lock
                    0.82% _cond_resched
                 0.52% pipe_unlock
         - 9.80% __x64_sys_ioctl
            - ksys_ioctl
               - 9.79% rpc_populate.constprop.0
For a higher level overview, try: perf top --sort comm,dso


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  8:25     ` Chris Murphy
@ 2020-05-04  8:43       ` Chris Murphy
  2020-05-06  5:48         ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-04  8:43 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS

Samples
  Children      Self  Shared Objec  Symbol
-   97.03%     0.01%  [kernel]      [k] entry_SYSCALL_64_after_hwframe
   - 52.00% entry_SYSCALL_64_after_hwframe
      - 51.95% do_syscall_64
         - 44.40% __x64_sys_splice
            - 65.95% do_splice
               - 31.93% mutex_unlock
               - 16.82% pipe_double_lock
                  - 10.75% mutex_lock
                     - 3.90% _cond_resched
                          1.52% rcu_all_qs
                  - 3.49% __mutex_lock.constprop.0
                       3.19% mutex_spin_on_owner
               - 9.99% mutex_lock
                  - 3.30% _cond_resched
                       1.07% rcu_all_qs
                 2.16% pipe_unlock
               - 1.27% __mutex_lock.constprop.0
                    1.25% mutex_spin_on_owner
         - 11.49% __x64_sys_ioctl
              ksys_ioctl
              rpc_populate.constprop.0
              _btrfs_ioctl_send
            - btrfs_ioctl_send
               - 12.87% process_extent
                  - 13.00% send_extent_data
                     - 4.72% send_cmd
                        - 4.64% kernel_write
                           - 4.73% vfs_write
                              - 4.52% new_sync_write
                                 - 4.83% pipe_write
                                    - 1.89% copy_page_from_iter
                                       - 3.13% _copy_from_iter
                                            2.93% memcpy_erms
                                    - 1.29% __alloc_pages_nodemask
                                         1.05% get_page_from_freelist
                                    - 0.72% schedule
                                         0.77% __schedule
                     - 2.72% __do_page_cache_readahead
                        - 2.07% read_pages.isra.0
                           - 2.05% extent_readpages
For a higher level overview, try: perf top --sort comm,dso

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-04  8:43       ` Chris Murphy
@ 2020-05-06  5:48         ` Chris Murphy
  2020-05-06 21:04           ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-06  5:48 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS

btrfs-progs-5.6-1.fc32.x86_64
kernel-5.6.8-300.fc32.x86_64

I've created a new file system, filled it with ~600G, set the seed
flag, and I'm creating a sprout by adding a 2nd device and removing
the 1st. This used to go at device speeds. Today it's running at
~35M/s where I expect 95-120M/s. I don't know that it's related to
send/receive performance.
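
For reference, the sprout procedure I'm using is roughly the following
(device names and mount point are placeholders):

# btrfstune -S 1 /dev/sdX                    # mark the filled filesystem as a seed
# mount /dev/sdX /mnt/sprout                 # a seed device mounts read-only
# btrfs device add /dev/sdY /mnt/sprout
# mount -o remount,rw /mnt/sprout
# btrfs device remove /dev/sdX /mnt/sprout   # this is the step running at ~35M/s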

The remove process with 'perf top -g -U' is only using 7% CPU, unlike
the send/receive cases.

Samples
  Children      Self  Shared Objec  Symbol
-   82.39%     0.04%  [kernel]      [k] relocate_block_group
   - 17.88% relocate_block_group
      - 7.78% prepare_to_relocate
         - 7.21% btrfs_commit_transaction
            - 7.51% btrfs_run_delayed_refs
               - __btrfs_run_delayed_refs
                  - 3.19% __tracepoint_kvm_mmu_spte_requested
                     - 3.15% pipapo_get.constprop.0
                        - 3.57% __add_to_free_space_tree
                             3.61% btrfs_del_items
                             1.37% btrfs_insert_empty_items
                  - 1.41% __tracepoint_kvm_entry
                    0.92% __tracepoint_kvm_apic_accept_irq
      - 6.41% relocate_data_extent
         - 7.52% relocate_file_extent_cluster
            - 2.35% __do_page_cache_readahead
               - 1.70% read_pages.isra.0
                  - 2.03% extent_readpages
                       1.32% __do_readpage
                 0.93% __alloc_pages_nodemask
            - 1.67% clear_extent_bit
                 1.92% __clear_extent_bit
            - 1.40% lock_extent_bits
                 1.59% __set_extent_bit
              1.19% btrfs_delalloc_reserve_metadata
              1.15% __set_page_dirty_nobuffers
      - 2.23% prepare_to_merge
         - 1.84% btrfs_commit_transaction
            - 2.15% btrfs_run_delayed_refs
               - __btrfs_run_delayed_refs
                    1.47% __btrfs_inc_extent_ref.isra.0
+   38.75%     0.47%  [kernel]      [k] __btrfs_run_delayed_refs
+   32.98%     0.21%  [kernel]      [k] relocate_file_extent_cluster
+   29.88%     0.00%  [kernel]      [k] relocate_data_extent
+   21.63%     0.00%  libc-2.31.so  [.] __GI___ioctl
+   21.46%     0.00%  [kernel]      [k] entry_SYSCALL_64_after_hwframe
+   21.46%     0.00%  [kernel]      [k] do_syscall_64
+   21.46%     0.00%  [kernel]      [k] __x64_sys_ioctl
+   21.46%     0.00%  [kernel]      [k] ksys_ioctl
+   21.46%     0.00%  [kernel]      [k] rpc_info_open
+   21.46%     0.00%  [kernel]      [k] btrfs_rm_device
For a higher level overview, try: perf top --sort comm,dso


Also, running perf doesn't improve performance here, which it does
rather remarkably for send/receive.


--
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-06  5:48         ` Chris Murphy
@ 2020-05-06 21:04           ` Chris Murphy
  2020-05-06 21:07             ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-06 21:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS

On Tue, May 5, 2020 at 11:48 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> btrfs-progs-5.6-1.fc32.x86_64
> kernel-5.6.8-300.fc32.x86_64
>
> I've created a new file system, filled it with ~600G, set the seed
> flag, and I'm creating a sprout by adding a 2nd device and removing
> the 1st. This used to go at device speeds. Today it's running at
> ~35M/s where I expect 95-120M/s. I don't know that it's related to
> send/receive performance.

Same brand new file system, and a second new file system, doing
send/receive with a single snapshot (no other snapshots or subvolumes
related to it), and I'm seeing the same abysmal performance: ~5MB/s
send. The instant I run 'perf top -g -U' on the send process, the send
performance increases 10x to over 50MB/s. As soon as I quit perf top,
it's back to abysmal performance. Both of these drives scrub at
~100MB/s. I'm not sure what's going on or how to get more information,
but it's a regression.

I'll attach a few perf top text files to the next email.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-06 21:04           ` Chris Murphy
@ 2020-05-06 21:07             ` Chris Murphy
  2020-05-23 16:11               ` Jorge Bastos
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-05-06 21:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

On Wed, May 6, 2020 at 3:04 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Tue, May 5, 2020 at 11:48 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > btrfs-progs-5.6-1.fc32.x86_64
> > kernel-5.6.8-300.fc32.x86_64
> >
> > I've created a new file system, filled it with ~600G, set the seed
> > flag, and I'm creating a sprout by adding a 2nd device and removing
> > the 1st. This used to go at device speeds. Today it's running at
> > ~35M/s where I expect 95-120M/s. I don't know that it's related to
> > send/receive performance.
>
> Same brand new file system, and a second new file system, doing
> send/receive with a single snapshot (no other snapshots or subvolumes
> related to it), and I'm seeing the same abysmal performance: ~5MB/s
> send. The instant I run 'perf top -g -U' on the send process, the send
> performance increases 10x to over 50MB/s. As soon as I quit perf top,
> it's back to abysmal performance. Both of these drives scrub at
> ~100MB/s. I'm not sure what's going on or how to get more information,
> but it's a regression.
>
> I'll attach a few perf top text files to the next email.

[chris@fnuc ~]$ sudo perf top -g -U -p 263438 -d 10

The pid is for the 'btrfs send' process. Both the btrfs send and
receive processes are using nearly 100% CPU, which itself just seems
excessive.
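
A crude way to put a number on what send actually moves per interval,
versus the CPU it burns, is to watch its I/O counters:

$ while sleep 5; do sudo grep -E '^(read|write)_bytes' /proc/263438/io; done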

-- 
Chris Murphy

[-- Attachment #2: perftopsend1c.txt --]
[-- Type: text/plain, Size: 3377 bytes --]

Samples
  Children      Self  Shared Object     Symbol
+   96.31%     0.02%  [kernel]          [k] entry_SYSCALL_64_after_hwframe      
+   96.19%     0.00%  [kernel]          [k] do_syscall_64                       
+   72.48%     0.00%  libc-2.31.so      [.] splice                              
+   71.99%     0.01%  [kernel]          [k] __x64_sys_splice                    
+   71.93%     6.54%  [kernel]          [k] do_splice                           
+   34.57%    33.41%  [kernel]          [k] mutex_unlock                        
+   27.52%     0.00%  libc-2.31.so      [.] __GI___ioctl                        
+   24.20%     0.00%  [kernel]          [k] __x64_sys_ioctl                     
+   24.20%     0.00%  [kernel]          [k] ksys_ioctl                          
+   24.20%     0.00%  [kernel]          [k] rpc_populate.constprop.0            
+   24.20%     0.00%  [kernel]          [k] _btrfs_ioctl_send                   
+   24.20%     0.03%  [kernel]          [k] btrfs_ioctl_send                    
+   23.96%     0.03%  [kernel]          [k] process_extent                      
+   21.72%     3.73%  [kernel]          [k] send_extent_data                    
+   19.06%    12.52%  [kernel]          [k] mutex_lock                          
+   17.62%     3.83%  [kernel]          [k] pipe_double_lock                    
+    8.65%     0.06%  [kernel]          [k] send_cmd                            
+    8.52%     0.02%  [kernel]          [k] kernel_write                        
+    8.49%     0.05%  [kernel]          [k] vfs_write                           
+    8.12%     0.07%  [kernel]          [k] new_sync_write                      
+    8.05%     0.28%  [kernel]          [k] pipe_write                          
+    6.50%     3.82%  [kernel]          [k] _cond_resched                       
+    5.70%     0.27%  [kernel]          [k] __mutex_lock.constprop.0            
+    5.39%     0.02%  [kernel]          [k] __do_page_cache_readahead           
+    5.34%     5.24%  [kernel]          [k] mutex_spin_on_owner                 
+    5.15%     5.00%  [kernel]          [k] memcpy_erms                         
+    4.13%     0.00%  [kernel]          [k] read_pages.isra.0                   
+    4.00%     0.05%  [kernel]          [k] extent_readpages                    
+    3.40%     0.11%  [kernel]          [k] copy_page_from_iter                 
+    3.32%     0.12%  [kernel]          [k] _copy_from_iter                     
+    2.97%     0.24%  [kernel]          [k] __alloc_pages_nodemask              
+    2.79%     2.13%  [kernel]          [k] rcu_all_qs                          
+    2.50%     1.92%  [kernel]          [k] pipe_unlock                         
+    2.31%     1.07%  [kernel]          [k] get_page_from_freelist              
+    1.95%     0.02%  [kernel]          [k] tlv_put                             
+    1.90%     0.23%  [kernel]          [k] __do_readpage                       
+    1.78%     0.10%  [kernel]          [k] __schedule                          
+    1.77%     0.00%  [kernel]          [k] iterate_extent_inodes               
+    1.74%     0.03%  [kernel]          [k] schedule                            
+    1.48%     0.01%  [kernel]          [k] apic_timer_interrupt                
Press '?' for help on key bindings                                              


[-- Attachment #3: perftopsend3c.txt --]
[-- Type: text/plain, Size: 3373 bytes --]

Samples
  Children      Self  Shared Objec  Symbol
                     - 1.64% tlv_put                                            
                          1.62% memcpy_erms                                     
                  - 0.75% iterate_extent_inodes                                 
+   96.66%     0.01%  [kernel]      [k] do_syscall_64                           
+   74.70%     0.01%  [kernel]      [k] __x64_sys_splice                        
-   74.64%     7.46%  [kernel]      [k] do_splice                               
   - 52.97% do_splice                                                           
        32.61% mutex_unlock                                                     
      - 14.18% pipe_double_lock                                                 
         - 9.97% mutex_lock                                                     
            - 3.59% _cond_resched                                               
                 1.30% rcu_all_qs                                               
         - 3.62% __mutex_lock.constprop.0                                       
              3.45% mutex_spin_on_owner                                         
      - 9.01% mutex_lock                                                        
         - 3.12% _cond_resched                                                  
              1.15% rcu_all_qs                                                  
      - 1.59% __mutex_lock.constprop.0                                          
           1.56% mutex_spin_on_owner                                            
        1.51% pipe_unlock                                                       
   - 7.46% splice                                                               
        entry_SYSCALL_64_after_hwframe                                          
        do_syscall_64                                                           
        __x64_sys_splice                                                        
      - do_splice                                                               
-   33.71%    32.42%  [kernel]      [k] mutex_unlock                            
   - 32.39% splice                                                              
        entry_SYSCALL_64_after_hwframe                                          
        do_syscall_64                                                           
        __x64_sys_splice                                                        
      - do_splice                                                               
           31.67% mutex_unlock                                                  
     0.95% mutex_unlock                                                         
+   25.11%     0.00%  libc-2.31.so  [.] splice                                  
+   21.95%     0.02%  [kernel]      [k] btrfs_ioctl_send                        
+   21.74%     0.03%  [kernel]      [k] process_extent                          
+   21.28%    13.95%  [kernel]      [k] mutex_lock                              
+   19.60%     3.29%  [kernel]      [k] send_extent_data                        
+   19.29%     4.38%  [kernel]      [k] pipe_double_lock                        
+    9.27%     0.00%  libc-2.31.so  [.] __GI___ioctl                            
Press '?' for help on key bindings                                              


[-- Attachment #4: perftopsend2c.txt --]
[-- Type: text/plain, Size: 3373 bytes --]

Samples
  Children      Self  Shared Objec  Symbol
-   96.85%     0.02%  [kernel]      [k] entry_SYSCALL_64_after_hwframe          
   - 30.54% entry_SYSCALL_64_after_hwframe                                      
      - 30.53% do_syscall_64                                                    
         - 29.04% __x64_sys_splice                                              
            - 60.72% do_splice                                                  
                 32.47% mutex_unlock                                            
               - 14.37% pipe_double_lock                                        
                  - 10.21% mutex_lock                                           
                     - 3.72% _cond_resched                                      
                          1.33% rcu_all_qs                                      
                  - 3.39% __mutex_lock.constprop.0                              
                       3.20% mutex_spin_on_owner                                
               - 9.35% mutex_lock                                               
                  - 3.25% _cond_resched                                         
                       1.16% rcu_all_qs                                         
                 2.14% pipe_unlock                                              
               - 1.58% __mutex_lock.constprop.0                                 
                    1.56% mutex_spin_on_owner                                   
         - 7.32% __x64_sys_ioctl                                                
              ksys_ioctl                                                        
              rpc_populate.constprop.0                                          
              _btrfs_ioctl_send                                                 
            - btrfs_ioctl_send                                                  
               - 9.39% process_extent                                           
                  - 8.81% send_extent_data                                      
                     - 2.69% send_cmd                                           
                        - 2.64% kernel_write                                    
                           - 2.71% vfs_write                                    
                              - 2.68% new_sync_write                            
                                 - 3.14% pipe_write                             
                                    - 1.18% copy_page_from_iter                 
                                       - 2.88% _copy_from_iter                  
                                            2.72% memcpy_erms                   
                                    - 0.93% __alloc_pages_nodemask              
                                       - 0.98% get_page_from_freelist           
                     - 1.75% __do_page_cache_readahead                          
                        - 1.33% read_pages.isra.0                               
                           - 1.49% extent_readpages                             
                              - 0.90% __do_readpage                             
                        - 0.66% __alloc_pages_nodemask                          
Press '?' for help on key bindings                                              


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-06 21:07             ` Chris Murphy
@ 2020-05-23 16:11               ` Jorge Bastos
  2020-05-23 16:42                 ` A L
  0 siblings, 1 reply; 14+ messages in thread
From: Jorge Bastos @ 2020-05-23 16:11 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hi there,

I'm also seeing much lower than normal send/receive performance since
upgrading to kernel 5.6 (currently kernel 5.6.13, btrfs-progs 5.6), and
there's also very high CPU utilization during it. Is there anything
that can be done for now?

Regards,
Jorge Bastos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-23 16:11               ` Jorge Bastos
@ 2020-05-23 16:42                 ` A L
  2020-06-06 16:16                   ` Jorge Bastos
  0 siblings, 1 reply; 14+ messages in thread
From: A L @ 2020-05-23 16:42 UTC (permalink / raw)
  To: Btrfs BTRFS



---- From: Jorge Bastos <jorge.mrbastos@gmail.com> -- Sent: 2020-05-23 - 18:11 ----

> Hi there,
> 
> I'm also seeing much lower than normal send/receive performance since
> upgrading to kernel 5.6 (currently kernel 5.6.13, btrfs-progs 5.6), and
> there's also very high CPU utilization during it. Is there anything
> that can be done for now?
> 
> Regards,
> Jorge Bastos


One possibility is to add "mbuffer" to the pipeline. It buffers asynchronously, unlike "pv". This helps quite a bit on spinning HDDs.

# btrfs send /some/subvol | mbuffer | btrfs receive /other/destination

PS. There was possibly a bug in <=mbuffer-20150412, so don't use those old versions.
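
A minimal sketch, assuming a reasonably recent mbuffer (the 1G buffer
size is just a starting point to tune):

# btrfs send /some/subvol | mbuffer -m 1G | btrfs receive /other/destination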


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-05-23 16:42                 ` A L
@ 2020-06-06 16:16                   ` Jorge Bastos
  2020-06-06 16:24                     ` Jorge Bastos
  0 siblings, 1 reply; 14+ messages in thread
From: Jorge Bastos @ 2020-06-06 16:16 UTC (permalink / raw)
  To: A L; +Cc: Btrfs BTRFS

Hi there,

Happy to report my issue is not related to btrfs. Since I remembered
reading this post and had noticed performance issues after upgrading to
kernel 5.6, I wrongly assumed it was related; my issue is caused by pv.
For some reason it sometimes starts using a lot of CPU and the
send/receive barely crawls along, but not always, i.e. the same
send/receive can go at 250MB/s one time and the next time it will
transfer at less than 10MB/s. Anyway, sorry to waste everyone's time.

Regards,
Jorge Bastos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.6, slow send/receive, < 1MB/s
  2020-06-06 16:16                   ` Jorge Bastos
@ 2020-06-06 16:24                     ` Jorge Bastos
  0 siblings, 0 replies; 14+ messages in thread
From: Jorge Bastos @ 2020-06-06 16:24 UTC (permalink / raw)
  To: A L; +Cc: Btrfs BTRFS

Well, forget that: it's not pv, it's doing it again without it. I'll
need to do more testing.

On Sat, Jun 6, 2020 at 5:16 PM Jorge Bastos <jorge.mrbastos@gmail.com> wrote:
>
> Hi there,
>
> Happy to report my issue is not related to btrfs. Since I remembered
> reading this post and had noticed performance issues after upgrading to
> kernel 5.6, I wrongly assumed it was related; my issue is caused by pv.
> For some reason it sometimes starts using a lot of CPU and the
> send/receive barely crawls along, but not always, i.e. the same
> send/receive can go at 250MB/s one time and the next time it will
> transfer at less than 10MB/s. Anyway, sorry to waste everyone's time.
>
> Regards,
> Jorge Bastos

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-06 16:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04  7:43 5.6, slow send/receive, < 1MB/s Chris Murphy
2020-05-04  7:47 ` Nikolay Borisov
2020-05-04  8:03 ` Chris Murphy
2020-05-04  8:12   ` Chris Murphy
2020-05-04  8:13   ` Nikolay Borisov
2020-05-04  8:25     ` Chris Murphy
2020-05-04  8:43       ` Chris Murphy
2020-05-06  5:48         ` Chris Murphy
2020-05-06 21:04           ` Chris Murphy
2020-05-06 21:07             ` Chris Murphy
2020-05-23 16:11               ` Jorge Bastos
2020-05-23 16:42                 ` A L
2020-06-06 16:16                   ` Jorge Bastos
2020-06-06 16:24                     ` Jorge Bastos
