From: Hans van Kranenburg <hans@knorrie.org>
To: linux-btrfs@vger.kernel.org
Subject: Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2
Date: Sat, 25 Jul 2020 16:24:57 +0200 [thread overview]
Message-ID: <2523ce77-31a3-ecec-f36d-8d74132eae02@knorrie.org> (raw)
[-- Attachment #1: Type: text/plain, Size: 4302 bytes --]
Hi,
I have a filesystem here that I'm filling up with data from elsewhere.
Most of it is done by rsync, and part by send/receive. So, receiving
data over the network, and then writing the files to disk. There can be
a dozen of these processes running in parallel.
Now, when doing so, the kworker/u16:X+flush-btrfs-2 process (with
varying X) often is using nearly 100% cpu, while enormously slowing down
disk writes. This shows as disk IO wait for the rsync and btrfs receive
processes.
The underlying storage (iSCSI connected over 10Gb/s network) can easily
eat a few hundred MiB/s. When looking at actual disk business on the
storage device, percentages <5% utilization are reported for the actual
disks.
It's clearly kworker/u16:X+flush-btrfs-2 which is the bottleneck here.
I just did a 'perf record -g -a sleep 60' while disk writes were down to
under 1MiB (!) per second and then 'perf report'. Attached some 'screen
shot' of it. Also attached an example of what nmon shows to give an idea
about the situation.
If the kworker/u16:X+flush-btrfs-2 cpu usage decreases, I immediately
see network and disk write speed ramping up, easily over 200 MiB/s,
until it soon plummets again.
I see the same behavior with a recent 4.19 kernel and with 5.7.6 (which
is booted now).
So, what I'm looking for is:
* Does anyone else see this happen when doing a lot of concurrent writes?
* What does this flush thing do?
* Why is it using 100% cpu all the time?
* How can I debug this more?
* Ultimately, of course... how can we improve this?
I can recompile the kernel image to e.g. put more trace points in, in
different places.
I just have no idea where to start.
Thanks,
Hans
P.S. /proc/<pid>/stack for the kworker/u16:X+flush-btrfs-2 mostly shows
nothing at all, and sometimes when it does show some output, it mostly
looks like this:
----
[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] btrfs_map_bio+0x1c0/0x380 [btrfs]
[<0>] btrfs_submit_bio_hook+0x8c/0x180 [btrfs]
[<0>] submit_one_bio+0x31/0x50 [btrfs]
[<0>] submit_extent_page+0x102/0x210 [btrfs]
[<0>] __extent_writepage_io+0x1cf/0x380 [btrfs]
[<0>] __extent_writepage+0x101/0x300 [btrfs]
[<0>] extent_write_cache_pages+0x2bb/0x440 [btrfs]
[<0>] extent_writepages+0x44/0x90 [btrfs]
[<0>] do_writepages+0x41/0xd0
[<0>] __writeback_single_inode+0x3d/0x340
[<0>] writeback_sb_inodes+0x1e5/0x480
[<0>] __writeback_inodes_wb+0x5d/0xb0
[<0>] wb_writeback+0x25f/0x2f0
[<0>] wb_workfn+0x2fe/0x3f0
[<0>] process_one_work+0x1ad/0x370
[<0>] worker_thread+0x30/0x390
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x1f/0x40
----
[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] btrfs_map_bio+0x1c0/0x380 [btrfs]
[<0>] btrfs_submit_bio_hook+0x8c/0x180 [btrfs]
[<0>] submit_one_bio+0x31/0x50 [btrfs]
[<0>] extent_writepages+0x5d/0x90 [btrfs]
[<0>] do_writepages+0x41/0xd0
[<0>] __writeback_single_inode+0x3d/0x340
[<0>] writeback_sb_inodes+0x1e5/0x480
[<0>] __writeback_inodes_wb+0x5d/0xb0
[<0>] wb_writeback+0x25f/0x2f0
[<0>] wb_workfn+0x2fe/0x3f0
[<0>] process_one_work+0x1ad/0x370
[<0>] worker_thread+0x30/0x390
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x1f/0x40
----
[<0>] rq_qos_wait+0xfa/0x170
[<0>] wbt_wait+0x98/0xe0
[<0>] __rq_qos_throttle+0x23/0x30
[<0>] blk_mq_make_request+0x12a/0x5d0
[<0>] generic_make_request+0xcf/0x310
[<0>] submit_bio+0x42/0x1c0
[<0>] btrfs_map_bio+0x1c0/0x380 [btrfs]
[<0>] btrfs_submit_bio_hook+0x8c/0x180 [btrfs]
[<0>] submit_one_bio+0x31/0x50 [btrfs]
[<0>] submit_extent_page+0x102/0x210 [btrfs]
[<0>] __extent_writepage_io+0x1cf/0x380 [btrfs]
[<0>] __extent_writepage+0x101/0x300 [btrfs]
[<0>] extent_write_cache_pages+0x2bb/0x440 [btrfs]
[<0>] extent_writepages+0x44/0x90 [btrfs]
[<0>] do_writepages+0x41/0xd0
[<0>] __writeback_single_inode+0x3d/0x340
[<0>] writeback_sb_inodes+0x1e5/0x480
[<0>] __writeback_inodes_wb+0x5d/0xb0
[<0>] wb_writeback+0x25f/0x2f0
[<0>] wb_workfn+0x2fe/0x3f0
[<0>] process_one_work+0x1ad/0x370
[<0>] worker_thread+0x30/0x390
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x1f/0x40
[-- Attachment #2: 2020-07-25-perf-report.txt --]
[-- Type: text/plain, Size: 6268 bytes --]
Samples: 1M of event 'cpu-clock:pppH', Event count (approx.): 311991500000
Children Self Command Shared Object Symbol
+ 80.53% 0.00% swapper [kernel.kallsyms] [k] secondary_startup_64
+ 80.53% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 80.53% 0.00% swapper [kernel.kallsyms] [k] do_idle
+ 80.52% 0.00% swapper [kernel.kallsyms] [k] default_idle
+ 80.52% 80.51% swapper [kernel.kallsyms] [k] native_safe_halt
+ 63.42% 0.00% swapper [kernel.kallsyms] [k] start_secondary
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] ret_from_fork
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] kthread
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] worker_thread
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] process_one_work
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] wb_workfn
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] wb_writeback
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] __writeback_inodes_wb
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] writeback_sb_inodes
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] __writeback_single_inode
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] do_writepages
+ 19.10% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] extent_writepages
+ 19.08% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] extent_write_cache_pages
+ 19.07% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] __extent_writepage
+ 19.04% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] writepage_delalloc
+ 19.03% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] btrfs_run_delalloc_range
+ 19.03% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] cow_file_range
+ 19.01% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] btrfs_reserve_extent
+ 19.01% 4.26% kworker/u16:11+ [kernel.kallsyms] [k] find_free_extent
+ 17.11% 0.00% swapper [kernel.kallsyms] [k] start_kernel
+ 7.18% 7.18% kworker/u16:11+ [kernel.kallsyms] [k] _raw_spin_lock
+ 4.45% 4.45% kworker/u16:11+ [kernel.kallsyms] [k] btrfs_get_block_group
+ 1.49% 1.34% kworker/u16:11+ [kernel.kallsyms] [k] down_read
+ 0.56% 0.56% kworker/u16:11+ [kernel.kallsyms] [k] btrfs_put_block_group
0.50% 0.50% kworker/u16:11+ [kernel.kallsyms] [k] __raw_callee_save___pv_queued_spin_unlock
0.44% 0.44% kworker/u16:11+ [kernel.kallsyms] [k] up_read
0.28% 0.13% kworker/u16:11+ [kernel.kallsyms] [k] _cond_resched
0.15% 0.15% kworker/u16:11+ [kernel.kallsyms] [k] rcu_all_qs
0.05% 0.00% nmon [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
0.05% 0.00% nmon [kernel.kallsyms] [k] do_syscall_64
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] do_syscall_64
0.04% 0.00% btrfs-transacti [kernel.kallsyms] [k] ret_from_fork
0.04% 0.00% btrfs-transacti [kernel.kallsyms] [k] kthread
0.04% 0.00% btrfs-transacti [kernel.kallsyms] [k] transaction_kthread
0.04% 0.00% btrfs-transacti [kernel.kallsyms] [k] btrfs_commit_transaction
0.04% 0.00% perf_5.7 libpthread-2.28.so [.] __libc_write
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] ksys_write
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] vfs_write
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] new_sync_write
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] btrfs_file_write_iter
0.04% 0.00% perf_5.7 [kernel.kallsyms] [k] btrfs_buffered_write.isra.28
0.03% 0.00% kswapd0 [kernel.kallsyms] [k] ret_from_fork
0.03% 0.00% kswapd0 [kernel.kallsyms] [k] kthread
0.03% 0.00% kswapd0 [kernel.kallsyms] [k] kswapd
0.03% 0.00% kswapd0 [kernel.kallsyms] [k] balance_pgdat
0.03% 0.00% kswapd0 [kernel.kallsyms] [k] shrink_node
0.03% 0.00% btrfs-transacti [kernel.kallsyms] [k] btrfs_run_delayed_refs
0.03% 0.00% btrfs-transacti [kernel.kallsyms] [k] __btrfs_run_delayed_refs
0.03% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] __extent_writepage_io
0.03% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] submit_one_bio
0.03% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] btrfs_submit_bio_hook
0.03% 0.00% kworker/u16:11+ [kernel.kallsyms] [k] btrfs_map_bio
0.02% 0.00% kswapd0 [kernel.kallsyms] [k] shrink_slab
0.02% 0.00% kswapd0 [kernel.kallsyms] [k] do_shrink_slab
0.02% 0.00% kworker/u16:4-b [kernel.kallsyms] [k] ret_from_fork
0.02% 0.00% kworker/u16:4-b [kernel.kallsyms] [k] kthread
0.02% 0.00% kworker/u16:4-b [kernel.kallsyms] [k] worker_thread
0.02% 0.00% kworker/u16:4-b [kernel.kallsyms] [k] process_one_work
0.02% 0.00% kswapd0 [kernel.kallsyms] [k] super_cache_scan
0.02% 0.00% kworker/u16:4-b [kernel.kallsyms] [k] btrfs_work_helper
0.02% 0.00% kworker/u16:3-b [kernel.kallsyms] [k] ret_from_fork
0.02% 0.00% kworker/u16:3-b [kernel.kallsyms] [k] kthread
0.02% 0.00% kworker/u16:3-b [kernel.kallsyms] [k] worker_thread
0.02% 0.00% kworker/u16:3-b [kernel.kallsyms] [k] process_one_work
0.02% 0.00% perf_5.7 [unknown] [k] 0xffffffff9d9bc42e
0.02% 0.00% kworker/u16:3-b [kernel.kallsyms] [k] btrfs_work_helper
0.02% 0.00% btrfs-transacti [kernel.kallsyms] [k] alloc_reserved_file_extent
0.02% 0.00% nmon libc-2.28.so [.] open64
0.02% 0.00% nmon [kernel.kallsyms] [k] do_sys_open
[-- Attachment #3: 2020-07-25-nmon.txt --]
[-- Type: text/plain, Size: 5851 bytes --]
---- >8 ---- CPU Utilisation ---- >8 ----
---------------------------+-------------------------------------------------+
CPU User% Sys% Wait% Steal|0 |25 |50 |75 100|
1 5.2 3.1 90.2 1.6|UUsWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW >
2 0.0 0.0 99.5 0.5|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
3 0.5 0.0 99.5 0.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
4 0.0 0.0 100.0 0.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
5 0.0 100.0 0.0 0.0|sssssssssssssssssssssssssssssssssssssssssssssssss>
6 0.0 0.0 100.0 0.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
7 1.0 0.5 98.0 0.5|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW >
8 1.0 0.0 98.5 0.5|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
---------------------------+-------------------------------------------------+
Avg 0.9 13.0 85.7 0.3|ssssssWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>|
---------------------------+-------------------------------------------------+
---- >8 ---- Memory and Swap ---- >8 ----
PageSize:4KB RAM-Memory Swap-Space High-Memory Low-Memory
Total (MB) 16015.3 0.0 - not in use - not in use
Free (MB) 1471.1 0.0
Free Percent 9.2% 0.0%
Linux Kernel Internal Memory (MB)
Cached= 12582.6 Active= 3756.1
Buffers= 0.2 Swapcached= 0.0 Inactive = 8994.9
Dirty = 2387.0 Writeback = 0.5 Mapped = 27.3
Slab = 1653.0 Commit_AS = 321.2 PageTables= 4.4
---- >8 ---- Kernel and Load Average ---- >8 ----
Global-CPU-Stats----> 7.5% user Load Average CPU use since boottime
/proc/stat line 1 0.0% user_nice 1 mins 14.02 Uptime Days Hours Mins
100 ticks per second 101.1% system 5 mins 14.05 Uptime 20 15 19
100%=1 CPUcoreThread 0.0% idle 15 mins 13.33 Idle 15 335 17633
3 RunQueue 677.8% iowait Uptime has overflowed
13 Blocked 0.0% irq
2338.1 Context 2.0% softirq 8 CPU core threads
Switch 2.5% steal
0.0 Forks 0.0% guest Boot time 1593903119
3174.7 Interrupts 0.0% guest_nice 12:51 AM 05-Jul-2020
---- >8 ---- Network I/O ---- >8 ----
I/F Name Recv=KB/s Trans=KB/s packin packout insize outsize Peak->Recv Trans
lo 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
eth0 12250.1 88.6 968.6 999.0 12950.7 90.8 388740.8 2413.5
---- >8 ---- Disk I/O -- /proc/diskstats -- mostly in KB/s ---- >8 ----
DiskName Busy Read WriteKB|0 |25 |50 |75 100|
xvda 0% 0.0 0.0| >
xvdb 0% 31.9 0.0|R >
xvdc 23% 0.0 1171.3|WWWWWWWWWWWW >
xvdd 0% 0.0 0.0| >
xvde 39% 0.0 970.1|WWWWWWWWWWWWWWWWWWWW >
Totals Read-MB/s=0.0 Writes-MB/s=2.1 Transfers/sec=21.9
---- >8 ---- Top Processes Procs=260-mode=3-1=Base 3=Perf 4=Size 5=I/O[RootOnly] u=Args ---- >8 ----
PID %CPU Size Res Res Res Res Shared Faults Faults Command
Used KB Set Text Data Lib KB Min Maj
7909 99.6 0 0 0 0 0 0 0 0 kworker/u16:2+flush-btrfs-2
22780 1.5 19780 10548 400 4588 0 5768 0 0 ssh
21840 1.0 19772 10608 400 4580 0 5836 0 0 ssh
22638 1.0 19776 10616 400 4584 0 5844 0 0 ssh
23115 1.0 19772 10592 400 4580 0 5824 0 0 ssh
23460 1.0 19816 10644 400 4624 0 5828 0 0 ssh
23571 1.0 19780 10564 400 4588 0 5784 0 0 ssh
23759 1.0 19768 10472 400 4576 0 5708 0 0 ssh
23770 1.0 19740 10588 400 4548 0 5848 0 0 ssh
30708 1.0 19996 10752 400 4804 0 5756 0 0 ssh
30710 1.0 8544 1124 396 392 0 960 0 0 btrfs
993 0.5 8672 3160 104 4652 0 440 0 0 nmon
10290 0.5 19844 10688 400 4652 0 5848 0 0 ssh
18186 0.5 0 0 0 0 0 0 0 0 kworker/u16:6-btrfs-endio-write
18616 0.5 8544 1204 396 392 0 1036 0 0 btrfs
18659 0.5 5420 788 56 440 0 724 0 0 pv
18660 0.5 8544 1196 396 392 0 1032 0 0 btrfs
21842 0.5 7896 1940 328 2300 0 1344 0 0 rsync
22807 0.5 7896 1984 328 2300 0 1384 0 0 rsync
23461 0.5 7896 1928 328 2300 0 1332 0 0 rsync
23467 0.5 19724 10580 400 4532 0 5856 0 0 ssh
23468 0.5 8476 3084 328 2880 0 1272 0 0 rsync
next reply other threads:[~2020-07-25 14:25 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-25 14:24 Hans van Kranenburg [this message]
2020-07-25 15:37 ` Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2 Holger Hoffstätte
2020-07-25 16:43 ` Hans van Kranenburg
2020-07-25 19:44 ` Holger Hoffstätte
2020-07-25 21:03 ` Hans van Kranenburg
2020-07-26 1:00 ` Chris Murphy
2020-07-25 21:27 ` Hans van Kranenburg
2020-07-26 8:10 ` A L
2020-07-26 0:50 ` Chris Murphy
2020-07-27 11:09 ` Qu Wenruo
2020-07-27 17:17 ` Hans van Kranenburg
2020-07-27 19:23 ` Chris Murphy
2020-07-27 23:16 ` Chris Murphy
2020-07-28 0:51 ` Qu Wenruo
2020-07-28 1:52 ` Qu Wenruo
2020-07-28 14:52 ` Hans van Kranenburg
2020-07-29 0:15 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2523ce77-31a3-ecec-f36d-8d74132eae02@knorrie.org \
--to=hans@knorrie.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.