From: Philipp Falk <philipp.falk@thinkparq.com>
To: linux-fsdevel@vger.kernel.org
Subject: Throughput drop and high CPU load on fast NVMe drives
Date: Tue, 22 Jun 2021 19:15:58 +0200
Message-ID: <YNIaztBNK+I5w44w@xps13>

We are facing a performance issue on XFS and other filesystems running on
fast NVMe drives when reading large amounts of data through the page cache
with fio.

Streaming read performance starts off near the NVMe hardware limit until
roughly the total system memory's worth of data has been read. Performance
then drops to around half the hardware limit and CPU load increases
significantly. Using perf, we established that most of the CPU load is
caused by a spin lock in native_queued_spin_lock_slowpath:

-   58,93%    58,92%  fio              [kernel.kallsyms]         [k] native_queued_spin_lock_slowpath
     45,72% __libc_read
        entry_SYSCALL_64_after_hwframe
        do_syscall_64
        ksys_read
        vfs_read
        new_sync_read
        xfs_file_read_iter
        xfs_file_buffered_aio_read
      - generic_file_read_iter
         - 45,72% ondemand_readahead
            - __do_page_cache_readahead
               - 34,64% __alloc_pages_nodemask
                  - 34,34% __alloc_pages_slowpath
                     - 34,33% try_to_free_pages
                          do_try_to_free_pages
                        - shrink_node
                           - 34,33% shrink_lruvec
                              - shrink_inactive_list
                                 - 28,22% shrink_page_list
                                    - 28,10% __remove_mapping
                                       - 28,10% _raw_spin_lock_irqsave
                                            native_queued_spin_lock_slowpath
                                 + 6,10% _raw_spin_lock_irq
               + 11,09% read_pages
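
For reference, a profile like the one above can be collected with
something along these lines (the 30 second sampling window is arbitrary):

  # sample all CPUs with call graphs while fio is running
  perf record -a -g -- sleep 30
  # browse the result as an expandable call tree
  perf report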

When direct I/O is used, hardware-level read throughput is sustained for
the entire experiment and CPU load stays low. Threads stay in D state
(uninterruptible sleep, waiting on I/O) most of the time.
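
As a quick check, something like the following lists tasks in
uninterruptible sleep together with the kernel function they are
blocked in:

  # D-state tasks and their wait channel (procps ps)
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'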

Very similar results are described about halfway through this article
[1].

Is this a known issue with the page cache and high-throughput I/O? Is there
any tuning that can be applied to get around the CPU bottleneck? We have
tried disabling readahead on the drives, which led to very bad throughput
(a drop of roughly 90%). We also tried various other scheduler-related
tuning, but the results were always similar.
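
For reference, readahead was disabled per device along these lines
(the device name is an example):

  # disable readahead for one drive (value is in 512-byte sectors)
  blockdev --setra 0 /dev/nvme0n1
  # equivalent sysfs knob (value is in KiB)
  echo 0 > /sys/block/nvme0n1/queue/read_ahead_kb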

Experiment setup can be found below. I am happy to provide more detail if
required. If this is the wrong place to post this, please kindly let me
know.

Best regards
- Philipp

[1] https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-threadripper-pro-workstation/

Experiment setup:

CPU: 2x Intel(R) Xeon(R) Platinum 8352Y 2.2 GHz, 32c/64t each, 512GB memory
NVMe: 16x 1.6TB, 8 per NUMA node
FS: one XFS per disk, but reproducible on ext4 and ZFS
Kernel: Linux 5.3 (SLES), but reproducible on 5.12 (SUSE Tumbleweed)
NVMe scheduler: both "none" and "mq-deadline", very similar results
fio: 4 threads per NVMe drive, 20GiB of data per thread, ioengine=sync
  Sustained read throughput with direct=1: ~52 GiB/s (~3.2 GiB/s per disk)
  Sustained read throughput with direct=0: ~25 GiB/s (~1.5 GiB/s per disk)
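
A minimal fio invocation matching this setup might look as follows, one
such job per drive (the job name, mount point, and 1 MiB block size are
assumptions; direct=1 for the O_DIRECT case):

  fio --name=streamread --directory=/mnt/nvme0 \
      --thread --numjobs=4 --size=20g \
      --ioengine=sync --rw=read --bs=1M \
      --direct=0 --group_reporting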
