From: Sagi Grimberg <sagi@grimberg.me>
To: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
	linux-nvme@lists.infradead.org
Cc: Anil Vasudevan <anil.vasudevan@intel.com>,
	Mark Wunderlich <mark.wunderlich@intel.com>
Subject: [PATCH 0/2] nvme-tcp I/O path optimizations
Date: Fri,  1 May 2020 14:25:43 -0700
Message-ID: <20200501212545.21856-1-sagi@grimberg.me>

Hey All,

Here are two data-path optimizations that result in a measurable reduction
in latency and context switches.

The first optimization is a heuristic for polling-oriented workloads: avoid
scheduling io_work when the application is already polling. The second
optimization is an opportunistic attempt to send the request directly from
queue_rq if it is the first in line (statistical sampling shows that this is
often the case for read-intensive workloads); otherwise we assume io_work
will handle it and we don't want to contend with it. The benefit is that we
don't absorb the extra context switch when we don't have to. Do note that
network send operations may sleep despite setting MSG_DONTWAIT, so we need
to set blocking dispatch (BLK_MQ_F_BLOCKING).
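
To make the queue_rq idea concrete, here is a rough userspace model of the
decision (plain pthreads; queue_request(), io_worker() and send_pending() are
illustrative names, and the locking is simplified -- this is not the code from
the patches): queue the request, and if it is the only one pending and the
send path is uncontended, send it inline instead of waking the worker.

/*
 * Rough userspace model of the queue_rq fast path described above.
 * Illustrative only -- names and locking are simplified; this is not
 * the code from the patches.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects 'pending' */
static pthread_mutex_t send_mutex = PTHREAD_MUTEX_INITIALIZER; /* serializes sends   */
static pthread_cond_t work = PTHREAD_COND_INITIALIZER;         /* wakes the worker   */
static int pending;                                            /* queued, unsent     */
static bool stop;

/* Stand-in for the (possibly blocking) network send; caller holds send_mutex. */
static void send_pending(const char *who)
{
	pthread_mutex_lock(&list_lock);
	while (pending > 0) {
		pending--;
		pthread_mutex_unlock(&list_lock);
		printf("%s: sent one request\n", who);
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
}

/* Models queue_rq: send inline if first in line and uncontended. */
static void queue_request(void)
{
	bool empty;

	pthread_mutex_lock(&list_lock);
	empty = (pending++ == 0);
	pthread_mutex_unlock(&list_lock);

	if (empty && pthread_mutex_trylock(&send_mutex) == 0) {
		send_pending("queue_rq");         /* no context switch */
		pthread_mutex_unlock(&send_mutex);
		return;
	}

	pthread_mutex_lock(&list_lock);           /* defer to the worker */
	pthread_cond_signal(&work);
	pthread_mutex_unlock(&list_lock);
}

/* Models io_work: drains whatever queue_rq did not send inline. */
static void *io_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&list_lock);
	while (!stop || pending > 0) {
		if (pending == 0) {
			pthread_cond_wait(&work, &list_lock);
			continue;
		}
		pthread_mutex_unlock(&list_lock);
		pthread_mutex_lock(&send_mutex);
		send_pending("io_work");
		pthread_mutex_unlock(&send_mutex);
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
	return NULL;
}

int main(void)
{
	pthread_t worker;

	pthread_create(&worker, NULL, io_worker, NULL);
	for (int i = 0; i < 4; i++)
		queue_request();

	pthread_mutex_lock(&list_lock);
	stop = true;
	pthread_cond_signal(&work);
	pthread_mutex_unlock(&list_lock);
	pthread_join(worker, NULL);
	return 0;
}

In this toy model the submitter ends up sending everything inline because it
is always first in line, which is exactly the behavior the context-switch
numbers below are meant to capture for QD=1.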

There are more data-path optimizations being evaluated, both for the host and
the target. The ideas came from Mark and Anil from Intel, who also benchmarked
and instrumented the system (thanks!). Testing was done with an Intel NIC, but
nothing here should be specific to any particular device.

Representative fio micro-benchmark testing:
- ram device (nvmet-tcp)
- single CPU core (pinned)
- 100% 4k reads
- engine io_uring
- hipri flag set (polling)
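
For reference, a fio invocation along these lines matches the setup above;
the exact option set is an educated reconstruction from the bullets (device
path, job name and runtime are placeholders), and the QD32/batch-8 rows below
would additionally use --iodepth=32 --iodepth_batch_submit=8:

  fio --name=nvme-tcp-poll --filename=/dev/nvme0n1 --direct=1 \
      --ioengine=io_uring --hipri --rw=randread --bs=4k \
      --iodepth=1 --numjobs=1 --cpus_allowed=0 \
      --time_based --runtime=60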

Baseline:
========
QDepth/Batch    | IOPs [k]  | Avg. Lat [us]  | 99.99% Lat [us]  | ctx-switches
------------------------------------------------------------------------------
1/1             | 35.1      | 27.42          | 47.87            | ~119914
32/8            | 234       | 122.98         | 239              | ~143450

With patches applied:
====================
QDepth/Batch    | IOPs [k]  | Avg. Lat [us]  | 99.99% Lat [us]  | ctx-switches
------------------------------------------------------------------------------
1/1             | 39.6      | 24.25          | 36.6             | ~357
32/8            | 247       | 113.95         | 249              | ~37298

Sagi Grimberg (2):
  nvme-tcp: avoid scheduling io_work if we are already polling
  nvme-tcp: try to send request in queue_rq context

 drivers/nvme/host/tcp.c | 49 +++++++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 12 deletions(-)

-- 
2.20.1


Thread overview: 4+ messages
2020-05-01 21:25 Sagi Grimberg [this message]
2020-05-01 21:25 ` [PATCH 1/2] nvme-tcp: avoid scheduling io_work if we are already polling Sagi Grimberg
2020-05-01 21:25 ` [PATCH 2/2] nvme-tcp: try to send request in queue_rq context Sagi Grimberg
2020-05-06  6:57 ` [PATCH 0/2] nvme-tcp I/O path optimizations Christoph Hellwig
