linux-nvme.lists.infradead.org archive mirror
* [PATCH 0/2] nvme-tcp I/O path optimizations
From: Sagi Grimberg @ 2020-05-01 21:25 UTC
  To: Christoph Hellwig, Keith Busch, linux-nvme
  Cc: Anil Vasudevan, Mark Wunderlich

Hey All,

Here are two data-path optimizations that result in a measurable reduction
in latency and context switches.

The first optimization is a heuristic for polling-oriented workloads: avoid
scheduling io_work when the application is already polling. The second
optimization is an opportunistic attempt to send the request directly from
queue_rq if it is first in line (statistic sampling shows that this is often
the case for read-intensive workloads); otherwise we assume io_work will
handle it and we don't want to contend with it. The benefit is that we avoid
the extra context switch when we don't have to take it. Do note that network
send operations may sleep despite MSG_DONTWAIT being set, so we need to mark
the tag sets for blocking dispatch (BLK_MQ_F_BLOCKING).

More data-path optimizations are being evaluated, both for the host and the
target. The ideas came from Mark and Anil from Intel, who also benchmarked
and instrumented the system (thanks!). Testing was done with an Intel NIC,
but nothing here should be specific to any particular device.

Representative fio micro-benchmark testing (a sample invocation is sketched
below the parameter list):
- ram device (nvmet-tcp)
- single CPU core (pinned)
- 100% 4k reads
- engine io_uring
- hipri flag set (polling)
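
A fio invocation along these lines should reproduce the 32/8 data point; the
device path, runtime, CPU number and random read pattern below are
assumptions rather than details taken from this posting, and the controller
is assumed to have been connected with polling queues (nvme connect
--nr-poll-queues):

  fio --name=nvme-tcp-poll --filename=/dev/nvme0n1 --direct=1 \
      --rw=randread --bs=4k --ioengine=io_uring --hipri \
      --iodepth=32 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
      --cpus_allowed=0 --time_based --runtime=60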

Baseline:
========
QDepth/Batch    | IOPs [k]  | Avg. Lat [us]  | 99.99% Lat [us]  | ctx-switches
------------------------------------------------------------------------------
1/1             | 35.1      | 27.42          | 47.87            | ~119914
32/8            | 234       | 122.98         | 239              | ~143450

With patches applied:
====================
QDepth/Batch    | IOPs [k]  | Avg. Lat [us]  | 99.99% Lat [us]  | ctx-switches
------------------------------------------------------------------------------
1/1             | 39.6      | 24.25          | 36.6             | ~357
32/8            | 247       | 113.95         | 249              | ~37298
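
How the context switches were counted is not stated here; one way to observe
the same trend (an assumption, not necessarily how the numbers above were
gathered) is to count switches on the pinned CPU with perf for the duration
of the run:

  perf stat -e context-switches -C 0 -- sleep 60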

Sagi Grimberg (2):
  nvme-tcp: avoid scheduling io_work if we are already polling
  nvme-tcp: try to send request in queue_rq context

 drivers/nvme/host/tcp.c | 49 +++++++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 12 deletions(-)

-- 

2.20.1



* [PATCH 1/2] nvme-tcp: avoid scheduling io_work if we are already polling
From: Sagi Grimberg @ 2020-05-01 21:25 UTC
  To: Christoph Hellwig, Keith Busch, linux-nvme
  Cc: Anil Vasudevan, Mark Wunderlich

When the user runs polled I/O, we shouldn't have to schedule io_work from
the .data_ready upcall to process receives; the polling context already
takes care of that. This avoids a redundant context switch when the
application is already polling for completions.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 4862fa962011..b28f91d0f083 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -60,6 +60,7 @@ struct nvme_tcp_request {
 enum nvme_tcp_queue_flags {
 	NVME_TCP_Q_ALLOCATED	= 0,
 	NVME_TCP_Q_LIVE		= 1,
+	NVME_TCP_Q_POLLING	= 2,
 };
 
 enum nvme_tcp_recv_state {
@@ -796,7 +797,8 @@ static void nvme_tcp_data_ready(struct sock *sk)
 
 	read_lock_bh(&sk->sk_callback_lock);
 	queue = sk->sk_user_data;
-	if (likely(queue && queue->rd_enabled))
+	if (likely(queue && queue->rd_enabled) &&
+	    !test_bit(NVME_TCP_Q_POLLING, &queue->flags))
 		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
 	read_unlock_bh(&sk->sk_callback_lock);
 }
@@ -2302,9 +2304,11 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx)
 	if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
 		return 0;
 
+	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
 	nvme_tcp_try_recv(queue);
+	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	return queue->nr_cqe;
 }
 
-- 
2.20.1



* [PATCH 2/2] nvme-tcp: try to send request in queue_rq context
From: Sagi Grimberg @ 2020-05-01 21:25 UTC
  To: Christoph Hellwig, Keith Busch, linux-nvme
  Cc: Anil Vasudevan, Mark Wunderlich

Today, nvme-tcp unconditionally schedules a queued send to a workqueue
context, which is one context switch more than we need when the socket
buffer is wide open.

However, because we have async send activity (as a result of r2t or
write_space callbacks), we need to synchronize sends from possibly multiple
contexts (ideally all running on the same cpu though).

Thus, we only try to send directly from queue_rq when:
1. the send_list is empty
2. we can send synchronously (i.e. we are not in the RX path)
3. we are running on the same cpu as queue->io_cpu, to avoid
   contention on the send operation

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 43 ++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index b28f91d0f083..c79e248b9f43 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -76,6 +76,7 @@ struct nvme_tcp_queue {
 	int			io_cpu;
 
 	spinlock_t		lock;
+	struct mutex		send_mutex;
 	struct list_head	send_list;
 
 	/* recv state */
@@ -132,6 +133,7 @@ static DEFINE_MUTEX(nvme_tcp_ctrl_mutex);
 static struct workqueue_struct *nvme_tcp_wq;
 static struct blk_mq_ops nvme_tcp_mq_ops;
 static struct blk_mq_ops nvme_tcp_admin_mq_ops;
+static int nvme_tcp_try_send(struct nvme_tcp_queue *queue);
 
 static inline struct nvme_tcp_ctrl *to_tcp_ctrl(struct nvme_ctrl *ctrl)
 {
@@ -258,15 +260,29 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
 	}
 }
 
-static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req)
+static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
+		bool sync)
 {
 	struct nvme_tcp_queue *queue = req->queue;
+	bool empty;
 
 	spin_lock(&queue->lock);
+	empty = list_empty(&queue->send_list) && !queue->request;
 	list_add_tail(&req->entry, &queue->send_list);
 	spin_unlock(&queue->lock);
 
-	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+	/*
+	 * if we're the first on the send_list and we can try to send
+	 * directly, otherwise queue io_work. Also, only do that if we
+	 * are on the same cpu, so we don't introduce contention.
+	 */
+	if (queue->io_cpu == smp_processor_id() &&
+	    sync && empty && mutex_trylock(&queue->send_mutex)) {
+		nvme_tcp_try_send(queue);
+		mutex_unlock(&queue->send_mutex);
+	} else {
+		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+	}
 }
 
 static inline struct nvme_tcp_request *
@@ -579,7 +595,7 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
 	req->state = NVME_TCP_SEND_H2C_PDU;
 	req->offset = 0;
 
-	nvme_tcp_queue_request(req);
+	nvme_tcp_queue_request(req, false);
 
 	return 0;
 }
@@ -1065,11 +1081,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
 		bool pending = false;
 		int result;
 
-		result = nvme_tcp_try_send(queue);
-		if (result > 0)
-			pending = true;
-		else if (unlikely(result < 0))
-			break;
+		if (mutex_trylock(&queue->send_mutex)) {
+			result = nvme_tcp_try_send(queue);
+			mutex_unlock(&queue->send_mutex);
+			if (result > 0)
+				pending = true;
+			else if (unlikely(result < 0))
+				break;
+		}
 
 		result = nvme_tcp_try_recv(queue);
 		if (result > 0)
@@ -1321,6 +1340,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 	queue->ctrl = ctrl;
 	INIT_LIST_HEAD(&queue->send_list);
 	spin_lock_init(&queue->lock);
+	mutex_init(&queue->send_mutex);
 	INIT_WORK(&queue->io_work, nvme_tcp_io_work);
 	queue->queue_size = queue_size;
 
@@ -1545,6 +1565,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 		set->reserved_tags = 2; /* connect + keep-alive */
 		set->numa_node = NUMA_NO_NODE;
+		set->flags = BLK_MQ_F_BLOCKING;
 		set->cmd_size = sizeof(struct nvme_tcp_request);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
@@ -1556,7 +1577,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->queue_depth = nctrl->sqsize + 1;
 		set->reserved_tags = 1; /* fabric connect */
 		set->numa_node = NUMA_NO_NODE;
-		set->flags = BLK_MQ_F_SHOULD_MERGE;
+		set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
 		set->cmd_size = sizeof(struct nvme_tcp_request);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
@@ -2115,7 +2136,7 @@ static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg)
 	ctrl->async_req.curr_bio = NULL;
 	ctrl->async_req.data_len = 0;
 
-	nvme_tcp_queue_request(&ctrl->async_req);
+	nvme_tcp_queue_request(&ctrl->async_req, true);
 }
 
 static enum blk_eh_timer_return
@@ -2246,7 +2267,7 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
-	nvme_tcp_queue_request(req);
+	nvme_tcp_queue_request(req, true);
 
 	return BLK_STS_OK;
 }
-- 
2.20.1



* Re: [PATCH 0/2] nvme-tcp I/O path optimizations
From: Christoph Hellwig @ 2020-05-06  6:57 UTC
  To: Sagi Grimberg
  Cc: Keith Busch, Anil Vasudevan, Mark Wunderlich, Christoph Hellwig,
	linux-nvme

Thanks, applied to nvme-5.8.
