[PATCH rfc 0/2] allow for busy poll improvements
From: Wunderlich, Mark @ 2019-09-24 16:06 UTC
  To: linux-nvme; +Cc: Sagi Grimberg


Proposing a small series of two patches that improve packet processing for a fabric network interface operating in busy-poll mode rather than in standard interrupt mode.

Patch 1: Modifies the do/while termination condition in nvmet_tcp_io_work() to be time based (for poll mode) vs. the existing code, which is operations based (and works well for interrupt mode). A time-based condition gives more opportunity to reap send or recv completions, without a premature exit when a single iteration of the loop happens to be idle ('pending' being false).

In either case, after exiting the do/while loop it is desirable to re-queue the work item if there was previous activity. In poll mode this is best measured by the accumulated ops completions over the whole do/while period, whereas for interrupt mode the code re-queues only if the last iteration through the loop showed successful activity.

There is an opportunity to simplify the changes proposed here, bringing more commonality between the two modes and avoiding an ifdef check; these options are under consideration. For example, a time-based check as proposed here (and as used on the host side for the equivalent function) could be combined with a default time quota for interrupt mode: if the busy-poll time period is not set at the start of the function, a defined default interrupt-mode period (the same 1 msec as the host side?) would be used instead. With time defining the mode, interrupt mode could preserve the use of 'pending' and break out of the loop when a loop iteration finds it false. Then, just after the loop, a check of (poll mode && ops > 0) would set pending to true, preserving the final check that re-queues the worker when pending is true. The issue with this option is the case where the user happens to set the sk_ll_usec busy-poll period equal to the default interrupt-mode period. A sketch of the time-based loop follows.
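To make the intent concrete, here is a minimal sketch of the time-based termination; it is illustrative, not the posted patch. The 'busy_poll_usecs' variable is a hypothetical stand-in for however the poll period is configured (e.g. derived from the socket's sk_ll_usec), and error handling is elided; the helpers (nvmet_tcp_try_recv()/nvmet_tcp_try_send(), the per-direction budgets, nvmet_tcp_wq) follow the existing nvmet-tcp driver:

static void nvmet_tcp_io_work(struct work_struct *w)
{
	struct nvmet_tcp_queue *queue =
		container_of(w, struct nvmet_tcp_queue, io_work);
	/* Hypothetical poll period; the real value would come from the
	 * socket's busy-poll configuration (sk_ll_usec).
	 */
	unsigned long deadline = jiffies + usecs_to_jiffies(busy_poll_usecs);
	int ret, ops = 0;

	do {
		/* Reap any pending recv/send work; both helpers bump
		 * 'ops' on completions.  (Error handling elided.)
		 */
		ret = nvmet_tcp_try_recv(queue, NVMET_TCP_RECV_BUDGET, &ops);
		if (ret < 0)
			return;

		ret = nvmet_tcp_try_send(queue, NVMET_TCP_SEND_BUDGET, &ops);
		if (ret < 0)
			return;

		/* Time-based exit: keep polling until the period expires,
		 * even if a single iteration happened to be idle.
		 */
	} while (!time_after(jiffies, deadline));

	/* Re-queue on any activity accumulated over the whole period,
	 * not just activity on the final iteration.
	 */
	if (ops > 0)
		queue_work_on(queue->cpu, nvmet_tcp_wq, &queue->io_work);
}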

Patch 2:  This patch builds upon previous kernel network patches listed below that enabled enhanced symmetric queuing:

- a4fd1f4 Documentation: Add explanation for XPS using Rx-queue(s) map
- 8af2c06 net-sysfs: Add interface for Rx queue(s) map per Tx queue
- fc9bab2 net: Enable Tx queue selection based on Rx queues
- c6345ce net: Record receive queue number for a connection

Setting the socket priority to a non-zero value, via the proposed module parameter, signals to the network NIC that optimized packet processing and queue selection can or should be applied. The default priority value remains zero, preserving the NIC's default behavior with respect to priority. A sketch of the mechanism follows.
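As a rough sketch of the mechanism (the parameter name and the placement in the accept path are illustrative assumptions; kernel_setsockopt() is the in-kernel setsockopt interface available in this kernel version):

/* Illustrative only; the real patch defines its own parameter name. */
static int so_priority;
module_param(so_priority, int, 0644);
MODULE_PARM_DESC(so_priority, "nvmet tcp socket priority, 0 = default");

/* Applied to each accepted socket, e.g. from the queue alloc path. */
static int nvmet_tcp_set_sock_priority(struct socket *sock)
{
	if (!so_priority)
		return 0;	/* priority 0: keep default NIC behavior */

	return kernel_setsockopt(sock, SOL_SOCKET, SO_PRIORITY,
				 (char *)&so_priority, sizeof(so_priority));
}

With the priority set, a NIC driver can use it to steer the connection's traffic onto a dedicated, optimized queue set, complementing the symmetric-queuing support in the network patches listed above.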

When applied, and running with an optimized busy-polling NIC, there is a measurable improvement in I/O performance. The data below shows FIO results for 4K random read operations to a single remote nvme device. The queue depth is 32 and the batch size is 8. One set of data is a baseline for the standard Linux kernel running on the host and target (5.2.1 stable). In the other, the two proposed patches were applied to the target only, same kernel version. For comparison, the number of threads for the fio job is scaled from 1 to 8 until nvme device I/O saturation is reached.

Baseline 5.2.1 stable kernel:
Threads | IOPS (K) | Avg Lat (usec) | 99.99% (usec)
---------------------------------------------------
   1    |   80.8   |     371.40     |     758
   2    |  160     |     376.55     |     717
   3    |  255     |     352.41     |     750
   4    |  356     |     336.87     |     734
   5    |  465     |     324.32     |     734
   6    |  558     |     330.18     |     750
   7    |  572     |     376.95     |     750
   8    |  585     |     431.15     |     742

With patches applied on target kernel:
Threads | IOPS (K) | Avg Lat (usec) | 99.99% (usec)
---------------------------------------------------
   1    |  169     |     163.67     |     469
   2    |  344     |     154.19     |     445
   3    |  513     |     159.16     |     465
   4    |  495     |     226.16     |     494
   5    |  583     |     254.52     |     510
   6    |  586     |     315.72     |     510
   7    |  586     |     370.84     |     515
   8    |  586     |     427.18     |     660

Note: Data was gathered using kernel 5.2.1 stable. The patches posted to this mailing list were merged onto the infradead tree; we are currently working through build issues with that tree that prevent testing with it as the base.

Mark Wunderlich (2):
- nvmet-tcp: time based stop condition in io_work
- nvmet-tcp: set SO_PRIORITY for accepted sockets


