* [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues
@ 2018-06-30  4:26 Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 1/7] net: Refactor XPS for CPUs and " Amritha Nambiar
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:26 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

This patch series implements support for Tx queue selection based on
an Rx queue(s) map. This is done by configuring an Rx queue(s) map per
Tx queue via a sysfs attribute. If the user configuration for Rx queues
does not apply, Tx queue selection falls back to XPS using CPUs and
finally to hashing.

XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS needs to be
enabled. By default, no receive queues are configured for a Tx queue.

- /sys/class/net/<dev>/queues/tx-*/xps_rxqs

A set of receive queues can be mapped to a set of transmit queues
(many:many), although the common use case is a 1:1 mapping. This enables
sending packets on the same Tx-Rx queue association, which is useful for
busy-polling multi-threaded workloads where it is not possible to pin
the threads to a CPU. This is a rework of Sridhar's patch for symmetric
queueing via a socket option:
socket option:
https://www.spinics.net/lists/netdev/msg453106.html

Testing Hints:
Kernel:  Linux 4.17.0-rc7+
Interface:
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x00015e0b

Configuration:
ethtool -L $iface combined 16
ethtool -C $iface rx-usecs 1000
sysctl net.core.busy_poll=1000
ATR disabled:
ethtool -K $iface ntuple on

Workload:
A modified memcached that changes the thread-selection policy to be
based on the incoming Rx queue of a connection, using the
SO_INCOMING_NAPI_ID socket option. The default policy is round-robin.
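The queue lookup such a workload relies on can be sketched as below.
SO_INCOMING_NAPI_ID is the real socket option; the surrounding server
logic (handing the socket to the thread busy-polling that queue) is the
assumed part and is not shown.

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56	/* value from <asm-generic/socket.h> */
#endif

/* After accept()/connect(), report the NAPI ID of the Rx queue that
 * delivered the connection's packets. Returns 0 when none is known
 * (fresh socket, old kernel, or no busy-poll support). */
static unsigned int incoming_napi_id(int fd)
{
	unsigned int napi_id = 0;
	socklen_t len = sizeof(napi_id);

	if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID,
		       &napi_id, &len) < 0)
		return 0;
	return napi_id;
}
```

A server would call this once per accepted connection and dispatch the
socket to the worker servicing that NAPI ID instead of round-robin.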

Default: No rxqs_map configured
Symmetric queues: Enable rxqs_map for all queues 1:1 mapped to Tx queue
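The symmetric-queues setup above can be generated with a helper like
this sketch. The interface name and the 16-queue count are taken from
the configuration section; the helper only prints the sysfs writes, so
it is safe to run anywhere.

```shell
# Print the sysfs writes for a 1:1 rx-queue -> tx-queue mapping using
# the xps_rxqs attribute added by this series.
xps_rxqs_cmds() {
    iface=$1
    nqueues=$2
    q=0
    while [ "$q" -lt "$nqueues" ]; do
        # Bit q of the hex bitmap selects rx-queue q for tx-queue q.
        printf 'echo %x > /sys/class/net/%s/queues/tx-%d/xps_rxqs\n' \
               $((1 << q)) "$iface" "$q"
        q=$((q + 1))
    done
}

xps_rxqs_cmds eth0 16
```

Piping the output to a root shell (`xps_rxqs_cmds eth0 16 | sudo sh`)
would apply the mapping; writing 0 to an attribute clears its map.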

System:
Architecture:          x86_64
CPU(s):                72
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

16 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 4/51/2215               2/30/5163
(usec)


intr/sec                        26655                   18606

contextswitch/sec               5145                    4044

insn per cycle                  0.43                    0.72

cache-misses                    6.919                   4.310
(% of all cache refs)

L1-dcache-load-                 4.49                    3.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.26                   8.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

32 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 10/112/5562             9/46/4637
(usec)


intr/sec                        30456                   27666

contextswitch/sec               7552                    5133

insn per cycle                  0.41                    0.49

cache-misses                    9.357                   2.769
(% of all cache refs)

L1-dcache-load-                 4.09                    3.98
-misses
(% of all L1-dcache hits)

LLC-load-misses                 12.96                   3.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

16 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  5/151/4989             9/69/2611
(usec)


intr/sec                        35686                   22907

contextswitch/sec               25522                   12281

insn per cycle                  0.67                    0.74

cache-misses                    8.652                   6.38
(% of all cache refs)

L1-dcache-load-                 3.19                    2.86
-misses
(% of all L1-dcache hits)

LLC-load-misses                 16.53                   11.99
(% of all LL-cache hits)

-------------------------------------------------------------------------------
32 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  6/163/6152             8/88/4209
(usec)


intr/sec                        47079                   26548

contextswitch/sec               42190                   39168

insn per cycle                  0.45                    0.54

cache-misses                    8.798                   4.668
(% of all cache refs)

L1-dcache-load-                 6.55                    6.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.91                   10.44
(% of all LL-cache hits)

-------------------------------------------------------------------------------

v6:
- Changed the names of some functions to begin with netif_.
- Cleaned up sk_tx_queue_set/sk_rx_queue_set functions.
- Added sk_rx_queue_clear to make it consistent with tx_queue_mapping
  initialization.

---

Amritha Nambiar (7):
      net: Refactor XPS for CPUs and Rx queues
      net: Use static_key for XPS maps
      net: sock: Change tx_queue_mapping in sock_common to unsigned short
      net: Record receive queue number for a connection
      net: Enable Tx queue selection based on Rx queues
      net-sysfs: Add interface for Rx queue(s) map per Tx queue
      Documentation: Add explanation for XPS using Rx-queue(s) map


 Documentation/ABI/testing/sysfs-class-net-queues |   11 +
 Documentation/networking/scaling.txt             |   61 ++++-
 include/linux/cpumask.h                          |   11 +
 include/linux/netdevice.h                        |   98 +++++++
 include/net/busy_poll.h                          |    1 
 include/net/sock.h                               |   52 ++++
 net/core/dev.c                                   |  288 +++++++++++++++-------
 net/core/net-sysfs.c                             |   87 ++++++-
 net/core/sock.c                                  |    2 
 net/ipv4/tcp_input.c                             |    3 
 10 files changed, 505 insertions(+), 109 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net-next PATCH v6 1/7] net: Refactor XPS for CPUs and Rx queues
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
@ 2018-06-30  4:26 ` Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 2/7] net: Use static_key for XPS maps Amritha Nambiar
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:26 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/cpumask.h   |   11 ++
 include/linux/netdevice.h |   98 ++++++++++++++++++++-
 net/core/dev.c            |  211 ++++++++++++++++++++++++++++++---------------
 net/core/net-sysfs.c      |    4 -
 4 files changed, 244 insertions(+), 80 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bf53d89..57f20a0 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -115,12 +115,17 @@ extern struct cpumask __cpu_active_mask;
 #define cpu_active(cpu)		((cpu) == 0)
 #endif
 
-/* verify cpu argument to cpumask_* operators */
-static inline unsigned int cpumask_check(unsigned int cpu)
+static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
 {
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
+	WARN_ON_ONCE(cpu >= bits);
 #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
+/* verify cpu argument to cpumask_* operators */
+static inline unsigned int cpumask_check(unsigned int cpu)
+{
+	cpu_max_bits_warn(cpu, nr_cpumask_bits);
 	return cpu;
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c6b377a..8bf8d61 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -731,10 +731,15 @@ struct xps_map {
  */
 struct xps_dev_maps {
 	struct rcu_head rcu;
-	struct xps_map __rcu *cpu_map[0];
+	struct xps_map __rcu *attr_map[0]; /* Either CPUs map or RXQs map */
 };
-#define XPS_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +		\
+
+#define XPS_CPU_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +	\
 	(nr_cpu_ids * (_tcs) * sizeof(struct xps_map *)))
+
+#define XPS_RXQ_DEV_MAPS_SIZE(_tcs, _rxqs) (sizeof(struct xps_dev_maps) +\
+	(_rxqs * (_tcs) * sizeof(struct xps_map *)))
+
 #endif /* CONFIG_XPS */
 
 #define TC_MAX_QUEUE	16
@@ -1910,7 +1915,8 @@ struct net_device {
 	int			watchdog_timeo;
 
 #ifdef CONFIG_XPS
-	struct xps_dev_maps __rcu *xps_maps;
+	struct xps_dev_maps __rcu *xps_cpus_map;
+	struct xps_dev_maps __rcu *xps_rxqs_map;
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
@@ -3259,6 +3265,92 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 #ifdef CONFIG_XPS
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index);
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, bool is_rxqs_map);
+
+/**
+ *	netif_attr_test_mask - Test a CPU or Rx queue set in a mask
+ *	@j: CPU/Rx queue index
+ *	@mask: bitmask of all cpus/rx queues
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Test if a CPU or Rx queue index is set in a mask of all CPU/Rx queues.
+ */
+static inline bool netif_attr_test_mask(unsigned long j,
+					const unsigned long *mask,
+					unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+	return test_bit(j, mask);
+}
+
+/**
+ *	netif_attr_test_online - Test for online CPU/Rx queue
+ *	@j: CPU/Rx queue index
+ *	@online_mask: bitmask for CPUs/Rx queues that are online
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Returns true if a CPU/Rx queue is online.
+ */
+static inline bool netif_attr_test_online(unsigned long j,
+					  const unsigned long *online_mask,
+					  unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+
+	if (online_mask)
+		return test_bit(j, online_mask);
+
+	return (j < nr_bits);
+}
+
+/**
+ *	netif_attrmask_next - get the next CPU/Rx queue in a cpu/Rx queues mask
+ *	@n: CPU/Rx queue index
+ *	@srcp: the cpumask/Rx queue mask pointer
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Returns >= nr_bits if no further CPUs/Rx queues set.
+ */
+static inline unsigned int netif_attrmask_next(int n, const unsigned long *srcp,
+					       unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (srcp)
+		return find_next_bit(srcp, nr_bits, n + 1);
+
+	return n + 1;
+}
+
+/**
+ *	netif_attrmask_next_and - get the next CPU/Rx queue in *src1p & *src2p
+ *	@n: CPU/Rx queue index
+ *	@src1p: the first CPUs/Rx queues mask pointer
+ *	@src2p: the second CPUs/Rx queues mask pointer
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Returns >= nr_bits if no further CPUs/Rx queues set in both.
+ */
+static inline int netif_attrmask_next_and(int n, const unsigned long *src1p,
+					  const unsigned long *src2p,
+					  unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (src1p && src2p)
+		return find_next_and_bit(src1p, src2p, nr_bits, n + 1);
+	else if (src1p)
+		return find_next_bit(src1p, nr_bits, n + 1);
+	else if (src2p)
+		return find_next_bit(src2p, nr_bits, n + 1);
+
+	return n + 1;
+}
 #else
 static inline int netif_set_xps_queue(struct net_device *dev,
 				      const struct cpumask *mask,
diff --git a/net/core/dev.c b/net/core/dev.c
index dffed64..7105955 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2092,7 +2092,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 	int pos;
 
 	if (dev_maps)
-		map = xmap_dereference(dev_maps->cpu_map[tci]);
+		map = xmap_dereference(dev_maps->attr_map[tci]);
 	if (!map)
 		return false;
 
@@ -2105,7 +2105,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 			break;
 		}
 
-		RCU_INIT_POINTER(dev_maps->cpu_map[tci], NULL);
+		RCU_INIT_POINTER(dev_maps->attr_map[tci], NULL);
 		kfree_rcu(map, rcu);
 		return false;
 	}
@@ -2135,31 +2135,58 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 	return active;
 }
 
+static void clean_xps_maps(struct net_device *dev, const unsigned long *mask,
+			   struct xps_dev_maps *dev_maps, unsigned int nr_ids,
+			   u16 offset, u16 count, bool is_rxqs_map)
+{
+	bool active = false;
+	int i, j;
+
+	for (j = -1; j = netif_attrmask_next(j, mask, nr_ids),
+	     j < nr_ids;)
+		active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
+					       count);
+	if (!active) {
+		if (is_rxqs_map) {
+			RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+		} else {
+			RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
+
+			for (i = offset + (count - 1); count--; i--)
+				netdev_queue_numa_node_write(
+					netdev_get_tx_queue(dev, i),
+							NUMA_NO_NODE);
+		}
+		kfree_rcu(dev_maps, rcu);
+	}
+}
+
 static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 				   u16 count)
 {
+	const unsigned long *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps;
-	int cpu, i;
-	bool active = false;
+	unsigned int nr_ids;
 
 	mutex_lock(&xps_map_mutex);
-	dev_maps = xmap_dereference(dev->xps_maps);
 
-	if (!dev_maps)
-		goto out_no_maps;
-
-	for_each_possible_cpu(cpu)
-		active |= remove_xps_queue_cpu(dev, dev_maps, cpu,
-					       offset, count);
+	dev_maps = xmap_dereference(dev->xps_rxqs_map);
+	if (dev_maps) {
+		nr_ids = dev->num_rx_queues;
+		clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset,
+			       count, true);
 
-	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
-		kfree_rcu(dev_maps, rcu);
 	}
 
-	for (i = offset + (count - 1); count--; i--)
-		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, i),
-					     NUMA_NO_NODE);
+	dev_maps = xmap_dereference(dev->xps_cpus_map);
+	if (!dev_maps)
+		goto out_no_maps;
+
+	if (num_possible_cpus() > 1)
+		possible_mask = cpumask_bits(cpu_possible_mask);
+	nr_ids = nr_cpu_ids;
+	clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset, count,
+		       false);
 
 out_no_maps:
 	mutex_unlock(&xps_map_mutex);
@@ -2170,8 +2197,8 @@ static void netif_reset_xps_queues_gt(struct net_device *dev, u16 index)
 	netif_reset_xps_queues(dev, index, dev->num_tx_queues - index);
 }
 
-static struct xps_map *expand_xps_map(struct xps_map *map,
-				      int cpu, u16 index)
+static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
+				      u16 index, bool is_rxqs_map)
 {
 	struct xps_map *new_map;
 	int alloc_len = XPS_MIN_MAP_ALLOC;
@@ -2183,7 +2210,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		return map;
 	}
 
-	/* Need to add queue to this CPU's existing map */
+	/* Need to add tx-queue to this CPU's/rx-queue's existing map */
 	if (map) {
 		if (pos < map->alloc_len)
 			return map;
@@ -2191,9 +2218,14 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		alloc_len = map->alloc_len * 2;
 	}
 
-	/* Need to allocate new map to store queue on this CPU's map */
-	new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
-			       cpu_to_node(cpu));
+	/* Need to allocate new map to store tx-queue on this CPU's/rx-queue's
+	 *  map
+	 */
+	if (is_rxqs_map)
+		new_map = kzalloc(XPS_MAP_SIZE(alloc_len), GFP_KERNEL);
+	else
+		new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
+				       cpu_to_node(attr_index));
 	if (!new_map)
 		return NULL;
 
@@ -2205,14 +2237,16 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 	return new_map;
 }
 
-int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
-			u16 index)
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, bool is_rxqs_map)
 {
+	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
-	int i, cpu, tci, numa_node_id = -2;
+	int i, j, tci, numa_node_id = -2;
 	int maps_sz, num_tc = 1, tc = 0;
 	struct xps_map *map, *new_map;
 	bool active = false;
+	unsigned int nr_ids;
 
 	if (dev->num_tc) {
 		num_tc = dev->num_tc;
@@ -2221,16 +2255,27 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -EINVAL;
 	}
 
-	maps_sz = XPS_DEV_MAPS_SIZE(num_tc);
-	if (maps_sz < L1_CACHE_BYTES)
-		maps_sz = L1_CACHE_BYTES;
-
 	mutex_lock(&xps_map_mutex);
+	if (is_rxqs_map) {
+		maps_sz = XPS_RXQ_DEV_MAPS_SIZE(num_tc, dev->num_rx_queues);
+		dev_maps = xmap_dereference(dev->xps_rxqs_map);
+		nr_ids = dev->num_rx_queues;
+	} else {
+		maps_sz = XPS_CPU_DEV_MAPS_SIZE(num_tc);
+		if (num_possible_cpus() > 1) {
+			online_mask = cpumask_bits(cpu_online_mask);
+			possible_mask = cpumask_bits(cpu_possible_mask);
+		}
+		dev_maps = xmap_dereference(dev->xps_cpus_map);
+		nr_ids = nr_cpu_ids;
+	}
 
-	dev_maps = xmap_dereference(dev->xps_maps);
+	if (maps_sz < L1_CACHE_BYTES)
+		maps_sz = L1_CACHE_BYTES;
 
 	/* allocate memory for queue storage */
-	for_each_cpu_and(cpu, cpu_online_mask, mask) {
+	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
+	     j < nr_ids;) {
 		if (!new_dev_maps)
 			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
 		if (!new_dev_maps) {
@@ -2238,73 +2283,81 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -ENOMEM;
 		}
 
-		tci = cpu * num_tc + tc;
-		map = dev_maps ? xmap_dereference(dev_maps->cpu_map[tci]) :
+		tci = j * num_tc + tc;
+		map = dev_maps ? xmap_dereference(dev_maps->attr_map[tci]) :
 				 NULL;
 
-		map = expand_xps_map(map, cpu, index);
+		map = expand_xps_map(map, j, index, is_rxqs_map);
 		if (!map)
 			goto error;
 
-		RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+		RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 	}
 
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
-	for_each_possible_cpu(cpu) {
+	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
-		for (i = tc, tci = cpu * num_tc; dev_maps && i--; tci++) {
+		for (i = tc, tci = j * num_tc; dev_maps && i--; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* We need to explicitly update tci as prevous loop
 		 * could break out early if dev_maps is NULL.
 		 */
-		tci = cpu * num_tc + tc;
+		tci = j * num_tc + tc;
 
-		if (cpumask_test_cpu(cpu, mask) && cpu_online(cpu)) {
-			/* add queue to CPU maps */
+		if (netif_attr_test_mask(j, mask, nr_ids) &&
+		    netif_attr_test_online(j, online_mask, nr_ids)) {
+			/* add tx-queue to CPU/rx-queue maps */
 			int pos = 0;
 
-			map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+			map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			while ((pos < map->len) && (map->queues[pos] != index))
 				pos++;
 
 			if (pos == map->len)
 				map->queues[map->len++] = index;
 #ifdef CONFIG_NUMA
-			if (numa_node_id == -2)
-				numa_node_id = cpu_to_node(cpu);
-			else if (numa_node_id != cpu_to_node(cpu))
-				numa_node_id = -1;
+			if (!is_rxqs_map) {
+				if (numa_node_id == -2)
+					numa_node_id = cpu_to_node(j);
+				else if (numa_node_id != cpu_to_node(j))
+					numa_node_id = -1;
+			}
 #endif
 		} else if (dev_maps) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* copy maps belonging to foreign traffic classes */
 		for (i = num_tc - tc, tci++; dev_maps && --i; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 	}
 
-	rcu_assign_pointer(dev->xps_maps, new_dev_maps);
+	if (is_rxqs_map)
+		rcu_assign_pointer(dev->xps_rxqs_map, new_dev_maps);
+	else
+		rcu_assign_pointer(dev->xps_cpus_map, new_dev_maps);
 
 	/* Cleanup old maps */
 	if (!dev_maps)
 		goto out_no_old_maps;
 
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
+	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
 			if (map && map != new_map)
 				kfree_rcu(map, rcu);
 		}
@@ -2317,19 +2370,23 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	active = true;
 
 out_no_new_maps:
-	/* update Tx queue numa node */
-	netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
-				     (numa_node_id >= 0) ? numa_node_id :
-				     NUMA_NO_NODE);
+	if (!is_rxqs_map) {
+		/* update Tx queue numa node */
+		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
+					     (numa_node_id >= 0) ?
+					     numa_node_id : NUMA_NO_NODE);
+	}
 
 	if (!dev_maps)
 		goto out_no_maps;
 
-	/* removes queue from unused CPUs */
-	for_each_possible_cpu(cpu) {
-		for (i = tc, tci = cpu * num_tc; i--; tci++)
+	/* removes tx-queue from unused CPUs/rx-queues */
+	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = tc, tci = j * num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
-		if (!cpumask_test_cpu(cpu, mask) || !cpu_online(cpu))
+		if (!netif_attr_test_mask(j, mask, nr_ids) ||
+		    !netif_attr_test_online(j, online_mask, nr_ids))
 			active |= remove_xps_queue(dev_maps, tci, index);
 		for (i = num_tc - tc, tci++; --i; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
@@ -2337,7 +2394,10 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 
 	/* free map if not active */
 	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
+		if (is_rxqs_map)
+			RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+		else
+			RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
 		kfree_rcu(dev_maps, rcu);
 	}
 
@@ -2347,11 +2407,12 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	return 0;
 error:
 	/* remove any maps that we added */
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			map = dev_maps ?
-			      xmap_dereference(dev_maps->cpu_map[tci]) :
+			      xmap_dereference(dev_maps->attr_map[tci]) :
 			      NULL;
 			if (new_map && new_map != map)
 				kfree(new_map);
@@ -2363,6 +2424,12 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	kfree(new_dev_maps);
 	return -ENOMEM;
 }
+
+int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
+			u16 index)
+{
+	return __netif_set_xps_queue(dev, cpumask_bits(mask), index, false);
+}
 EXPORT_SYMBOL(netif_set_xps_queue);
 
 #endif
@@ -3384,7 +3451,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	int queue_index = -1;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		unsigned int tci = skb->sender_cpu - 1;
 
@@ -3393,7 +3460,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 			tci += netdev_get_prio_tc_map(dev, skb->priority);
 		}
 
-		map = rcu_dereference(dev_maps->cpu_map[tci]);
+		map = rcu_dereference(dev_maps->attr_map[tci]);
 		if (map) {
 			if (map->len == 1)
 				queue_index = map->queues[0];
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bb7e80f..b39987c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1227,13 +1227,13 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 		return -ENOMEM;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		for_each_possible_cpu(cpu) {
 			int i, tci = cpu * num_tc + tc;
 			struct xps_map *map;
 
-			map = rcu_dereference(dev_maps->cpu_map[tci]);
+			map = rcu_dereference(dev_maps->attr_map[tci]);
 			if (!map)
 				continue;
 


* [net-next PATCH v6 2/7] net: Use static_key for XPS maps
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 1/7] net: Refactor XPS for CPUs and " Amritha Nambiar
@ 2018-06-30  4:26 ` Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 3/7] net: sock: Change tx_queue_mapping in sock_common to unsigned short Amritha Nambiar
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:26 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

Use static_key for XPS maps to reduce the cost of extra map checks,
similar to how it is used for RPS and RFS. This adds a static_key
'xps_needed' for XPS in general and another, 'xps_rxqs_needed', for
XPS using the Rx queues map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/core/dev.c |   31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7105955..43b5575 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2081,6 +2081,10 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq)
 EXPORT_SYMBOL(netdev_txq_to_tc);
 
 #ifdef CONFIG_XPS
+struct static_key xps_needed __read_mostly;
+EXPORT_SYMBOL(xps_needed);
+struct static_key xps_rxqs_needed __read_mostly;
+EXPORT_SYMBOL(xps_rxqs_needed);
 static DEFINE_MUTEX(xps_map_mutex);
 #define xmap_dereference(P)		\
 	rcu_dereference_protected((P), lockdep_is_held(&xps_map_mutex))
@@ -2168,14 +2172,18 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 	struct xps_dev_maps *dev_maps;
 	unsigned int nr_ids;
 
-	mutex_lock(&xps_map_mutex);
+	if (!static_key_false(&xps_needed))
+		return;
 
-	dev_maps = xmap_dereference(dev->xps_rxqs_map);
-	if (dev_maps) {
-		nr_ids = dev->num_rx_queues;
-		clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset,
-			       count, true);
+	mutex_lock(&xps_map_mutex);
 
+	if (static_key_false(&xps_rxqs_needed)) {
+		dev_maps = xmap_dereference(dev->xps_rxqs_map);
+		if (dev_maps) {
+			nr_ids = dev->num_rx_queues;
+			clean_xps_maps(dev, possible_mask, dev_maps, nr_ids,
+				       offset, count, true);
+		}
 	}
 
 	dev_maps = xmap_dereference(dev->xps_cpus_map);
@@ -2189,6 +2197,10 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 		       false);
 
 out_no_maps:
+	if (static_key_enabled(&xps_rxqs_needed))
+		static_key_slow_dec(&xps_rxqs_needed);
+
+	static_key_slow_dec(&xps_needed);
 	mutex_unlock(&xps_map_mutex);
 }
 
@@ -2297,6 +2309,10 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
+	static_key_slow_inc(&xps_needed);
+	if (is_rxqs_map)
+		static_key_slow_inc(&xps_rxqs_needed);
+
 	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
@@ -3450,6 +3466,9 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	struct xps_map *map;
 	int queue_index = -1;
 
+	if (!static_key_false(&xps_needed))
+		return -1;
+
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {


* [net-next PATCH v6 3/7] net: sock: Change tx_queue_mapping in sock_common to unsigned short
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 1/7] net: Refactor XPS for CPUs and " Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 2/7] net: Use static_key for XPS maps Amritha Nambiar
@ 2018-06-30  4:26 ` Amritha Nambiar
  2018-06-30  4:26 ` [net-next PATCH v6 4/7] net: Record receive queue number for a connection Amritha Nambiar
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:26 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

Change the 'skc_tx_queue_mapping' field in the sock_common structure
from 'int' to 'unsigned short', with ~0 indicating 'unset' and all
other values denoting a valid queue number. This makes room for a new
'unsigned short' field, rx_queue_mapping, added to sock_common in the
next patch.
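The new encoding can be illustrated with a user-space sketch. The
struct and helper names below only mirror the kernel's sk_tx_queue_*
helpers for illustration; this is not the kernel code itself.

```c
#include <limits.h>

#define NO_QUEUE_MAPPING	USHRT_MAX

/* Stand-in for the sock_common field this patch changes. */
struct sock_stub {
	unsigned short tx_queue_mapping;
};

static void tx_queue_clear(struct sock_stub *sk)
{
	sk->tx_queue_mapping = NO_QUEUE_MAPPING;
}

/* Returns 0 on success, -1 if the value would collide with the
 * "unset" sentinel (mirrors the WARN_ON_ONCE guard in the series). */
static int tx_queue_set(struct sock_stub *sk, int tx_queue)
{
	if ((unsigned short)tx_queue >= USHRT_MAX)
		return -1;
	sk->tx_queue_mapping = tx_queue;
	return 0;
}

static int tx_queue_get(const struct sock_stub *sk)
{
	if (sk && sk->tx_queue_mapping != NO_QUEUE_MAPPING)
		return sk->tx_queue_mapping;
	return -1;
}
```

Callers keep seeing -1 for "no mapping", so the external contract of
the accessors is unchanged while the field shrinks to 16 bits.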

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/net/sock.h |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b3b7541..37b09c8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -214,7 +214,7 @@ struct sock_common {
 		struct hlist_node	skc_node;
 		struct hlist_nulls_node skc_nulls_node;
 	};
-	int			skc_tx_queue_mapping;
+	unsigned short		skc_tx_queue_mapping;
 	union {
 		int		skc_incoming_cpu;
 		u32		skc_rcv_wnd;
@@ -1681,17 +1681,25 @@ static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 
 static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
 {
+	/* sk_tx_queue_mapping accepts only up to a 16-bit value */
+	if (WARN_ON_ONCE((unsigned short)tx_queue >= USHRT_MAX))
+		return;
 	sk->sk_tx_queue_mapping = tx_queue;
 }
 
+#define NO_QUEUE_MAPPING	USHRT_MAX
+
 static inline void sk_tx_queue_clear(struct sock *sk)
 {
-	sk->sk_tx_queue_mapping = -1;
+	sk->sk_tx_queue_mapping = NO_QUEUE_MAPPING;
 }
 
 static inline int sk_tx_queue_get(const struct sock *sk)
 {
-	return sk ? sk->sk_tx_queue_mapping : -1;
+	if (sk && sk->sk_tx_queue_mapping != NO_QUEUE_MAPPING)
+		return sk->sk_tx_queue_mapping;
+
+	return -1;
 }
 
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next PATCH v6 4/7] net: Record receive queue number for a connection
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
                   ` (2 preceding siblings ...)
  2018-06-30  4:26 ` [net-next PATCH v6 3/7] net: sock: Change tx_queue_mapping in sock_common to unsigned short Amritha Nambiar
@ 2018-06-30  4:26 ` Amritha Nambiar
  2018-06-30  4:27 ` [net-next PATCH v6 5/7] net: Enable Tx queue selection based on Rx queues Amritha Nambiar
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:26 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

This patch adds a new field, 'skc_rx_queue_mapping', to sock_common,
which holds the receive queue number for the connection. The Rx queue
is marked in tcp_finish_connect() so that a client app can use
SO_INCOMING_NAPI_ID after a connect() call to get the right queue
association for the socket. The Rx queue is also marked in
tcp_conn_request() so that the SYN-ACK goes out on the Tx queue
associated with the queue on which the SYN was received.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 include/net/busy_poll.h |    1 +
 include/net/sock.h      |   28 ++++++++++++++++++++++++++++
 net/core/sock.c         |    2 ++
 net/ipv4/tcp_input.c    |    3 +++
 4 files changed, 34 insertions(+)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c518743..9e36fda6 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -151,6 +151,7 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	sk->sk_napi_id = skb->napi_id;
 #endif
+	sk_rx_queue_set(sk, skb);
 }
 
 /* variant used for unconnected sockets */
diff --git a/include/net/sock.h b/include/net/sock.h
index 37b09c8..2b097cc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -139,6 +139,7 @@ typedef __u64 __bitwise __addrpair;
  *	@skc_node: main hash linkage for various protocol lookup tables
  *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
  *	@skc_tx_queue_mapping: tx queue number for this connection
+ *	@skc_rx_queue_mapping: rx queue number for this connection
  *	@skc_flags: place holder for sk_flags
  *		%SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
@@ -215,6 +216,9 @@ struct sock_common {
 		struct hlist_nulls_node skc_nulls_node;
 	};
 	unsigned short		skc_tx_queue_mapping;
+#ifdef CONFIG_XPS
+	unsigned short		skc_rx_queue_mapping;
+#endif
 	union {
 		int		skc_incoming_cpu;
 		u32		skc_rcv_wnd;
@@ -326,6 +330,9 @@ struct sock {
 #define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
+#ifdef CONFIG_XPS
+#define sk_rx_queue_mapping	__sk_common.skc_rx_queue_mapping
+#endif
 
 #define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
 #define sk_dontcopy_end		__sk_common.skc_dontcopy_end
@@ -1702,6 +1709,27 @@ static inline int sk_tx_queue_get(const struct sock *sk)
 	return -1;
 }
 
+static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb)
+{
+#ifdef CONFIG_XPS
+	if (skb_rx_queue_recorded(skb)) {
+		u16 rx_queue = skb_get_rx_queue(skb);
+
+		if (WARN_ON_ONCE(rx_queue == NO_QUEUE_MAPPING))
+			return;
+
+		sk->sk_rx_queue_mapping = rx_queue;
+	}
+#endif
+}
+
+static inline void sk_rx_queue_clear(struct sock *sk)
+{
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = NO_QUEUE_MAPPING;
+#endif
+}
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index bcc4182..dac6d78 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2818,6 +2818,8 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_pacing_rate = ~0U;
 	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
+
+	sk_rx_queue_clear(sk);
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c5b341..b3b5aef 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -78,6 +78,7 @@
 #include <linux/errqueue.h>
 #include <trace/events/tcp.h>
 #include <linux/static_key.h>
+#include <net/busy_poll.h>
 
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
@@ -5588,6 +5589,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 	if (skb) {
 		icsk->icsk_af_ops->sk_rx_dst_set(sk, skb);
 		security_inet_conn_established(sk, skb);
+		sk_mark_napi_id(sk, skb);
 	}
 
 	tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
@@ -6416,6 +6418,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	tcp_rsk(req)->snt_isn = isn;
 	tcp_rsk(req)->txhash = net_tx_rndhash();
 	tcp_openreq_init_rwin(req, sk, dst);
+	sk_rx_queue_set(req_to_sk(req), skb);
 	if (!want_cookie) {
 		tcp_reqsk_record_syn(sk, req, skb);
 		fastopen_sk = tcp_try_fastopen(sk, skb, req, &foc, dst);


* [net-next PATCH v6 5/7] net: Enable Tx queue selection based on Rx queues
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
                   ` (3 preceding siblings ...)
  2018-06-30  4:26 ` [net-next PATCH v6 4/7] net: Record receive queue number for a connection Amritha Nambiar
@ 2018-06-30  4:27 ` Amritha Nambiar
  2018-06-30  4:27 ` [net-next PATCH v6 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue Amritha Nambiar
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:27 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

This patch adds support for picking the Tx queue based on the Rx
queue(s) map configuration set by the administrator through the sysfs
attribute for each Tx queue. If the user configuration for the receive
queue(s) map does not apply, Tx queue selection falls back to CPU(s)
map based selection and finally to hashing.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/net/sock.h |   10 ++++++++
 net/core/dev.c     |   62 ++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2b097cc..2ed99bf 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1730,6 +1730,16 @@ static inline void sk_rx_queue_clear(struct sock *sk)
 #endif
 }
 
+#ifdef CONFIG_XPS
+static inline int sk_rx_queue_get(const struct sock *sk)
+{
+	if (sk && sk->sk_rx_queue_mapping != NO_QUEUE_MAPPING)
+		return sk->sk_rx_queue_mapping;
+
+	return -1;
+}
+#endif
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/dev.c b/net/core/dev.c
index 43b5575..08d58e0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3459,35 +3459,63 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 }
 #endif /* CONFIG_NET_EGRESS */
 
-static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
+#ifdef CONFIG_XPS
+static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
+			       struct xps_dev_maps *dev_maps, unsigned int tci)
+{
+	struct xps_map *map;
+	int queue_index = -1;
+
+	if (dev->num_tc) {
+		tci *= dev->num_tc;
+		tci += netdev_get_prio_tc_map(dev, skb->priority);
+	}
+
+	map = rcu_dereference(dev_maps->attr_map[tci]);
+	if (map) {
+		if (map->len == 1)
+			queue_index = map->queues[0];
+		else
+			queue_index = map->queues[reciprocal_scale(
+						skb_get_hash(skb), map->len)];
+		if (unlikely(queue_index >= dev->real_num_tx_queues))
+			queue_index = -1;
+	}
+	return queue_index;
+}
+#endif
+
+static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_XPS
 	struct xps_dev_maps *dev_maps;
-	struct xps_map *map;
+	struct sock *sk = skb->sk;
 	int queue_index = -1;
 
 	if (!static_key_false(&xps_needed))
 		return -1;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_cpus_map);
+	if (!static_key_false(&xps_rxqs_needed))
+		goto get_cpus_map;
+
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
 	if (dev_maps) {
-		unsigned int tci = skb->sender_cpu - 1;
+		int tci = sk_rx_queue_get(sk);
 
-		if (dev->num_tc) {
-			tci *= dev->num_tc;
-			tci += netdev_get_prio_tc_map(dev, skb->priority);
-		}
+		if (tci >= 0 && tci < dev->num_rx_queues)
+			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
+							  tci);
+	}
 
-		map = rcu_dereference(dev_maps->attr_map[tci]);
-		if (map) {
-			if (map->len == 1)
-				queue_index = map->queues[0];
-			else
-				queue_index = map->queues[reciprocal_scale(skb_get_hash(skb),
-									   map->len)];
-			if (unlikely(queue_index >= dev->real_num_tx_queues))
-				queue_index = -1;
+get_cpus_map:
+	if (queue_index < 0) {
+		dev_maps = rcu_dereference(dev->xps_cpus_map);
+		if (dev_maps) {
+			unsigned int tci = skb->sender_cpu - 1;
+
+			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
+							  tci);
 		}
 	}
 	rcu_read_unlock();


* [net-next PATCH v6 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
                   ` (4 preceding siblings ...)
  2018-06-30  4:27 ` [net-next PATCH v6 5/7] net: Enable Tx queue selection based on Rx queues Amritha Nambiar
@ 2018-06-30  4:27 ` Amritha Nambiar
  2018-07-04  7:20   ` [net-next, v6, " Andrei Vagin
  2018-06-30  4:27 ` [net-next PATCH v6 7/7] Documentation: Add explanation for XPS using Rx-queue(s) map Amritha Nambiar
  2018-07-02  0:11 ` [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues David Miller
  7 siblings, 1 reply; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:27 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

Extend the transmit queue sysfs attributes to configure an Rx queue(s)
map per Tx queue. By default, no receive queues are configured for a
Tx queue.

- /sys/class/net/eth0/queues/tx-*/xps_rxqs

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/core/net-sysfs.c |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b39987c..f25ac5f 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1283,6 +1283,88 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
 
 static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 	= __ATTR_RW(xps_cpus);
+
+static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
+{
+	struct net_device *dev = queue->dev;
+	struct xps_dev_maps *dev_maps;
+	unsigned long *mask, index;
+	int j, len, num_tc = 1, tc = 0;
+
+	index = get_netdev_queue_index(queue);
+
+	if (dev->num_tc) {
+		num_tc = dev->num_tc;
+		tc = netdev_txq_to_tc(dev, index);
+		if (tc < 0)
+			return -EINVAL;
+	}
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	rcu_read_lock();
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	if (!dev_maps)
+		goto out_no_maps;
+
+	for (j = -1; j = netif_attrmask_next(j, NULL, dev->num_rx_queues),
+	     j < dev->num_rx_queues;) {
+		int i, tci = j * num_tc + tc;
+		struct xps_map *map;
+
+		map = rcu_dereference(dev_maps->attr_map[tci]);
+		if (!map)
+			continue;
+
+		for (i = map->len; i--;) {
+			if (map->queues[i] == index) {
+				set_bit(j, mask);
+				break;
+			}
+		}
+	}
+out_no_maps:
+	rcu_read_unlock();
+
+	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
+	kfree(mask);
+
+	return len < PAGE_SIZE ? len : -EINVAL;
+}
+
+static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
+			      size_t len)
+{
+	struct net_device *dev = queue->dev;
+	struct net *net = dev_net(dev);
+	unsigned long *mask, index;
+	int err;
+
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
+	if (err) {
+		kfree(mask);
+		return err;
+	}
+
+	err = __netif_set_xps_queue(dev, mask, index, true);
+	kfree(mask);
+	return err ? : len;
+}
+
+static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
+	= __ATTR_RW(xps_rxqs);
 #endif /* CONFIG_XPS */
 
 static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
@@ -1290,6 +1372,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
 	&queue_traffic_class.attr,
 #ifdef CONFIG_XPS
 	&xps_cpus_attribute.attr,
+	&xps_rxqs_attribute.attr,
 	&queue_tx_maxrate.attr,
 #endif
 	NULL


* [net-next PATCH v6 7/7] Documentation: Add explanation for XPS using Rx-queue(s) map
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
                   ` (5 preceding siblings ...)
  2018-06-30  4:27 ` [net-next PATCH v6 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue Amritha Nambiar
@ 2018-06-30  4:27 ` Amritha Nambiar
  2018-07-02  0:11 ` [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues David Miller
  7 siblings, 0 replies; 23+ messages in thread
From: Amritha Nambiar @ 2018-06-30  4:27 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 Documentation/ABI/testing/sysfs-class-net-queues |   11 ++++
 Documentation/networking/scaling.txt             |   61 ++++++++++++++++++----
 2 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net-queues b/Documentation/ABI/testing/sysfs-class-net-queues
index 0c0df91..978b763 100644
--- a/Documentation/ABI/testing/sysfs-class-net-queues
+++ b/Documentation/ABI/testing/sysfs-class-net-queues
@@ -42,6 +42,17 @@ Description:
 		network device transmit queue. Possible vaules depend on the
 		number of available CPU(s) in the system.
 
+What:		/sys/class/<iface>/queues/tx-<queue>/xps_rxqs
+Date:		June 2018
+KernelVersion:	4.18.0
+Contact:	netdev@vger.kernel.org
+Description:
+		Mask of the receive queue(s) currently enabled to participate
+		into the Transmit Packet Steering packet processing flow for this
+		network device transmit queue. Possible values depend on the
+		number of available receive queue(s) in the network device.
+		Default is disabled.
+
 What:		/sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
 Date:		November 2011
 KernelVersion:	3.3
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index f55639d..b7056a8 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -366,8 +366,13 @@ XPS: Transmit Packet Steering
 
 Transmit Packet Steering is a mechanism for intelligently selecting
 which transmit queue to use when transmitting a packet on a multi-queue
-device. To accomplish this, a mapping from CPU to hardware queue(s) is
-recorded. The goal of this mapping is usually to assign queues
+device. This can be accomplished by recording two kinds of maps, either
+a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
+to hardware transmit queue(s).
+
+1. XPS using CPUs map
+
+The goal of this mapping is usually to assign queues
 exclusively to a subset of CPUs, where the transmit completions for
 these queues are processed on a CPU within this set. This choice
 provides two benefits. First, contention on the device queue lock is
@@ -377,15 +382,40 @@ transmit queue). Secondly, cache miss rate on transmit completion is
 reduced, in particular for data cache lines that hold the sk_buff
 structures.
 
-XPS is configured per transmit queue by setting a bitmap of CPUs that
-may use that queue to transmit. The reverse mapping, from CPUs to
-transmit queues, is computed and maintained for each network device.
-When transmitting the first packet in a flow, the function
-get_xps_queue() is called to select a queue. This function uses the ID
-of the running CPU as a key into the CPU-to-queue lookup table. If the
+2. XPS using receive queues map
+
+This mapping is used to pick transmit queue based on the receive
+queue(s) map configuration set by the administrator. A set of receive
+queues can be mapped to a set of transmit queues (many:many), although
+the common use case is a 1:1 mapping. This will enable sending packets
+on the same queue associations for transmit and receive. This is useful for
+busy polling multi-threaded workloads where there are challenges in
+associating a given CPU to a given application thread. The application
+threads are not pinned to CPUs and each thread handles packets
+received on a single queue. The receive queue number is cached in the
+socket for the connection. In this model, sending the packets on the same
+transmit queue corresponding to the associated receive queue has benefits
+in keeping the CPU overhead low. Transmit completion work is locked into
+the same queue-association that a given application is polling on. This
+avoids the overhead of triggering an interrupt on another CPU. When the
+application cleans up the packets during the busy poll, transmit completion
+may be processed along with it in the same thread context and so result in
+reduced latency.
+
+XPS is configured per transmit queue by setting a bitmap of
+CPUs/receive-queues that may use that queue to transmit. The reverse
+mapping, from CPUs to transmit queues or from receive-queues to transmit
+queues, is computed and maintained for each network device. When
+transmitting the first packet in a flow, the function get_xps_queue() is
+called to select a queue. This function uses the ID of the receive queue
+for the socket connection for a match in the receive queue-to-transmit queue
+lookup table. Alternatively, this function can also use the ID of the
+running CPU as a key into the CPU-to-queue lookup table. If the
 ID matches a single queue, that is used for transmission. If multiple
 queues match, one is selected by using the flow hash to compute an index
-into the set.
+into the set. When selecting the transmit queue based on receive queue(s)
+map, the transmit device is not validated against the receive device as it
+requires expensive lookup operation in the datapath.
 
 The queue chosen for transmitting a particular flow is saved in the
 corresponding socket structure for the flow (e.g. a TCP connection).
@@ -404,11 +434,15 @@ acknowledged.
 
 XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
 default for SMP). The functionality remains disabled until explicitly
-configured. To enable XPS, the bitmap of CPUs that may use a transmit
-queue is configured using the sysfs file entry:
+configured. To enable XPS, the bitmap of CPUs/receive-queues that may
+use a transmit queue is configured using the sysfs file entry:
 
+For selection based on CPUs map:
 /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
 
+For selection based on receive-queues map:
+/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+
 == Suggested Configuration
 
 For a network device with a single transmission queue, XPS configuration
@@ -421,6 +455,11 @@ best CPUs to share a given queue are probably those that share the cache
 with the CPU that processes transmit completions for that queue
 (transmit interrupts).
 
+For transmit queue selection based on receive queue(s), XPS has to be
+explicitly configured mapping receive-queue(s) to transmit queue(s). If the
+user configuration for receive-queue map does not apply, then the transmit
+queue is selected based on the CPUs map.
+
 Per TX Queue rate limitation:
 =============================
 


* Re: [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues
  2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
                   ` (6 preceding siblings ...)
  2018-06-30  4:27 ` [net-next PATCH v6 7/7] Documentation: Add explanation for XPS using Rx-queue(s) map Amritha Nambiar
@ 2018-07-02  0:11 ` David Miller
  7 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2018-07-02  0:11 UTC (permalink / raw)
  To: amritha.nambiar
  Cc: netdev, alexander.h.duyck, willemdebruijn.kernel,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

From: Amritha Nambiar <amritha.nambiar@intel.com>
Date: Fri, 29 Jun 2018 21:26:35 -0700

> This patch series implements support for Tx queue selection based on
> Rx queue(s) map. This is done by configuring Rx queue(s) map per Tx-queue
> using sysfs attribute. If the user configuration for Rx queues does
> not apply, then the Tx queue selection falls back to XPS using CPUs and
> finally to hashing.
 ...

Series applied, thanks.


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-06-30  4:27 ` [net-next PATCH v6 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue Amritha Nambiar
@ 2018-07-04  7:20   ` Andrei Vagin
  2018-07-11  2:28     ` Nambiar, Amritha
  0 siblings, 1 reply; 23+ messages in thread
From: Andrei Vagin @ 2018-07-04  7:20 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: netdev, davem, alexander.h.duyck, willemdebruijn.kernel,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom

Hello Amritha,

I see the following warning on 4.18.0-rc3-next-20180703.
It looks like the problem is in this series.

[    1.084722] ============================================
[    1.084797] WARNING: possible recursive locking detected
[    1.084872] 4.18.0-rc3-next-20180703+ #1 Not tainted
[    1.084949] --------------------------------------------
[    1.085024] swapper/0/1 is trying to acquire lock:
[    1.085100] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20
[    1.085189] 
[    1.085189] but task is already holding lock:
[    1.085271] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
[    1.085357] 
[    1.085357] other info that might help us debug this:
[    1.085450]  Possible unsafe locking scenario:
[    1.085450] 
[    1.085531]        CPU0
[    1.085605]        ----
[    1.085679]   lock(cpu_hotplug_lock.rw_sem);
[    1.085753]   lock(cpu_hotplug_lock.rw_sem);
[    1.085828] 
[    1.085828]  *** DEADLOCK ***
[    1.085828] 
[    1.085916]  May be due to missing lock nesting notation
[    1.085916] 
[    1.085998] 3 locks held by swapper/0/1:
[    1.086074]  #0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110
[    1.086164]  #1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
[    1.086248]  #2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60
[    1.086336] 
[    1.086336] stack backtrace:
[    1.086419] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-next-20180703+ #1
[    1.086504] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    1.086587] Call Trace:
[    1.086667]  dump_stack+0x85/0xcb
[    1.086744]  __lock_acquire+0x68a/0x1330
[    1.086821]  ? lock_acquire+0x9f/0x200
[    1.086900]  ? find_held_lock+0x2d/0x90
[    1.086976]  ? lock_acquire+0x9f/0x200
[    1.087051]  lock_acquire+0x9f/0x200
[    1.087126]  ? static_key_slow_inc+0xe/0x20
[    1.087205]  cpus_read_lock+0x3e/0x80
[    1.087280]  ? static_key_slow_inc+0xe/0x20
[    1.087355]  static_key_slow_inc+0xe/0x20
[    1.087435]  __netif_set_xps_queue+0x216/0xc60
[    1.087512]  virtnet_set_affinity+0xf0/0x130
[    1.087589]  init_vqs+0x51b/0x5a0
[    1.087665]  virtnet_probe+0x39f/0x870
[    1.087742]  virtio_dev_probe+0x170/0x220
[    1.087819]  driver_probe_device+0x30b/0x480
[    1.087897]  ? set_debug_rodata+0x11/0x11
[    1.087972]  __driver_attach+0xe0/0x110
[    1.088064]  ? driver_probe_device+0x480/0x480
[    1.088141]  bus_for_each_dev+0x79/0xc0
[    1.088221]  bus_add_driver+0x164/0x260
[    1.088302]  ? veth_init+0x11/0x11
[    1.088379]  driver_register+0x5b/0xe0
[    1.088402]  ? veth_init+0x11/0x11
[    1.088402]  virtio_net_driver_init+0x6d/0x90
[    1.088402]  do_one_initcall+0x5d/0x34c
[    1.088402]  ? set_debug_rodata+0x11/0x11
[    1.088402]  ? rcu_read_lock_sched_held+0x6b/0x80
[    1.088402]  kernel_init_freeable+0x1ea/0x27b
[    1.088402]  ? rest_init+0xd0/0xd0
[    1.088402]  kernel_init+0xa/0x110
[    1.088402]  ret_from_fork+0x3a/0x50
[    1.094190] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12


https://travis-ci.org/avagin/linux/jobs/399867744

On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:
> Extend transmit queue sysfs attribute to configure Rx queue(s) map
> per Tx queue. By default no receive queues are configured for the
> Tx queue.
> 
> - /sys/class/net/eth0/queues/tx-*/xps_rxqs
> 
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> ---
>  net/core/net-sysfs.c |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)
> 
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index b39987c..f25ac5f 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1283,6 +1283,88 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
>  
>  static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
>  	= __ATTR_RW(xps_cpus);
> +
> +static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
> +{
> +	struct net_device *dev = queue->dev;
> +	struct xps_dev_maps *dev_maps;
> +	unsigned long *mask, index;
> +	int j, len, num_tc = 1, tc = 0;
> +
> +	index = get_netdev_queue_index(queue);
> +
> +	if (dev->num_tc) {
> +		num_tc = dev->num_tc;
> +		tc = netdev_txq_to_tc(dev, index);
> +		if (tc < 0)
> +			return -EINVAL;
> +	}
> +	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
> +		       GFP_KERNEL);
> +	if (!mask)
> +		return -ENOMEM;
> +
> +	rcu_read_lock();
> +	dev_maps = rcu_dereference(dev->xps_rxqs_map);
> +	if (!dev_maps)
> +		goto out_no_maps;
> +
> +	for (j = -1; j = netif_attrmask_next(j, NULL, dev->num_rx_queues),
> +	     j < dev->num_rx_queues;) {
> +		int i, tci = j * num_tc + tc;
> +		struct xps_map *map;
> +
> +		map = rcu_dereference(dev_maps->attr_map[tci]);
> +		if (!map)
> +			continue;
> +
> +		for (i = map->len; i--;) {
> +			if (map->queues[i] == index) {
> +				set_bit(j, mask);
> +				break;
> +			}
> +		}
> +	}
> +out_no_maps:
> +	rcu_read_unlock();
> +
> +	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
> +	kfree(mask);
> +
> +	return len < PAGE_SIZE ? len : -EINVAL;
> +}
> +
> +static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
> +			      size_t len)
> +{
> +	struct net_device *dev = queue->dev;
> +	struct net *net = dev_net(dev);
> +	unsigned long *mask, index;
> +	int err;
> +
> +	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> +		return -EPERM;
> +
> +	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
> +		       GFP_KERNEL);
> +	if (!mask)
> +		return -ENOMEM;
> +
> +	index = get_netdev_queue_index(queue);
> +
> +	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
> +	if (err) {
> +		kfree(mask);
> +		return err;
> +	}
> +
> +	err = __netif_set_xps_queue(dev, mask, index, true);
> +	kfree(mask);
> +	return err ? : len;
> +}
> +
> +static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
> +	= __ATTR_RW(xps_rxqs);
>  #endif /* CONFIG_XPS */
>  
>  static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
> @@ -1290,6 +1372,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
>  	&queue_traffic_class.attr,
>  #ifdef CONFIG_XPS
>  	&xps_cpus_attribute.attr,
> +	&xps_rxqs_attribute.attr,
>  	&queue_tx_maxrate.attr,
>  #endif
>  	NULL


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-07-04  7:20   ` [net-next, v6, " Andrei Vagin
@ 2018-07-11  2:28     ` Nambiar, Amritha
  2018-07-18 18:22       ` Andrei Vagin
  2018-08-02  0:11       ` Andrei Vagin
  0 siblings, 2 replies; 23+ messages in thread
From: Nambiar, Amritha @ 2018-07-11  2:28 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: netdev, davem, alexander.h.duyck, willemdebruijn.kernel,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom,
	jasowang, gaowanlong

On 7/4/2018 12:20 AM, Andrei Vagin wrote:
> Hello Amritha,
> 
> I see a following warning on 4.18.0-rc3-next-20180703.
> It looks like a problem is in this series.
> 
> [    1.084722] ============================================
> [    1.084797] WARNING: possible recursive locking detected
> [    1.084872] 4.18.0-rc3-next-20180703+ #1 Not tainted
> [    1.084949] --------------------------------------------
> [    1.085024] swapper/0/1 is trying to acquire lock:
> [    1.085100] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20
> [    1.085189] 
> [    1.085189] but task is already holding lock:
> [    1.085271] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
> [    1.085357] 
> [    1.085357] other info that might help us debug this:
> [    1.085450]  Possible unsafe locking scenario:
> [    1.085450] 
> [    1.085531]        CPU0
> [    1.085605]        ----
> [    1.085679]   lock(cpu_hotplug_lock.rw_sem);
> [    1.085753]   lock(cpu_hotplug_lock.rw_sem);
> [    1.085828] 
> [    1.085828]  *** DEADLOCK ***
> [    1.085828] 
> [    1.085916]  May be due to missing lock nesting notation
> [    1.085916] 
> [    1.085998] 3 locks held by swapper/0/1:
> [    1.086074]  #0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110
> [    1.086164]  #1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
> [    1.086248]  #2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60
> [    1.086336] 
> [    1.086336] stack backtrace:
> [    1.086419] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-next-20180703+ #1
> [    1.086504] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [    1.086587] Call Trace:
> [    1.086667]  dump_stack+0x85/0xcb
> [    1.086744]  __lock_acquire+0x68a/0x1330
> [    1.086821]  ? lock_acquire+0x9f/0x200
> [    1.086900]  ? find_held_lock+0x2d/0x90
> [    1.086976]  ? lock_acquire+0x9f/0x200
> [    1.087051]  lock_acquire+0x9f/0x200
> [    1.087126]  ? static_key_slow_inc+0xe/0x20
> [    1.087205]  cpus_read_lock+0x3e/0x80
> [    1.087280]  ? static_key_slow_inc+0xe/0x20
> [    1.087355]  static_key_slow_inc+0xe/0x20
> [    1.087435]  __netif_set_xps_queue+0x216/0xc60
> [    1.087512]  virtnet_set_affinity+0xf0/0x130
> [    1.087589]  init_vqs+0x51b/0x5a0
> [    1.087665]  virtnet_probe+0x39f/0x870
> [    1.087742]  virtio_dev_probe+0x170/0x220
> [    1.087819]  driver_probe_device+0x30b/0x480
> [    1.087897]  ? set_debug_rodata+0x11/0x11
> [    1.087972]  __driver_attach+0xe0/0x110
> [    1.088064]  ? driver_probe_device+0x480/0x480
> [    1.088141]  bus_for_each_dev+0x79/0xc0
> [    1.088221]  bus_add_driver+0x164/0x260
> [    1.088302]  ? veth_init+0x11/0x11
> [    1.088379]  driver_register+0x5b/0xe0
> [    1.088402]  ? veth_init+0x11/0x11
> [    1.088402]  virtio_net_driver_init+0x6d/0x90
> [    1.088402]  do_one_initcall+0x5d/0x34c
> [    1.088402]  ? set_debug_rodata+0x11/0x11
> [    1.088402]  ? rcu_read_lock_sched_held+0x6b/0x80
> [    1.088402]  kernel_init_freeable+0x1ea/0x27b
> [    1.088402]  ? rest_init+0xd0/0xd0
> [    1.088402]  kernel_init+0xa/0x110
> [    1.088402]  ret_from_fork+0x3a/0x50
> [    1.094190] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> 
> 
> https://travis-ci.org/avagin/linux/jobs/399867744
> 

With this patch series, I introduced a static_key for XPS maps
(xps_needed), so static_key_slow_inc() is used to switch branches. The
definition of static_key_slow_inc() takes cpus_read_lock internally. In
the virtio_net driver, XPS queues are initialized after setting the
queue:cpu affinity in virtnet_set_affinity(), which is already protected
by cpus_read_lock. Hence the warning here about trying to acquire
cpus_read_lock when it is already held.

A quick fix for this would be to just extract netif_set_xps_queue() out
of the lock by wrapping it with another put/get_online_cpus pair
(unlock right before the call and take the lock again right after). But
this may not be a clean solution. It'd help if I can get suggestions on
what would be a clean option to fix this without extensively changing
the code in virtio_net. Is it mandatory to protect the affinitization
with the read lock? I don't see a similar lock in other drivers while
setting the affinity. I understand this warning should go away, but
isn't it safe to have multiple readers?

> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:
>> Extend transmit queue sysfs attribute to configure Rx queue(s) map
>> per Tx queue. By default no receive queues are configured for the
>> Tx queue.
>>
>> - /sys/class/net/eth0/queues/tx-*/xps_rxqs
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>>  net/core/net-sysfs.c |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 83 insertions(+)
>>
>> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
>> index b39987c..f25ac5f 100644
>> --- a/net/core/net-sysfs.c
>> +++ b/net/core/net-sysfs.c
>> @@ -1283,6 +1283,88 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
>>  
>>  static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
>>  	= __ATTR_RW(xps_cpus);
>> +
>> +static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
>> +{
>> +	struct net_device *dev = queue->dev;
>> +	struct xps_dev_maps *dev_maps;
>> +	unsigned long *mask, index;
>> +	int j, len, num_tc = 1, tc = 0;
>> +
>> +	index = get_netdev_queue_index(queue);
>> +
>> +	if (dev->num_tc) {
>> +		num_tc = dev->num_tc;
>> +		tc = netdev_txq_to_tc(dev, index);
>> +		if (tc < 0)
>> +			return -EINVAL;
>> +	}
>> +	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
>> +		       GFP_KERNEL);
>> +	if (!mask)
>> +		return -ENOMEM;
>> +
>> +	rcu_read_lock();
>> +	dev_maps = rcu_dereference(dev->xps_rxqs_map);
>> +	if (!dev_maps)
>> +		goto out_no_maps;
>> +
>> +	for (j = -1; j = netif_attrmask_next(j, NULL, dev->num_rx_queues),
>> +	     j < dev->num_rx_queues;) {
>> +		int i, tci = j * num_tc + tc;
>> +		struct xps_map *map;
>> +
>> +		map = rcu_dereference(dev_maps->attr_map[tci]);
>> +		if (!map)
>> +			continue;
>> +
>> +		for (i = map->len; i--;) {
>> +			if (map->queues[i] == index) {
>> +				set_bit(j, mask);
>> +				break;
>> +			}
>> +		}
>> +	}
>> +out_no_maps:
>> +	rcu_read_unlock();
>> +
>> +	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
>> +	kfree(mask);
>> +
>> +	return len < PAGE_SIZE ? len : -EINVAL;
>> +}
>> +
>> +static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
>> +			      size_t len)
>> +{
>> +	struct net_device *dev = queue->dev;
>> +	struct net *net = dev_net(dev);
>> +	unsigned long *mask, index;
>> +	int err;
>> +
>> +	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
>> +		return -EPERM;
>> +
>> +	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
>> +		       GFP_KERNEL);
>> +	if (!mask)
>> +		return -ENOMEM;
>> +
>> +	index = get_netdev_queue_index(queue);
>> +
>> +	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
>> +	if (err) {
>> +		kfree(mask);
>> +		return err;
>> +	}
>> +
>> +	err = __netif_set_xps_queue(dev, mask, index, true);
>> +	kfree(mask);
>> +	return err ? : len;
>> +}
>> +
>> +static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
>> +	= __ATTR_RW(xps_rxqs);
>>  #endif /* CONFIG_XPS */
>>  
>>  static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
>> @@ -1290,6 +1372,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
>>  	&queue_traffic_class.attr,
>>  #ifdef CONFIG_XPS
>>  	&xps_cpus_attribute.attr,
>> +	&xps_rxqs_attribute.attr,
>>  	&queue_tx_maxrate.attr,
>>  #endif
>>  	NULL

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-07-11  2:28     ` Nambiar, Amritha
@ 2018-07-18 18:22       ` Andrei Vagin
  2018-07-18 19:24         ` Stephen Hemminger
  2018-07-19  9:16         ` Peter Zijlstra
  2018-08-02  0:11       ` Andrei Vagin
  1 sibling, 2 replies; 23+ messages in thread
From: Andrei Vagin @ 2018-07-18 18:22 UTC (permalink / raw)
  To: Nambiar, Amritha, Ingo Molnar, Peter Zijlstra
  Cc: netdev, davem, alexander.h.duyck, willemdebruijn.kernel,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom,
	jasowang, gaowanlong

On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> On 7/4/2018 12:20 AM, Andrei Vagin wrote:
> > Hello Amritha,
> > 
> > I see a following warning on 4.18.0-rc3-next-20180703.
> > It looks like a problem is in this series.
> > 
> > [    1.084722] ============================================
> > [    1.084797] WARNING: possible recursive locking detected
> > [    1.084872] 4.18.0-rc3-next-20180703+ #1 Not tainted
> > [    1.084949] --------------------------------------------
> > [    1.085024] swapper/0/1 is trying to acquire lock:
> > [    1.085100] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20
> > [    1.085189] 
> > [    1.085189] but task is already holding lock:
> > [    1.085271] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
> > [    1.085357] 
> > [    1.085357] other info that might help us debug this:
> > [    1.085450]  Possible unsafe locking scenario:
> > [    1.085450] 
> > [    1.085531]        CPU0
> > [    1.085605]        ----
> > [    1.085679]   lock(cpu_hotplug_lock.rw_sem);
> > [    1.085753]   lock(cpu_hotplug_lock.rw_sem);
> > [    1.085828] 
> > [    1.085828]  *** DEADLOCK ***
> > [    1.085828] 
> > [    1.085916]  May be due to missing lock nesting notation
> > [    1.085916] 
> > [    1.085998] 3 locks held by swapper/0/1:
> > [    1.086074]  #0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110
> > [    1.086164]  #1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
> > [    1.086248]  #2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60
> > [    1.086336] 
> > [    1.086336] stack backtrace:
> > [    1.086419] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-next-20180703+ #1
> > [    1.086504] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > [    1.086587] Call Trace:
> > [    1.086667]  dump_stack+0x85/0xcb
> > [    1.086744]  __lock_acquire+0x68a/0x1330
> > [    1.086821]  ? lock_acquire+0x9f/0x200
> > [    1.086900]  ? find_held_lock+0x2d/0x90
> > [    1.086976]  ? lock_acquire+0x9f/0x200
> > [    1.087051]  lock_acquire+0x9f/0x200
> > [    1.087126]  ? static_key_slow_inc+0xe/0x20
> > [    1.087205]  cpus_read_lock+0x3e/0x80
> > [    1.087280]  ? static_key_slow_inc+0xe/0x20
> > [    1.087355]  static_key_slow_inc+0xe/0x20
> > [    1.087435]  __netif_set_xps_queue+0x216/0xc60
> > [    1.087512]  virtnet_set_affinity+0xf0/0x130
> > [    1.087589]  init_vqs+0x51b/0x5a0
> > [    1.087665]  virtnet_probe+0x39f/0x870
> > [    1.087742]  virtio_dev_probe+0x170/0x220
> > [    1.087819]  driver_probe_device+0x30b/0x480
> > [    1.087897]  ? set_debug_rodata+0x11/0x11
> > [    1.087972]  __driver_attach+0xe0/0x110
> > [    1.088064]  ? driver_probe_device+0x480/0x480
> > [    1.088141]  bus_for_each_dev+0x79/0xc0
> > [    1.088221]  bus_add_driver+0x164/0x260
> > [    1.088302]  ? veth_init+0x11/0x11
> > [    1.088379]  driver_register+0x5b/0xe0
> > [    1.088402]  ? veth_init+0x11/0x11
> > [    1.088402]  virtio_net_driver_init+0x6d/0x90
> > [    1.088402]  do_one_initcall+0x5d/0x34c
> > [    1.088402]  ? set_debug_rodata+0x11/0x11
> > [    1.088402]  ? rcu_read_lock_sched_held+0x6b/0x80
> > [    1.088402]  kernel_init_freeable+0x1ea/0x27b
> > [    1.088402]  ? rest_init+0xd0/0xd0
> > [    1.088402]  kernel_init+0xa/0x110
> > [    1.088402]  ret_from_fork+0x3a/0x50
> > [    1.094190] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> > 
> > 
> > https://travis-ci.org/avagin/linux/jobs/399867744
> > 
> 
> With this patch series, I introduced static_key for XPS maps
> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> definition of static_key_slow_inc() has cpus_read_lock in place. In the
> virtio_net driver, XPS queues are initialized after setting the
> queue:cpu affinity in virtnet_set_affinity() which is already protected
> within cpus_read_lock. Hence, the warning here trying to acquire
> cpus_read_lock when it is already held.
> 
> A quick fix for this would be to just extract netif_set_xps_queue() out
> of the lock by simply wrapping it with another put/get_online_cpus
> (unlock right before and hold lock right after). But this may not a
> clean solution. It'd help if I can get suggestions on what would be a
> clean option to fix this without extensively changing the code in
> virtio_net. Is it mandatory to protect the affinitization with
> read_lock? I don't see similar lock in other drivers while setting the
> affinity.

> I understand this warning should go away, but isn't it safe to
> have multiple readers.

Peter and Ingo, maybe you could explain why it isn't safe to take one
reader lock twice?

Thanks,
Andrei

> 
> > On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:
> >> Extend transmit queue sysfs attribute to configure Rx queue(s) map
> >> per Tx queue. By default no receive queues are configured for the
> >> Tx queue.
> >>
> >> - /sys/class/net/eth0/queues/tx-*/xps_rxqs
> >>
> >> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> >> ---
> >>  net/core/net-sysfs.c |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 83 insertions(+)
> >>
> >> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> >> index b39987c..f25ac5f 100644
> >> --- a/net/core/net-sysfs.c
> >> +++ b/net/core/net-sysfs.c
> >> @@ -1283,6 +1283,88 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
> >>  
> >>  static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
> >>  	= __ATTR_RW(xps_cpus);
> >> +
> >> +static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
> >> +{
> >> +	struct net_device *dev = queue->dev;
> >> +	struct xps_dev_maps *dev_maps;
> >> +	unsigned long *mask, index;
> >> +	int j, len, num_tc = 1, tc = 0;
> >> +
> >> +	index = get_netdev_queue_index(queue);
> >> +
> >> +	if (dev->num_tc) {
> >> +		num_tc = dev->num_tc;
> >> +		tc = netdev_txq_to_tc(dev, index);
> >> +		if (tc < 0)
> >> +			return -EINVAL;
> >> +	}
> >> +	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
> >> +		       GFP_KERNEL);
> >> +	if (!mask)
> >> +		return -ENOMEM;
> >> +
> >> +	rcu_read_lock();
> >> +	dev_maps = rcu_dereference(dev->xps_rxqs_map);
> >> +	if (!dev_maps)
> >> +		goto out_no_maps;
> >> +
> >> +	for (j = -1; j = netif_attrmask_next(j, NULL, dev->num_rx_queues),
> >> +	     j < dev->num_rx_queues;) {
> >> +		int i, tci = j * num_tc + tc;
> >> +		struct xps_map *map;
> >> +
> >> +		map = rcu_dereference(dev_maps->attr_map[tci]);
> >> +		if (!map)
> >> +			continue;
> >> +
> >> +		for (i = map->len; i--;) {
> >> +			if (map->queues[i] == index) {
> >> +				set_bit(j, mask);
> >> +				break;
> >> +			}
> >> +		}
> >> +	}
> >> +out_no_maps:
> >> +	rcu_read_unlock();
> >> +
> >> +	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
> >> +	kfree(mask);
> >> +
> >> +	return len < PAGE_SIZE ? len : -EINVAL;
> >> +}
> >> +
> >> +static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
> >> +			      size_t len)
> >> +{
> >> +	struct net_device *dev = queue->dev;
> >> +	struct net *net = dev_net(dev);
> >> +	unsigned long *mask, index;
> >> +	int err;
> >> +
> >> +	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> >> +		return -EPERM;
> >> +
> >> +	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
> >> +		       GFP_KERNEL);
> >> +	if (!mask)
> >> +		return -ENOMEM;
> >> +
> >> +	index = get_netdev_queue_index(queue);
> >> +
> >> +	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
> >> +	if (err) {
> >> +		kfree(mask);
> >> +		return err;
> >> +	}
> >> +
> >> +	err = __netif_set_xps_queue(dev, mask, index, true);
> >> +	kfree(mask);
> >> +	return err ? : len;
> >> +}
> >> +
> >> +static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
> >> +	= __ATTR_RW(xps_rxqs);
> >>  #endif /* CONFIG_XPS */
> >>  
> >>  static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
> >> @@ -1290,6 +1372,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
> >>  	&queue_traffic_class.attr,
> >>  #ifdef CONFIG_XPS
> >>  	&xps_cpus_attribute.attr,
> >> +	&xps_rxqs_attribute.attr,
> >>  	&queue_tx_maxrate.attr,
> >>  #endif
> >>  	NULL


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-07-18 18:22       ` Andrei Vagin
@ 2018-07-18 19:24         ` Stephen Hemminger
  2018-07-19  9:16         ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2018-07-18 19:24 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Nambiar, Amritha, Ingo Molnar, Peter Zijlstra, netdev, davem,
	alexander.h.duyck, willemdebruijn.kernel, sridhar.samudrala,
	alexander.duyck, edumazet, hannes, tom, tom, jasowang,
	gaowanlong

On Wed, 18 Jul 2018 11:22:36 -0700
Andrei Vagin <avagin@virtuozzo.com> wrote:

> On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> > On 7/4/2018 12:20 AM, Andrei Vagin wrote:  
> > > Hello Amritha,
> > > 
> > > I see a following warning on 4.18.0-rc3-next-20180703.
> > > It looks like a problem is in this series.
> > > 
> > > [    1.084722] ============================================
> > > [    1.084797] WARNING: possible recursive locking detected
> > > [    1.084872] 4.18.0-rc3-next-20180703+ #1 Not tainted
> > > [    1.084949] --------------------------------------------
> > > [    1.085024] swapper/0/1 is trying to acquire lock:
> > > [    1.085100] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20
> > > [    1.085189] 
> > > [    1.085189] but task is already holding lock:
> > > [    1.085271] 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
> > > [    1.085357] 
> > > [    1.085357] other info that might help us debug this:
> > > [    1.085450]  Possible unsafe locking scenario:
> > > [    1.085450] 
> > > [    1.085531]        CPU0
> > > [    1.085605]        ----
> > > [    1.085679]   lock(cpu_hotplug_lock.rw_sem);
> > > [    1.085753]   lock(cpu_hotplug_lock.rw_sem);
> > > [    1.085828] 
> > > [    1.085828]  *** DEADLOCK ***
> > > [    1.085828] 
> > > [    1.085916]  May be due to missing lock nesting notation
> > > [    1.085916] 
> > > [    1.085998] 3 locks held by swapper/0/1:
> > > [    1.086074]  #0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110
> > > [    1.086164]  #1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
> > > [    1.086248]  #2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60
> > > [    1.086336] 
> > > [    1.086336] stack backtrace:
> > > [    1.086419] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-next-20180703+ #1
> > > [    1.086504] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > [    1.086587] Call Trace:
> > > [    1.086667]  dump_stack+0x85/0xcb
> > > [    1.086744]  __lock_acquire+0x68a/0x1330
> > > [    1.086821]  ? lock_acquire+0x9f/0x200
> > > [    1.086900]  ? find_held_lock+0x2d/0x90
> > > [    1.086976]  ? lock_acquire+0x9f/0x200
> > > [    1.087051]  lock_acquire+0x9f/0x200
> > > [    1.087126]  ? static_key_slow_inc+0xe/0x20
> > > [    1.087205]  cpus_read_lock+0x3e/0x80
> > > [    1.087280]  ? static_key_slow_inc+0xe/0x20
> > > [    1.087355]  static_key_slow_inc+0xe/0x20
> > > [    1.087435]  __netif_set_xps_queue+0x216/0xc60
> > > [    1.087512]  virtnet_set_affinity+0xf0/0x130
> > > [    1.087589]  init_vqs+0x51b/0x5a0
> > > [    1.087665]  virtnet_probe+0x39f/0x870
> > > [    1.087742]  virtio_dev_probe+0x170/0x220
> > > [    1.087819]  driver_probe_device+0x30b/0x480
> > > [    1.087897]  ? set_debug_rodata+0x11/0x11
> > > [    1.087972]  __driver_attach+0xe0/0x110
> > > [    1.088064]  ? driver_probe_device+0x480/0x480
> > > [    1.088141]  bus_for_each_dev+0x79/0xc0
> > > [    1.088221]  bus_add_driver+0x164/0x260
> > > [    1.088302]  ? veth_init+0x11/0x11
> > > [    1.088379]  driver_register+0x5b/0xe0
> > > [    1.088402]  ? veth_init+0x11/0x11
> > > [    1.088402]  virtio_net_driver_init+0x6d/0x90
> > > [    1.088402]  do_one_initcall+0x5d/0x34c
> > > [    1.088402]  ? set_debug_rodata+0x11/0x11
> > > [    1.088402]  ? rcu_read_lock_sched_held+0x6b/0x80
> > > [    1.088402]  kernel_init_freeable+0x1ea/0x27b
> > > [    1.088402]  ? rest_init+0xd0/0xd0
> > > [    1.088402]  kernel_init+0xa/0x110
> > > [    1.088402]  ret_from_fork+0x3a/0x50
> > > [    1.094190] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> > > 
> > > 
> > > https://travis-ci.org/avagin/linux/jobs/399867744
> > >   
> > 
> > With this patch series, I introduced static_key for XPS maps
> > (xps_needed), so static_key_slow_inc() is used to switch branches. The
> > definition of static_key_slow_inc() has cpus_read_lock in place. In the
> > virtio_net driver, XPS queues are initialized after setting the
> > queue:cpu affinity in virtnet_set_affinity() which is already protected
> > within cpus_read_lock. Hence, the warning here trying to acquire
> > cpus_read_lock when it is already held.
> > 
> > A quick fix for this would be to just extract netif_set_xps_queue() out
> > of the lock by simply wrapping it with another put/get_online_cpus
> > (unlock right before and hold lock right after). But this may not a
> > clean solution. It'd help if I can get suggestions on what would be a
> > clean option to fix this without extensively changing the code in
> > virtio_net. Is it mandatory to protect the affinitization with
> > read_lock? I don't see similar lock in other drivers while setting the
> > affinity.  
> 
> > I understand this warning should go away, but isn't it safe to
> > have multiple readers.  
> 
> Peter and Ingo, maybe you could explain why it isn't safe to take one
> reader lock twice?
> 
> Thanks,
> Andrei

I think the issue was that on some architectures a read lock is equivalent to a spin lock.
But maybe that is no longer true, or no one remembers.


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-07-18 18:22       ` Andrei Vagin
  2018-07-18 19:24         ` Stephen Hemminger
@ 2018-07-19  9:16         ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2018-07-19  9:16 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Nambiar, Amritha, Ingo Molnar, netdev, davem, alexander.h.duyck,
	willemdebruijn.kernel, sridhar.samudrala, alexander.duyck,
	edumazet, hannes, tom, tom, jasowang, gaowanlong

On Wed, Jul 18, 2018 at 11:22:36AM -0700, Andrei Vagin wrote:
> > > [    1.085679]   lock(cpu_hotplug_lock.rw_sem);
> > > [    1.085753]   lock(cpu_hotplug_lock.rw_sem);
> > > [    1.085828] 
> > > [    1.085828]  *** DEADLOCK ***

> Peter and Ingo, maybe you could explain why it isn't safe to take one
> reader lock twice?

Very simple, because rwsems are fair and !recursive.

So if another CPU were to issue a write-lock in between these, then the
second would block because there is a writer pending. But because we
then already have a reader we're deadlocked.


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-07-11  2:28     ` Nambiar, Amritha
  2018-07-18 18:22       ` Andrei Vagin
@ 2018-08-02  0:11       ` Andrei Vagin
  2018-08-02 21:04         ` Nambiar, Amritha
  1 sibling, 1 reply; 23+ messages in thread
From: Andrei Vagin @ 2018-08-02  0:11 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, davem, alexander.h.duyck, willemdebruijn.kernel,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom,
	jasowang, gaowanlong, Michael S. Tsirkin, virtualization

[-- Attachment #1: Type: text/plain, Size: 1463 bytes --]

On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> With this patch series, I introduced static_key for XPS maps
> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> definition of static_key_slow_inc() has cpus_read_lock in place. In the
> virtio_net driver, XPS queues are initialized after setting the
> queue:cpu affinity in virtnet_set_affinity() which is already protected
> within cpus_read_lock. Hence, the warning here trying to acquire
> cpus_read_lock when it is already held.
> 
> A quick fix for this would be to just extract netif_set_xps_queue() out
> of the lock by simply wrapping it with another put/get_online_cpus
> (unlock right before and hold lock right after).

virtnet_set_affinity() is called from virtnet_cpu_online(), which is
called under cpus_read_lock too.

It looks like now we can't call netif_set_xps_queue() from cpu hotplug
callbacks.

I can suggest a very straightforward fix for this problem. The patch is
attached.

> But this may not a
> clean solution. It'd help if I can get suggestions on what would be a
> clean option to fix this without extensively changing the code in
> virtio_net. Is it mandatory to protect the affinitization with
> read_lock? I don't see similar lock in other drivers while setting the
> affinity. I understand this warning should go away, but isn't it safe to
> have multiple readers.
> 
> > On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 4258 bytes --]

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1880c86e84b4..ebb5470090e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1941,7 +1941,7 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
 	for_each_online_cpu(cpu) {
 		virtqueue_set_affinity(vi->rq[i].vq, cpu);
 		virtqueue_set_affinity(vi->sq[i].vq, cpu);
-		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
+		__netif_set_xps_queue(vi->dev, cpumask_bits(cpumask_of(cpu)), i, false, true);
 		i++;
 	}
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c917467a2c7..37c88e566806 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3320,7 +3320,7 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index);
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
-			  u16 index, bool is_rxqs_map);
+			  u16 index, bool is_rxqs_map, bool cpuslocked);
 
 /**
  *	netif_attr_test_mask - Test a CPU or Rx queue set in a mask
diff --git a/net/core/dev.c b/net/core/dev.c
index 38b0c414d780..96ade614a23e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2176,6 +2176,7 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 	if (!static_key_false(&xps_needed))
 		return;
 
+	cpus_read_lock();
 	mutex_lock(&xps_map_mutex);
 
 	if (static_key_false(&xps_rxqs_needed)) {
@@ -2199,10 +2200,11 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 
 out_no_maps:
 	if (static_key_enabled(&xps_rxqs_needed))
-		static_key_slow_dec(&xps_rxqs_needed);
+		static_key_slow_dec_cpuslocked(&xps_rxqs_needed);
 
-	static_key_slow_dec(&xps_needed);
+	static_key_slow_dec_cpuslocked(&xps_needed);
 	mutex_unlock(&xps_map_mutex);
+	cpus_read_unlock();
 }
 
 static void netif_reset_xps_queues_gt(struct net_device *dev, u16 index)
@@ -2251,7 +2253,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
 }
 
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
-			  u16 index, bool is_rxqs_map)
+			  u16 index, bool is_rxqs_map, bool cpuslocked)
 {
 	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
@@ -2275,6 +2277,9 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			return -EINVAL;
 	}
 
+	if (!cpuslocked)
+		cpus_read_lock();
+
 	mutex_lock(&xps_map_mutex);
 	if (is_rxqs_map) {
 		maps_sz = XPS_RXQ_DEV_MAPS_SIZE(num_tc, dev->num_rx_queues);
@@ -2317,9 +2322,9 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
-	static_key_slow_inc(&xps_needed);
+	static_key_slow_inc_cpuslocked(&xps_needed);
 	if (is_rxqs_map)
-		static_key_slow_inc(&xps_rxqs_needed);
+		static_key_slow_inc_cpuslocked(&xps_rxqs_needed);
 
 	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
@@ -2427,6 +2432,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 
 out_no_maps:
 	mutex_unlock(&xps_map_mutex);
+	if (!cpuslocked)
+		cpus_read_unlock();
 
 	return 0;
 error:
@@ -2444,15 +2451,18 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	}
 
 	mutex_unlock(&xps_map_mutex);
+	if (!cpuslocked)
+		cpus_read_unlock();
 
 	kfree(new_dev_maps);
 	return -ENOMEM;
 }
+EXPORT_SYMBOL_GPL(__netif_set_xps_queue);
 
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index)
 {
-	return __netif_set_xps_queue(dev, cpumask_bits(mask), index, false);
+	return __netif_set_xps_queue(dev, cpumask_bits(mask), index, false, false);
 }
 EXPORT_SYMBOL(netif_set_xps_queue);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 0a95bcf64cdc..06a141445d80 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1400,7 +1400,7 @@ static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
 		return err;
 	}
 
-	err = __netif_set_xps_queue(dev, mask, index, true);
+	err = __netif_set_xps_queue(dev, mask, index, true, false);
 	kfree(mask);
 	return err ? : len;
 }


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-08-02  0:11       ` Andrei Vagin
@ 2018-08-02 21:04         ` Nambiar, Amritha
  2018-08-02 21:08           ` Michael S. Tsirkin
  2018-08-02 21:08           ` Michael S. Tsirkin
  0 siblings, 2 replies; 23+ messages in thread
From: Nambiar, Amritha @ 2018-08-02 21:04 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: netdev, davem, alexander.h.duyck, willemdebruijn.kernel,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom, tom,
	jasowang, gaowanlong, Michael S. Tsirkin, virtualization,
	rostedt

On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
>> With this patch series, I introduced static_key for XPS maps
>> (xps_needed), so static_key_slow_inc() is used to switch branches. The
>> definition of static_key_slow_inc() has cpus_read_lock in place. In the
>> virtio_net driver, XPS queues are initialized after setting the
>> queue:cpu affinity in virtnet_set_affinity() which is already protected
>> within cpus_read_lock. Hence, the warning here trying to acquire
>> cpus_read_lock when it is already held.
>>
>> A quick fix for this would be to just extract netif_set_xps_queue() out
>> of the lock by simply wrapping it with another put/get_online_cpus
>> (unlock right before and hold lock right after).
> 
> virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> called under cpus_read_lock too.
> 
> It looks like now we can't call netif_set_xps_queue() from cpu hotplug
> callbacks.
> 
> I can suggest a very straightforward fix for this problem. The patch is
> attached.
> 

Thanks for looking into this. I was thinking of fixing this in the
virtio_net driver by moving the XPS initialization (and adding a new
get_affinity utility) into ndo_open (so it sits together with the other
tx preparation) instead of probe. Your patch solves this in general for
setting up cpu hotplug callbacks under cpus_read_lock.

>> But this may not a
>> clean solution. It'd help if I can get suggestions on what would be a
>> clean option to fix this without extensively changing the code in
>> virtio_net. Is it mandatory to protect the affinitization with
>> read_lock? I don't see similar lock in other drivers while setting the
>> affinity. I understand this warning should go away, but isn't it safe to
>> have multiple readers.
>>
>>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:


* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-08-02 21:04         ` Nambiar, Amritha
  2018-08-02 21:08           ` Michael S. Tsirkin
@ 2018-08-02 21:08           ` Michael S. Tsirkin
  2018-08-03 19:06             ` Andrei Vagin
  1 sibling, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 21:08 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: Andrei Vagin, netdev, davem, alexander.h.duyck,
	willemdebruijn.kernel, sridhar.samudrala, alexander.duyck,
	edumazet, hannes, tom, tom, jasowang, gaowanlong, virtualization,
	rostedt

On Thu, Aug 02, 2018 at 02:04:12PM -0700, Nambiar, Amritha wrote:
> On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> > On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> >> With this patch series, I introduced static_key for XPS maps
> >> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> >> definition of static_key_slow_inc() has cpus_read_lock in place. In the
> >> virtio_net driver, XPS queues are initialized after setting the
> >> queue:cpu affinity in virtnet_set_affinity() which is already protected
> >> within cpus_read_lock. Hence, the warning here trying to acquire
> >> cpus_read_lock when it is already held.
> >>
> >> A quick fix for this would be to just extract netif_set_xps_queue() out
> >> of the lock by simply wrapping it with another put/get_online_cpus
> >> (unlock right before and hold lock right after).
> > 
> > virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> > called under cpus_read_lock too.
> > 
> > It looks like now we can't call netif_set_xps_queue() from cpu hotplug
> > callbacks.
> > 
> > I can suggest a very straightforward fix for this problem. The patch is
> > attached.
> > 
> 
> Thanks for looking into this. I was thinking of fixing this in the
> virtio_net driver by moving the XPS initialization (and adding a new
> get_affinity utility) into ndo_open (so it is together with the other tx
> preparation) instead of probe. Your patch solves this in general for
> setting up cpu hotplug callbacks under cpus_read_lock.


I like this too. Could you repost in a standard way
(inline, with your signoff etc) so we can ack this for
net-next?

> >> But this may not be a
> >> clean solution. It'd help if I could get suggestions on what would be a
> >> clean option to fix this without extensively changing the code in
> >> virtio_net. Is it mandatory to protect the affinitization with
> >> read_lock? I don't see a similar lock in other drivers while setting the
> >> affinity. I understand this warning should go away, but isn't it safe to
> >> have multiple readers?
> >>
> >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-08-02 21:08           ` Michael S. Tsirkin
@ 2018-08-03 19:06             ` Andrei Vagin
  2018-08-03 19:12               ` Michael S. Tsirkin
  2018-08-03 19:12               ` Michael S. Tsirkin
  0 siblings, 2 replies; 23+ messages in thread
From: Andrei Vagin @ 2018-08-03 19:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Nambiar, Amritha, netdev, davem, alexander.h.duyck,
	willemdebruijn.kernel, sridhar.samudrala, alexander.duyck,
	edumazet, hannes, tom, tom, jasowang, gaowanlong, virtualization,
	rostedt

On Fri, Aug 03, 2018 at 12:08:05AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 02, 2018 at 02:04:12PM -0700, Nambiar, Amritha wrote:
> > On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> > > On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> > >> With this patch series, I introduced static_key for XPS maps
> > >> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> > >> definition of static_key_slow_inc() has cpus_read_lock in place. In the
> > >> virtio_net driver, XPS queues are initialized after setting the
> > >> queue:cpu affinity in virtnet_set_affinity() which is already protected
> > >> within cpus_read_lock. Hence, the warning here trying to acquire
> > >> cpus_read_lock when it is already held.
> > >>
> > >> A quick fix for this would be to just extract netif_set_xps_queue() out
> > >> of the lock by simply wrapping it with another put/get_online_cpus
> > >> (unlock right before and hold lock right after).
> > > 
> > > virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> > > called under cpus_read_lock too.
> > > 
> > > It looks like now we can't call netif_set_xps_queue() from cpu hotplug
> > > callbacks.
> > > 
> > > I can suggest a very straightforward fix for this problem. The patch is
> > > attached.
> > > 
> > 
> > Thanks for looking into this. I was thinking of fixing this in the
> > virtio_net driver by moving the XPS initialization (and adding a new
> > get_affinity utility) into ndo_open (so it is together with the other tx
> > preparation) instead of probe. Your patch solves this in general for
> > setting up cpu hotplug callbacks under cpus_read_lock.
> 
> 
> I like this too. Could you repost in a standard way
> (inline, with your signoff etc) so we can ack this for
> net-next?

When I was testing this patch, I got the following KASAN warning. Michael,
could you take a look at it? Maybe you can work out what is going wrong there.

https://api.travis-ci.org/v3/job/410701353/log.txt

[    7.275033] ==================================================================
[    7.275226] BUG: KASAN: slab-out-of-bounds in virtnet_poll+0xaa1/0xd00
[    7.275359] Read of size 8 at addr ffff8801d444a000 by task ip/370
[    7.275488] 
[    7.275610] CPU: 1 PID: 370 Comm: ip Not tainted 4.18.0-rc6+ #1
[    7.275613] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    7.275616] Call Trace:
[    7.275621]  <IRQ>
[    7.275630]  dump_stack+0x71/0xab
[    7.275640]  print_address_description+0x6a/0x270
[    7.275648]  kasan_report+0x258/0x380
[    7.275653]  ? virtnet_poll+0xaa1/0xd00
[    7.275661]  virtnet_poll+0xaa1/0xd00
[    7.275680]  ? receive_buf+0x5920/0x5920
[    7.275689]  ? do_raw_spin_unlock+0x54/0x220
[    7.275699]  ? find_held_lock+0x32/0x1c0
[    7.275710]  ? rcu_process_callbacks+0xa60/0xd20
[    7.275736]  net_rx_action+0x2ee/0xad0
[    7.275748]  ? rcu_note_context_switch+0x320/0x320
[    7.275754]  ? napi_complete_done+0x300/0x300
[    7.275763]  ? native_apic_msr_write+0x27/0x30
[    7.275768]  ? lapic_next_event+0x5b/0x90
[    7.275775]  ? clockevents_program_event+0x21d/0x2f0
[    7.275791]  __do_softirq+0x19a/0x623
[    7.275807]  do_softirq_own_stack+0x2a/0x40
[    7.275811]  </IRQ>
[    7.275818]  do_softirq.part.18+0x6a/0x80
[    7.275825]  __local_bh_enable_ip+0x49/0x50
[    7.275829]  virtnet_open+0x129/0x440
[    7.275841]  __dev_open+0x189/0x2c0
[    7.275848]  ? dev_set_rx_mode+0x30/0x30
[    7.275857]  ? do_raw_spin_unlock+0x54/0x220
[    7.275866]  __dev_change_flags+0x3a9/0x4f0
[    7.275873]  ? dev_set_allmulti+0x10/0x10
[    7.275889]  dev_change_flags+0x7a/0x150
[    7.275900]  do_setlink+0x9fe/0x2e40
[    7.275910]  ? deref_stack_reg+0xad/0xe0
[    7.275917]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[    7.275922]  ? find_held_lock+0x32/0x1c0
[    7.275929]  ? rtnetlink_put_metrics+0x460/0x460
[    7.275935]  ? virtqueue_add_sgs+0x9e2/0xde0
[    7.275953]  ? virtscsi_add_cmd+0x454/0x780
[    7.275964]  ? find_held_lock+0x32/0x1c0
[    7.275973]  ? deref_stack_reg+0xad/0xe0
[    7.275979]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[    7.275985]  ? lock_downgrade+0x5e0/0x5e0
[    7.275993]  ? memset+0x1f/0x40
[    7.276008]  ? nla_parse+0x33/0x290
[    7.276016]  rtnl_newlink+0x954/0x1120
[    7.276030]  ? rtnl_link_unregister+0x250/0x250
[    7.276044]  ? is_bpf_text_address+0x5/0x60
[    7.276054]  ? lock_downgrade+0x5e0/0x5e0
[    7.276057]  ? lock_acquire+0x10b/0x2a0
[    7.276072]  ? deref_stack_reg+0xad/0xe0
[    7.276078]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[    7.276085]  ? __kernel_text_address+0xe/0x30
[    7.276090]  ? unwind_get_return_address+0x5f/0xa0
[    7.276103]  ? find_held_lock+0x32/0x1c0
[    7.276110]  ? is_bpf_text_address+0x5/0x60
[    7.276124]  ? deref_stack_reg+0xad/0xe0
[    7.276131]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[    7.276136]  ? depot_save_stack+0x2d9/0x460
[    7.276142]  ? deref_stack_reg+0xad/0xe0
[    7.276156]  ? find_held_lock+0x32/0x1c0
[    7.276164]  ? is_bpf_text_address+0x5/0x60
[    7.276170]  ? __lock_acquire.isra.29+0xe8/0x1bb0
[    7.276212]  ? rtnetlink_rcv_msg+0x5d6/0x930
[    7.276222]  ? lock_downgrade+0x5e0/0x5e0
[    7.276226]  ? lock_acquire+0x10b/0x2a0
[    7.276240]  rtnetlink_rcv_msg+0x69e/0x930
[    7.276249]  ? rtnl_calcit.isra.31+0x2f0/0x2f0
[    7.276255]  ? __lock_acquire.isra.29+0xe8/0x1bb0
[    7.276265]  ? netlink_deliver_tap+0x8d/0x8e0
[    7.276276]  netlink_rcv_skb+0x127/0x350
[    7.276281]  ? rtnl_calcit.isra.31+0x2f0/0x2f0
[    7.276289]  ? netlink_ack+0x970/0x970
[    7.276299]  ? __alloc_skb+0xc2/0x520
[    7.276311]  netlink_unicast+0x40f/0x5d0
[    7.276320]  ? netlink_attachskb+0x580/0x580
[    7.276325]  ? _copy_from_iter_full+0x157/0x6f0
[    7.276331]  ? import_iovec+0x90/0x390
[    7.276338]  ? get_page_from_freelist+0x1e89/0x3e30
[    7.276347]  ? apparmor_socket_sock_rcv_skb+0x10/0x10
[    7.276357]  netlink_sendmsg+0x65e/0xb00
[    7.276367]  ? netlink_unicast+0x5d0/0x5d0
[    7.276373]  ? copy_msghdr_from_user+0x206/0x340
[    7.276388]  ? netlink_unicast+0x5d0/0x5d0
[    7.276394]  sock_sendmsg+0xb3/0xf0
[    7.276401]  ___sys_sendmsg+0x604/0x8b0
[    7.276410]  ? copy_msghdr_from_user+0x340/0x340
[    7.276416]  ? find_held_lock+0x32/0x1c0
[    7.276424]  ? __handle_mm_fault+0xc85/0x3140
[    7.276433]  ? lock_downgrade+0x5e0/0x5e0
[    7.276439]  ? mem_cgroup_commit_charge+0xb4/0xf80
[    7.276453]  ? _raw_spin_unlock+0x24/0x30
[    7.276458]  ? __handle_mm_fault+0xc85/0x3140
[    7.276467]  ? __pmd_alloc+0x430/0x430
[    7.276473]  ? find_held_lock+0x32/0x1c0
[    7.276485]  ? __fget_light+0x55/0x1f0
[    7.276497]  ? __sys_sendmsg+0xd2/0x170
[    7.276502]  __sys_sendmsg+0xd2/0x170
[    7.276508]  ? __ia32_sys_shutdown+0x70/0x70
[    7.276516]  ? handle_mm_fault+0x1f9/0x6a0
[    7.276528]  ? up_read+0x1c/0x110
[    7.276534]  ? __do_page_fault+0x4a6/0xa80
[    7.276554]  do_syscall_64+0xa0/0x280
[    7.276558]  ? prepare_exit_to_usermode+0x88/0x130
[    7.276566]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    7.276572] RIP: 0033:0x7ffbe9a2f160
[    7.276574] Code: c3 48 8b 05 2a 2d 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 1d 8f 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 00 00 48 89 04 24 
[    7.276728] RSP: 002b:00007ffc5a2d6108 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[    7.276735] RAX: ffffffffffffffda RBX: 00007ffc5a2da220 RCX: 00007ffbe9a2f160
[    7.276738] RDX: 0000000000000000 RSI: 00007ffc5a2d6150 RDI: 0000000000000003
[    7.276741] RBP: 00007ffc5a2d6150 R08: 0000000000000000 R09: 0000000000000003
[    7.276744] R10: 00007ffc5a2d5ed0 R11: 0000000000000246 R12: 000000005b6177de
[    7.276748] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffc5a2da918
[    7.276763] 
[    7.276895] Allocated by task 1:
[    7.277026]  kasan_kmalloc+0xa0/0xd0
[    7.277030]  __kmalloc+0x13a/0x250
[    7.277034]  init_vqs+0xd0/0x11c0
[    7.277038]  virtnet_probe+0xb99/0x1ad0
[    7.277045]  virtio_dev_probe+0x3fc/0x890
[    7.277052]  driver_probe_device+0x6c4/0xcc0
[    7.277056]  __driver_attach+0x232/0x2c0
[    7.277060]  bus_for_each_dev+0x118/0x1a0
[    7.277064]  bus_add_driver+0x390/0x6e0
[    7.277068]  driver_register+0x18e/0x400
[    7.277076]  virtio_net_driver_init+0x6d/0x90
[    7.277080]  do_one_initcall+0xa8/0x348
[    7.277085]  kernel_init_freeable+0x42d/0x4c8
[    7.277090]  kernel_init+0xf/0x130
[    7.277095]  ret_from_fork+0x35/0x40
[    7.277097] 
[    7.277223] Freed by task 0:
[    7.277347] (stack is not available)
[    7.277473] 
[    7.277596] The buggy address belongs to the object at ffff8801d4449100
[    7.277596]  which belongs to the cache kmalloc-4096 of size 4096
[    7.277769] The buggy address is located 3840 bytes inside of
[    7.277769]  4096-byte region [ffff8801d4449100, ffff8801d444a100)
[    7.277932] The buggy address belongs to the page:
[    7.278066] page:ffffea0007511200 count:1 mapcount:0 mapping:ffff8801db002600 index:0x0 compound_mapcount: 0
[    7.278230] flags: 0x17fff8000008100(slab|head)
[    7.278363] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801db002600
[    7.278516] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[    7.278664] page dumped because: kasan: bad access detected
[    7.278790] 
[    7.278904] Memory state around the buggy address:
[    7.279031]  ffff8801d4449f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    7.279175]  ffff8801d4449f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    7.279323] >ffff8801d444a000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    7.279468]                    ^
[    7.279584]  ffff8801d444a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    7.279729]  ffff8801d444a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    7.279870] ==================================================================
[    7.280011] Disabling lock debugging due to kernel taint
[    7.632219] random: mktemp: uninitialized urandom read (6 bytes read)
[    8.052850] random: mktemp: uninitialized urandom read (6 bytes read)
[    8.335209] random: cloud-init: uninitialized urandom read (32 bytes read)
[    8.796331] random: cloud-init: uninitialized urandom read (32 bytes read)
[    9.162551] random: mktemp: uninitialized urandom read (12 bytes read)
[    9.384504] random: ssh-keygen: uninitialized urandom read (32 bytes read)
[    9.839586] init: failsafe main process (724) killed by TERM signal
[   14.865443] postgres (1245): /proc/1245/oom_adj is deprecated, please use /proc/1245/oom_score_adj instead.
[   17.213418] random: crng init done
[   17.580892] init: plymouth-upstart-bridge main process ended, respawning

> 
> > >> But this may not be a
> > >> clean solution. It'd help if I could get suggestions on what would be a
> > >> clean option to fix this without extensively changing the code in
> > >> virtio_net. Is it mandatory to protect the affinitization with
> > >> read_lock? I don't see a similar lock in other drivers while setting the
> > >> affinity. I understand this warning should go away, but isn't it safe to
> > >> have multiple readers?
> > >>
> > >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-08-03 19:06             ` Andrei Vagin
@ 2018-08-03 19:12               ` Michael S. Tsirkin
  2018-08-03 21:19                 ` Andrei Vagin
  2018-08-04  4:30                 ` Andrei Vagin
  2018-08-03 19:12               ` Michael S. Tsirkin
  1 sibling, 2 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2018-08-03 19:12 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Nambiar, Amritha, netdev, davem, alexander.h.duyck,
	willemdebruijn.kernel, sridhar.samudrala, alexander.duyck,
	edumazet, hannes, tom, tom, jasowang, gaowanlong, virtualization,
	rostedt

On Fri, Aug 03, 2018 at 12:06:51PM -0700, Andrei Vagin wrote:
> On Fri, Aug 03, 2018 at 12:08:05AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Aug 02, 2018 at 02:04:12PM -0700, Nambiar, Amritha wrote:
> > > On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> > > > On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> > > >> With this patch series, I introduced static_key for XPS maps
> > > >> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> > > >> definition of static_key_slow_inc() has cpus_read_lock in place. In the
> > > >> virtio_net driver, XPS queues are initialized after setting the
> > > >> queue:cpu affinity in virtnet_set_affinity() which is already protected
> > > >> within cpus_read_lock. Hence, the warning here trying to acquire
> > > >> cpus_read_lock when it is already held.
> > > >>
> > > >> A quick fix for this would be to just extract netif_set_xps_queue() out
> > > >> of the lock by simply wrapping it with another put/get_online_cpus
> > > >> (unlock right before and hold lock right after).
> > > > 
> > > > virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> > > > called under cpus_read_lock too.
> > > > 
> > > > It looks like now we can't call netif_set_xps_queue() from cpu hotplug
> > > > callbacks.
> > > > 
> > > > I can suggest a very straightforward fix for this problem. The patch is
> > > > attached.
> > > > 
> > > 
> > > Thanks for looking into this. I was thinking of fixing this in the
> > > virtio_net driver by moving the XPS initialization (and adding a new
> > > get_affinity utility) into ndo_open (so it is together with the other tx
> > > preparation) instead of probe. Your patch solves this in general for
> > > setting up cpu hotplug callbacks under cpus_read_lock.
> > 
> > 
> > I like this too. Could you repost in a standard way
> > (inline, with your signoff etc) so we can ack this for
> > net-next?
> 
> When I was testing this patch, I got the following KASAN warning. Michael,
> could you take a look at it? Maybe you can work out what is going wrong there.
> 
> https://api.travis-ci.org/v3/job/410701353/log.txt
> 
> [... KASAN slab-out-of-bounds splat snipped; quoted in full above ...]

I suspect an off-by-one somewhere. I'm looking at the patch
and I don't see it, but these things are hard to spot sometimes ...

> > 
> > > >> But this may not be a
> > > >> clean solution. It'd help if I could get suggestions on what would be a
> > > >> clean option to fix this without extensively changing the code in
> > > >> virtio_net. Is it mandatory to protect the affinitization with
> > > >> read_lock? I don't see a similar lock in other drivers while setting the
> > > >> affinity. I understand this warning should go away, but isn't it safe to
> > > >> have multiple readers?
> > > >>
> > > >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:

^ permalink raw reply	[flat|nested] 23+ messages in thread

> [    7.276090]  ? unwind_get_return_address+0x5f/0xa0
> [    7.276103]  ? find_held_lock+0x32/0x1c0
> [    7.276110]  ? is_bpf_text_address+0x5/0x60
> [    7.276124]  ? deref_stack_reg+0xad/0xe0
> [    7.276131]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
> [    7.276136]  ? depot_save_stack+0x2d9/0x460
> [    7.276142]  ? deref_stack_reg+0xad/0xe0
> [    7.276156]  ? find_held_lock+0x32/0x1c0
> [    7.276164]  ? is_bpf_text_address+0x5/0x60
> [    7.276170]  ? __lock_acquire.isra.29+0xe8/0x1bb0
> [    7.276212]  ? rtnetlink_rcv_msg+0x5d6/0x930
> [    7.276222]  ? lock_downgrade+0x5e0/0x5e0
> [    7.276226]  ? lock_acquire+0x10b/0x2a0
> [    7.276240]  rtnetlink_rcv_msg+0x69e/0x930
> [    7.276249]  ? rtnl_calcit.isra.31+0x2f0/0x2f0
> [    7.276255]  ? __lock_acquire.isra.29+0xe8/0x1bb0
> [    7.276265]  ? netlink_deliver_tap+0x8d/0x8e0
> [    7.276276]  netlink_rcv_skb+0x127/0x350
> [    7.276281]  ? rtnl_calcit.isra.31+0x2f0/0x2f0
> [    7.276289]  ? netlink_ack+0x970/0x970
> [    7.276299]  ? __alloc_skb+0xc2/0x520
> [    7.276311]  netlink_unicast+0x40f/0x5d0
> [    7.276320]  ? netlink_attachskb+0x580/0x580
> [    7.276325]  ? _copy_from_iter_full+0x157/0x6f0
> [    7.276331]  ? import_iovec+0x90/0x390
> [    7.276338]  ? get_page_from_freelist+0x1e89/0x3e30
> [    7.276347]  ? apparmor_socket_sock_rcv_skb+0x10/0x10
> [    7.276357]  netlink_sendmsg+0x65e/0xb00
> [    7.276367]  ? netlink_unicast+0x5d0/0x5d0
> [    7.276373]  ? copy_msghdr_from_user+0x206/0x340
> [    7.276388]  ? netlink_unicast+0x5d0/0x5d0
> [    7.276394]  sock_sendmsg+0xb3/0xf0
> [    7.276401]  ___sys_sendmsg+0x604/0x8b0
> [    7.276410]  ? copy_msghdr_from_user+0x340/0x340
> [    7.276416]  ? find_held_lock+0x32/0x1c0
> [    7.276424]  ? __handle_mm_fault+0xc85/0x3140
> [    7.276433]  ? lock_downgrade+0x5e0/0x5e0
> [    7.276439]  ? mem_cgroup_commit_charge+0xb4/0xf80
> [    7.276453]  ? _raw_spin_unlock+0x24/0x30
> [    7.276458]  ? __handle_mm_fault+0xc85/0x3140
> [    7.276467]  ? __pmd_alloc+0x430/0x430
> [    7.276473]  ? find_held_lock+0x32/0x1c0
> [    7.276485]  ? __fget_light+0x55/0x1f0
> [    7.276497]  ? __sys_sendmsg+0xd2/0x170
> [    7.276502]  __sys_sendmsg+0xd2/0x170
> [    7.276508]  ? __ia32_sys_shutdown+0x70/0x70
> [    7.276516]  ? handle_mm_fault+0x1f9/0x6a0
> [    7.276528]  ? up_read+0x1c/0x110
> [    7.276534]  ? __do_page_fault+0x4a6/0xa80
> [    7.276554]  do_syscall_64+0xa0/0x280
> [    7.276558]  ? prepare_exit_to_usermode+0x88/0x130
> [    7.276566]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [    7.276572] RIP: 0033:0x7ffbe9a2f160
> [    7.276574] Code: c3 48 8b 05 2a 2d 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 1d 8f 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 00 00 48 89 04 24 
> [    7.276728] RSP: 002b:00007ffc5a2d6108 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [    7.276735] RAX: ffffffffffffffda RBX: 00007ffc5a2da220 RCX: 00007ffbe9a2f160
> [    7.276738] RDX: 0000000000000000 RSI: 00007ffc5a2d6150 RDI: 0000000000000003
> [    7.276741] RBP: 00007ffc5a2d6150 R08: 0000000000000000 R09: 0000000000000003
> [    7.276744] R10: 00007ffc5a2d5ed0 R11: 0000000000000246 R12: 000000005b6177de
> [    7.276748] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffc5a2da918
> [    7.276763] 
> [    7.276895] Allocated by task 1:
> [    7.277026]  kasan_kmalloc+0xa0/0xd0
> [    7.277030]  __kmalloc+0x13a/0x250
> [    7.277034]  init_vqs+0xd0/0x11c0
> [    7.277038]  virtnet_probe+0xb99/0x1ad0
> [    7.277045]  virtio_dev_probe+0x3fc/0x890
> [    7.277052]  driver_probe_device+0x6c4/0xcc0
> [    7.277056]  __driver_attach+0x232/0x2c0
> [    7.277060]  bus_for_each_dev+0x118/0x1a0
> [    7.277064]  bus_add_driver+0x390/0x6e0
> [    7.277068]  driver_register+0x18e/0x400
> [    7.277076]  virtio_net_driver_init+0x6d/0x90
> [    7.277080]  do_one_initcall+0xa8/0x348
> [    7.277085]  kernel_init_freeable+0x42d/0x4c8
> [    7.277090]  kernel_init+0xf/0x130
> [    7.277095]  ret_from_fork+0x35/0x40
> [    7.277097] 
> [    7.277223] Freed by task 0:
> [    7.277347] (stack is not available)
> [    7.277473] 
> [    7.277596] The buggy address belongs to the object at ffff8801d4449100
> [    7.277596]  which belongs to the cache kmalloc-4096 of size 4096
> [    7.277769] The buggy address is located 3840 bytes inside of
> [    7.277769]  4096-byte region [ffff8801d4449100, ffff8801d444a100)
> [    7.277932] The buggy address belongs to the page:
> [    7.278066] page:ffffea0007511200 count:1 mapcount:0 mapping:ffff8801db002600 index:0x0 compound_mapcount: 0
> [    7.278230] flags: 0x17fff8000008100(slab|head)
> [    7.278363] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801db002600
> [    7.278516] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> [    7.278664] page dumped because: kasan: bad access detected
> [    7.278790] 
> [    7.278904] Memory state around the buggy address:
> [    7.279031]  ffff8801d4449f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279175]  ffff8801d4449f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279323] >ffff8801d444a000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279468]                    ^
> [    7.279584]  ffff8801d444a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279729]  ffff8801d444a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279870] ==================================================================
> [    7.280011] Disabling lock debugging due to kernel taint
> [    7.632219] random: mktemp: uninitialized urandom read (6 bytes read)
> [    8.052850] random: mktemp: uninitialized urandom read (6 bytes read)
> [    8.335209] random: cloud-init: uninitialized urandom read (32 bytes read)
> [    8.796331] random: cloud-init: uninitialized urandom read (32 bytes read)
> [    9.162551] random: mktemp: uninitialized urandom read (12 bytes read)
> [    9.384504] random: ssh-keygen: uninitialized urandom read (32 bytes read)
> [    9.839586] init: failsafe main process (724) killed by TERM signal
> [   14.865443] postgres (1245): /proc/1245/oom_adj is deprecated, please use /proc/1245/oom_score_adj instead.
> [   17.213418] random: crng init done
> [   17.580892] init: plymouth-upstart-bridge main process ended, respawning

I suspect an off-by-one somewhere. I'm looking at the patch
and I don't see it, but these things are hard to spot sometimes ...
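
A quick sanity check on the addresses quoted in the report (plain arithmetic
on the numbers above; reading the fc shadow bytes as the kmalloc redzone is
my interpretation, not something the report states explicitly):

```python
# Addresses copied from the KASAN report quoted above.
obj_start = 0xffff8801d4449100  # start of the kmalloc-4096 object
obj_end   = 0xffff8801d444a100  # end of the 4096-byte region (exclusive)
bad_addr  = 0xffff8801d444a000  # address of the faulting 8-byte read

offset = bad_addr - obj_start
size = obj_end - obj_start
print(size, offset)        # 4096 3840, matching the report
print(offset + 8 <= size)  # True: the read stays inside the slab slot
```

So the 8-byte read never leaves the 4096-byte slab slot; KASAN complains
because the shadow there is fc (the kmalloc redzone past the requested
allocation size), i.e. the read lands just past the end of a smaller array
placed in a kmalloc-4096 slot, which would fit an off-by-one index.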

> > 
> > > >> But this may not be a
> > > >> clean solution. It'd help if I could get suggestions on what would be a
> > > >> clean way to fix this without extensively changing the code in
> > > >> virtio_net. Is it mandatory to protect the affinitization with the
> > > >> read lock? I don't see a similar lock in other drivers while setting the
> > > >> affinity. I understand this warning should go away, but isn't it safe to
> > > >> have multiple readers?
> > > >>
> > > >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-08-03 19:12               ` Michael S. Tsirkin
@ 2018-08-03 21:19                 ` Andrei Vagin
  2018-08-04  4:30                 ` Andrei Vagin
  1 sibling, 0 replies; 23+ messages in thread
From: Andrei Vagin @ 2018-08-03 21:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Nambiar, Amritha, netdev, davem, alexander.h.duyck,
	willemdebruijn.kernel, sridhar.samudrala, alexander.duyck,
	edumazet, hannes, tom, tom, jasowang, gaowanlong, virtualization,
	rostedt

On Fri, Aug 03, 2018 at 10:12:53PM +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 12:06:51PM -0700, Andrei Vagin wrote:
> > On Fri, Aug 03, 2018 at 12:08:05AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Aug 02, 2018 at 02:04:12PM -0700, Nambiar, Amritha wrote:
> > > > On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> > > > > On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> > > > >> With this patch series, I introduced a static_key for XPS maps
> > > > >> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> > > > >> definition of static_key_slow_inc() takes cpus_read_lock internally. In
> > > > >> the virtio_net driver, XPS queues are initialized after setting the
> > > > >> queue:cpu affinity in virtnet_set_affinity(), which is already protected
> > > > >> by cpus_read_lock. Hence the warning here about trying to acquire
> > > > >> cpus_read_lock when it is already held.
> > > > >>
> > > > >> A quick fix for this would be to just extract netif_set_xps_queue() out
> > > > >> of the lock by simply wrapping it in another put/get_online_cpus pair
> > > > >> (unlock right before the call and take the lock again right after).
> > > > > 
> > > > > virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> > > > > called under cpus_read_lock too.
> > > > > 
> > > > > It looks like we can no longer call netif_set_xps_queue() from cpu
> > > > > hotplug callbacks.
> > > > > 
> > > > > I can suggest a very straightforward fix for this problem. The patch is
> > > > > attached.
> > > > > 
> > > > 
> > > > Thanks for looking into this. I was thinking of fixing this in the
> > > > virtio_net driver by moving the XPS initialization (and adding a new
> > > > get_affinity utility) into ndo_open (so it sits together with the
> > > > other tx preparation) instead of probe. Your patch solves this in
> > > > general for cpu hotplug callback setup, which runs under cpus_read_lock.
> > > 
> > > 
> > > I like this too. Could you repost in a standard way
> > > (inline, with your signoff etc) so we can ack this for
> > > net-next?
> > 
> > When I was testing this patch, I got the following KASAN warning. Michael,
> > could you take a look at it? Maybe you will understand what went wrong there.
> > 
> > https://api.travis-ci.org/v3/job/410701353/log.txt
> > 
> > [ KASAN report snipped; identical to the one quoted in full above. ]
> > [    7.280011] Disabling lock debugging due to kernel taint
> > [    7.632219] random: mktemp: uninitialized urandom read (6 bytes read)
> > [    8.052850] random: mktemp: uninitialized urandom read (6 bytes read)
> > [    8.335209] random: cloud-init: uninitialized urandom read (32 bytes read)
> > [    8.796331] random: cloud-init: uninitialized urandom read (32 bytes read)
> > [    9.162551] random: mktemp: uninitialized urandom read (12 bytes read)
> > [    9.384504] random: ssh-keygen: uninitialized urandom read (32 bytes read)
> > [    9.839586] init: failsafe main process (724) killed by TERM signal
> > [   14.865443] postgres (1245): /proc/1245/oom_adj is deprecated, please use /proc/1245/oom_score_adj instead.
> > [   17.213418] random: crng init done
> > [   17.580892] init: plymouth-upstart-bridge main process ended, respawning
> 
> I suspect an off-by-one somewhere. I'm looking at the patch
> and I don't see it, but these things are hard to spot sometimes ...
>

It was reproduced only once, so the problem may be unrelated to this
patch.

> > > 
> > > > >> But this may not be a
> > > > >> clean solution. It'd help if I could get suggestions on what would be a
> > > > >> clean way to fix this without extensively changing the code in
> > > > >> virtio_net. Is it mandatory to protect the affinitization with the
> > > > >> read lock? I don't see a similar lock in other drivers while setting the
> > > > >> affinity. I understand this warning should go away, but isn't it safe to
> > > > >> have multiple readers?
> > > > >>
> > > > >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
  2018-08-03 19:12               ` Michael S. Tsirkin
  2018-08-03 21:19                 ` Andrei Vagin
@ 2018-08-04  4:30                 ` Andrei Vagin
  1 sibling, 0 replies; 23+ messages in thread
From: Andrei Vagin @ 2018-08-04  4:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Nambiar, Amritha, netdev, davem, alexander.h.duyck,
	willemdebruijn.kernel, sridhar.samudrala, alexander.duyck,
	edumazet, hannes, tom, tom, jasowang, gaowanlong, virtualization,
	rostedt

On Fri, Aug 03, 2018 at 10:12:53PM +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 12:06:51PM -0700, Andrei Vagin wrote:
> > On Fri, Aug 03, 2018 at 12:08:05AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Aug 02, 2018 at 02:04:12PM -0700, Nambiar, Amritha wrote:
> > > > On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> > > > > On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> > > > >> With this patch series, I introduced a static_key for XPS maps
> > > > >> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> > > > >> definition of static_key_slow_inc() takes cpus_read_lock internally. In
> > > > >> the virtio_net driver, XPS queues are initialized after setting the
> > > > >> queue:cpu affinity in virtnet_set_affinity(), which is already protected
> > > > >> by cpus_read_lock. Hence the warning here about trying to acquire
> > > > >> cpus_read_lock when it is already held.
> > > > >>
> > > > >> A quick fix for this would be to just extract netif_set_xps_queue() out
> > > > >> of the lock by simply wrapping it in another put/get_online_cpus pair
> > > > >> (unlock right before the call and take the lock again right after).
> > > > > 
> > > > > virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> > > > > called under cpus_read_lock too.
> > > > > 
> > > > > It looks like we can no longer call netif_set_xps_queue() from cpu
> > > > > hotplug callbacks.
> > > > > 
> > > > > I can suggest a very straightforward fix for this problem. The patch is
> > > > > attached.
> > > > > 
> > > > 
> > > > Thanks for looking into this. I was thinking of fixing this in the
> > > > virtio_net driver by moving the XPS initialization (and adding a new
> > > > get_affinity utility) into ndo_open (so it sits together with the
> > > > other tx preparation) instead of probe. Your patch solves this in
> > > > general for cpu hotplug callback setup, which runs under cpus_read_lock.
> > > 
> > > 
> > > I like this too. Could you repost in a standard way
> > > (inline, with your signoff etc) so we can ack this for
> > > net-next?
> > 
> > When I was testing this patch, I got the following KASAN warning. Michael,
> > could you take a look at it? Maybe you will understand what went wrong there.
> > 
> > https://api.travis-ci.org/v3/job/410701353/log.txt
> > 
> > [ KASAN report snipped; identical to the one quoted in full earlier in the thread. ]
> > [    7.280011] Disabling lock debugging due to kernel taint
> > [    7.632219] random: mktemp: uninitialized urandom read (6 bytes read)
> > [    8.052850] random: mktemp: uninitialized urandom read (6 bytes read)
> > [    8.335209] random: cloud-init: uninitialized urandom read (32 bytes read)
> > [    8.796331] random: cloud-init: uninitialized urandom read (32 bytes read)
> > [    9.162551] random: mktemp: uninitialized urandom read (12 bytes read)
> > [    9.384504] random: ssh-keygen: uninitialized urandom read (32 bytes read)
> > [    9.839586] init: failsafe main process (724) killed by TERM signal
> > [   14.865443] postgres (1245): /proc/1245/oom_adj is deprecated, please use /proc/1245/oom_score_adj instead.
> > [   17.213418] random: crng init done
> > [   17.580892] init: plymouth-upstart-bridge main process ended, respawning
> 
> I suspect an off-by-one somewhere. I'm looking at the patch
> and I don't see it, but these things are hard to spot sometimes ...

This bug was fixed by the following commit:
commit ca9e83b4a55bfa1cc1395b48c3bf70381833526b
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Jul 31 17:43:38 2018 +0800

    virtio-net: correctly update XDP_TX counters

> 
> > > 
> > > > >> But this may not be a
> > > > >> clean solution. It'd help if I could get suggestions on what would
> > > > >> be a clean option to fix this without extensively changing the code
> > > > >> in virtio_net. Is it mandatory to protect the affinitization with a
> > > > >> read_lock? I don't see a similar lock in other drivers when setting
> > > > >> the affinity. I understand this warning should go away, but isn't it
> > > > >> safe to have multiple readers?
> > > > >>
> > > > >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote:


Thread overview: 23+ messages
2018-06-30  4:26 [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues Amritha Nambiar
2018-06-30  4:26 ` [net-next PATCH v6 1/7] net: Refactor XPS for CPUs and " Amritha Nambiar
2018-06-30  4:26 ` [net-next PATCH v6 2/7] net: Use static_key for XPS maps Amritha Nambiar
2018-06-30  4:26 ` [net-next PATCH v6 3/7] net: sock: Change tx_queue_mapping in sock_common to unsigned short Amritha Nambiar
2018-06-30  4:26 ` [net-next PATCH v6 4/7] net: Record receive queue number for a connection Amritha Nambiar
2018-06-30  4:27 ` [net-next PATCH v6 5/7] net: Enable Tx queue selection based on Rx queues Amritha Nambiar
2018-06-30  4:27 ` [net-next PATCH v6 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue Amritha Nambiar
2018-07-04  7:20   ` [net-next, v6, " Andrei Vagin
2018-07-11  2:28     ` Nambiar, Amritha
2018-07-18 18:22       ` Andrei Vagin
2018-07-18 19:24         ` Stephen Hemminger
2018-07-19  9:16         ` Peter Zijlstra
2018-08-02  0:11       ` Andrei Vagin
2018-08-02 21:04         ` Nambiar, Amritha
2018-08-02 21:08           ` Michael S. Tsirkin
2018-08-02 21:08           ` Michael S. Tsirkin
2018-08-03 19:06             ` Andrei Vagin
2018-08-03 19:12               ` Michael S. Tsirkin
2018-08-03 21:19                 ` Andrei Vagin
2018-08-04  4:30                 ` Andrei Vagin
2018-08-03 19:12               ` Michael S. Tsirkin
2018-06-30  4:27 ` [net-next PATCH v6 7/7] Documentation: Add explanation for XPS using Rx-queue(s) map Amritha Nambiar
2018-07-02  0:11 ` [net-next PATCH v6 0/7] Symmetric queue selection using XPS for Rx queues David Miller
