netdev.vger.kernel.org archive mirror
* [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path
@ 2022-05-11  3:54 Joe Damato
From: Joe Damato @ 2022-05-11  3:54 UTC
  To: netdev, davem, kuba, linux-kernel, x86; +Cc: Joe Damato

Greetings:

The purpose of this RFC is to gauge initial reactions to adding support for
nontemporal copies in the af_unix write path. The network stack supports
something similar, but it is enabled for the entire NIC via the
NETIF_F_NOCACHE_COPY bit; it cannot (AFAICT) be controlled or adjusted per
socket or per write, and it does not affect unix sockets.

This work seeks to build on the existing nontemporal (NT) copy work in the
kernel by adding support in the unix socket write path via a new sendmsg
flag: MSG_NTCOPY. This could also be accomplished via a setsockopt option,
but this initial implementation adds MSG_NTCOPY for ease of use and to save
an extra system call or two.
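
For illustration, a per-write request from user space might look roughly
like the sketch below. This is not taken from the patches. MSG_NTCOPY only
exists with this series applied, and the fallback value of 0 is an
assumption made purely so the sketch builds and behaves like a normal send
on an unpatched system. The helper name send_buf is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * MSG_NTCOPY is defined only by the patched include/linux/socket.h.
 * ASSUMPTION: fall back to 0 (no flag) so this builds without the series;
 * the real flag value is whatever the patched header provides.
 */
#ifndef MSG_NTCOPY
#define MSG_NTCOPY 0
#endif

/* Send one buffer, optionally asking for a nontemporal copy on this write. */
static ssize_t send_buf(int fd, const void *buf, size_t len, int use_ntcopy)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	/* The flag is chosen per call, so no extra setsockopt() is needed. */
	return sendmsg(fd, &msg, use_ntcopy ? MSG_NTCOPY : 0);
}
```

Because the flag is passed per call, a program could mix normal and NT
writes on the same socket, which a socket-wide option would not allow
without additional system calls.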

In the future, MSG_NTCOPY could be supported by other protocols, and
perhaps used in place of NETIF_F_NOCACHE_COPY to allow user programs to
enable this functionality on a per-write (or per-socket) basis.

If supporting NT copies in the unix write path is acceptable in principle,
I am open to making whatever modifications are requested or needed to get
this RFC closer to a v1. I am sure there will be many; this is just a PoC
in its current form.

As you'll see below, NT copies in the unix write path have a large
measurable impact on certain application architectures and CPUs.

Initial benchmarks are extremely encouraging. I wrote a simple C program to
benchmark this patchset. The program:
  - Creates a unix socket pair
  - Forks a child process
  - The parent process writes to the unix socket using MSG_NTCOPY - or not -
    depending on the command line flags
  - The child process uses splice to move the data from the unix socket to
    a pipe buffer, followed by a second splice call to move the data from
    the pipe buffer to a file descriptor opened on /dev/null.
  - taskset is used when launching the benchmark to ensure the parent and
    child run on appropriate CPUs for various scenarios
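
The steps above can be sketched as follows. This is a simplified
illustration and not the benchmark source (see [1] for that). The names
drain, run_bench and CHUNK are hypothetical, there is no timing code, and
the caller is assumed to pass MSG_NTCOPY in flags on a patched kernel or 0
otherwise.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHUNK (64 * 1024)	/* assumed per-splice size, fits a default pipe */

/* Child: move data socket -> pipe -> /dev/null with two splice() calls. */
static int drain(int sock, size_t total)
{
	int p[2];
	int null_fd = open("/dev/null", O_WRONLY);

	if (null_fd < 0 || pipe(p) < 0)
		return -1;
	while (total > 0) {
		size_t want = total < CHUNK ? total : CHUNK;
		ssize_t n = splice(sock, NULL, p[1], NULL, want, SPLICE_F_MOVE);

		if (n <= 0)
			return -1;
		total -= (size_t)n;
		while (n > 0) {
			ssize_t m = splice(p[0], NULL, null_fd, NULL,
					   (size_t)n, SPLICE_F_MOVE);
			if (m <= 0)
				return -1;
			n -= m;
		}
	}
	return 0;
}

/* Parent: write obj_len bytes iters times; returns 0 if the child got it all. */
static int run_bench(size_t obj_len, int iters, int flags)
{
	int sv[2], status, rc;
	char *obj;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		close(sv[0]);
		_exit(drain(sv[1], obj_len * (size_t)iters) == 0 ? 0 : 1);
	}
	close(sv[1]);
	obj = calloc(1, obj_len);
	rc = obj ? 0 : -1;
	for (int i = 0; rc == 0 && i < iters; i++) {
		size_t off = 0;

		while (off < obj_len) {
			/* MSG_NOSIGNAL: get an error, not SIGPIPE, if the child dies */
			ssize_t n = send(sv[0], obj + off, obj_len - off,
					 flags | MSG_NOSIGNAL);
			if (n < 0) {
				rc = -1;
				break;
			}
			off += (size_t)n;
		}
	}
	free(obj);
	close(sv[0]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!(WIFEXITED(status) && WEXITSTATUS(status) == 0))
		rc = -1;
	return rc;
}
```

Using splice on the read side keeps the child from touching the payload,
so the cost being measured is dominated by the copy in the write path.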

The source of the test program is available for examination [1] and results
for three benchmarks I ran are provided below.

Test system: AMD EPYC 7662 64-Core Processor,
	     64 cores / 128 threads,
	     512 KB L2 per core shared by sibling CPUs,
	     16 MB L3 per NUMA zone,
	     AMD specific settings: NPS=1 and L3 as NUMA enabled 

Test: 1048576 byte object,
      100,000 iterations,
      512 KB pipe buffer size,
      512 KB unix socket send buffer size
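
On Linux, buffer sizes like these are typically requested with SO_SNDBUF
and F_SETPIPE_SZ. The sketch below shows one way to do that; whether the
benchmark uses exactly these calls is an assumption, and the helper names
are hypothetical. Note that the kernel doubles the SO_SNDBUF value it is
given and may clamp the request to net.core.wmem_max, and that pipe sizes
are limited by /proc/sys/fs/pipe-max-size for unprivileged processes, so
the returned values are worth checking.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request a send buffer size; returns the size the kernel reports, or -1. */
static int set_sndbuf(int sock, int bytes)
{
	socklen_t len = sizeof(bytes);

	if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	/* Read back: the kernel doubles the request and may clamp it. */
	if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, &len) < 0)
		return -1;
	return bytes;
}

/* Request a pipe capacity; returns the capacity actually set, or -1. */
static int set_pipe_size(int pipe_fd, int bytes)
{
	return fcntl(pipe_fd, F_SETPIPE_SZ, bytes);
}
```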

Sample command lines for running the tests are provided below. Note that
each command line shows how to run a "normal" copy benchmark. To run the
benchmark in MSG_NTCOPY mode, change command line argument 3 from 0 to 1.

Test pinned to CPUs 1 and 2 which do *not* share an L2 cache, but do share
an L3.
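
The runs here rely on taskset to restrict the benchmark to the chosen CPUs.
As an alternative, a program could pin the parent and child individually
from within the code. A minimal sketch, assuming a hypothetical helper
named pin_to_cpu (not part of the benchmark source):

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restrict the calling thread to a single CPU; returns 0 on success. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

In a fork-based test, the parent could call pin_to_cpu(1) and the child
pin_to_cpu(2) right after fork() to reproduce the placement used in this
scenario.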

Command line for "normal" copy:
% time taskset -ac 1,2 ./unix-nt-bench 1048576 100000 0 524288 524288

Mode			real time (sec.)		throughput (Mb/s)
"Normal" copy		10.630				78,928
MSG_NTCOPY		7.429				112,935 

Same test as above, but pinned to CPUs 1 and 65, which share an L2 (512 KB)
and L3 (16 MB) cache.

Command line for "normal" copy:
% time taskset -ac 1,65 ./unix-nt-bench 1048576 100000 0 524288 524288

Mode			real time (sec.)		throughput (Mb/s)
"Normal" copy		12.532				66,941
MSG_NTCOPY		9.445				88,826	

Same test as above, pinned to CPUs 1 and 65, but with 128 KB unix send
buffer and pipe buffer sizes (to avoid spilling L2).

Command line for "normal" copy:
% time taskset -ac 1,65 ./unix-nt-bench 1048576 100000 0 131072 131072

Mode			real time (sec.)		throughput (Mb/s)
"Normal" copy		12.451				67,377
MSG_NTCOPY		9.451				88,768
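
As a sanity check on the tables above, the throughput column can be
approximately reproduced from the object size, iteration count and real
time, reading Mb/s as megabits per second. The helper below is a
hypothetical illustration; the small differences from the reported figures
presumably come from how the program itself measures elapsed time.

```c
/* Throughput in megabits per second for a run of the benchmark. */
static double throughput_mbps(double object_bytes, double iterations,
			      double seconds)
{
	return object_bytes * iterations * 8.0 / seconds / 1e6;
}
```

For the first test, throughput_mbps(1048576, 100000, 10.630) comes out at
roughly 78,900 Mb/s, close to the reported 78,928. Taking the ratio of the
reported throughputs, MSG_NTCOPY is about 43% faster in the first test and
about 32-33% faster in the two tests where the CPUs share an L2.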

Thanks,
Joe

[1]: https://gist.githubusercontent.com/jdamato-fsly/03a2f0cd4e71ebe0fef97f7f2980d9e5/raw/19cfd3aca59109ebf5b03871d952ea1360f3e982/unix-nt-copy-bench.c

Joe Damato (6):
  arch, x86, uaccess: Add nontemporal copy functions
  iov_iter: Allow custom copyin function
  iov_iter: Add a nocache copy iov iterator
  net: Add a struct for managing copy functions
  net: Add a way to copy skbs without affect cache
  net: unix: Add MSG_NTCOPY

 arch/x86/include/asm/uaccess_64.h |  6 ++++
 include/linux/skbuff.h            |  2 ++
 include/linux/socket.h            |  1 +
 include/linux/uaccess.h           |  6 ++++
 include/linux/uio.h               |  2 ++
 lib/iov_iter.c                    | 34 ++++++++++++++++++----
 net/core/datagram.c               | 61 ++++++++++++++++++++++++++++-----------
 net/unix/af_unix.c                | 13 +++++++--
 8 files changed, 100 insertions(+), 25 deletions(-)

-- 
2.7.4

