linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path
@ 2022-06-12  8:57 Joe Damato
  2022-06-12  8:57 ` [RFC,x86 v2 1/8] arch, x86, uaccess: Add nontemporal copy functions Joe Damato
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: x86, Alexander Viro, Borislav Petkov, Dave Hansen, David Ahern,
	David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jakub Kicinski, linux-kernel, netdev, Paolo Abeni,
	Thomas Gleixner
  Cc: Joe Damato

Greetings:

Welcome to RFC v2.

This is my first series that touches more than 1 subsystem; hope I got the
various subject lines and to/cc-lists correct.

Based on the feedback on RFC v1 [1], I've made a few changes:

- Removed the indirect calls.
- Simplified the code a bit by pushing logic down to a wrapper around
  copyin.
- Added support for the 'MSG_NTCOPY' flag to udp, udp-lite, tcp, and unix.

I think this series is much closer to a v1 that can be submit for
consideration, but wanted to test the waters with an RFC first :)

This new set of code allows applications to request non-temporal copies on
individual calls to sendmsg for several socket types, not just unix.

The result is that:

1. Users don't need to specify no cache copy for the entire interface as
   they had been doing previously with ethtool. There is more fine grained
   control of which sendmsgs are non-temporal. I think it makes sense for
   this to be application specific (vs interface-wide) since applications
   will have a better idea of which copy is appropriate.

2. Previously, the ethool bit for enabling no-cache-copy only seems to have
   affected TCP sockets, IIUC. This series supports UDP, UDP-Lite, TCP, and
   Unix. This means the behavior and accessibility of non-temporal copies
   is normalized bit more than it had been previously.

The performance results on my AMD Zen2 test system are identical to the
previous RFC, so I've included those results below.

As you'll see below, NT copies in the unix write path have a large
measureable impact on certain application architectures and CPUs.

Initial benchmarks are extremely encouraging. I wrote a simple C program to
benchmark this patchset, the program:
  - Creates a unix socket pair
  - Forks a child process
  - The parent process writes to the unix socket using MSG_NTCOPY, or not,
    depending on the command line flags
  - The child process uses splice to move the data from the unix socket to
    a pipe buffer, followed by a second splice call to move the data from
    the pipe buffer to a file descriptor opened on /dev/null.
  - taskset is used when launching the benchmark to ensure the parent and
    child run on appropriate CPUs for various scenarios

The source of the test program is available for examination [2] and results
for three benchmarks I ran are provided below.

Test system: AMD EPYC 7662 64-Core Processor,
	     64 cores / 128 threads,
	     512kb L2 per core shared by sibling CPUs,
	     16mb L3 per NUMA zone,
	     AMD specific settings: NPS=1 and L3 as NUMA enabled 

Test: 1048576 byte object,
      100,000 iterations,
      512kb pipe buffer size,
      512kb unix socket send buffer size

Sample command lines for running the tests provided below. Note that the
command line shows how to run a "normal" copy benchmark. To run the
benchmark in MSG_NTCOPY mode, change command line argument 3 from 0 to 1.

Test pinned to CPUs 1 and 2 which do *not* share an L2 cache, but do share
an L3.

Command line for "normal" copy:
% time taskset -ac 1,2 ./unix-nt-bench 1048576 100000 0 524288 524288

Mode			real time (sec.)		throughput (Mb/s)
"Normal" copy		10.630				78,928
MSG_NTCOPY		7.429				112,935 

Same test as above, but pinned to CPUs 1 and 65 which share an L2 (512kb)
and L3 cache (16mb).

Command line for "normal" copy:
% time taskset -ac 1,65 ./unix-nt-bench 1048576 100000 0 524288 524288

Mode			real time (sec.)		throughput (Mb/s)
"Normal" copy		12.532				66,941
MSG_NTCOPY		9.445				88,826	

Same test as above, pinned to CPUs 1 and 65, but with 128kb unix send
buffer and pipe buffer sizes (to avoid spilling L2).

Command line for "normal" copy:
% time taskset -ac 1,65 ./unix-nt-bench 1048576 100000 0 131072 131072

Mode			real time (sec.)		throughput (Mb/s)
"Normal" copy		12.451				67,377
MSG_NTCOPY		9.451				88,768

Thanks,
Joe

[1]: https://patchwork.kernel.org/project/netdevbpf/cover/1652241268-46732-1-git-send-email-jdamato@fastly.com/
[2]: https://gist.githubusercontent.com/jdamato-fsly/03a2f0cd4e71ebe0fef97f7f2980d9e5/raw/19cfd3aca59109ebf5b03871d952ea1360f3e982/unix-nt-copy-bench.c

Joe Damato (8):
  arch, x86, uaccess: Add nontemporal copy functions
  iov_iter: Introduce iter_copy_type
  iov_iter: add copyin_iovec helper
  net: Add MSG_NTCOPY sendmsg flag
  net: unix: Support MSG_NTCOPY
  net: ip: Support MSG_NTCOPY
  net: udplite: Support MSG_NTCOPY
  net: tcp: Support MSG_NTCOPY

 arch/x86/include/asm/uaccess_64.h |  6 ++++++
 include/linux/socket.h            |  9 +++++++++
 include/linux/uaccess.h           |  6 ++++++
 include/linux/uio.h               | 17 +++++++++++++++++
 include/net/sock.h                |  2 +-
 include/net/udplite.h             |  1 +
 lib/iov_iter.c                    | 25 ++++++++++++++++++++-----
 net/ipv4/ip_output.c              |  1 +
 net/ipv4/tcp.c                    |  2 ++
 net/unix/af_unix.c                |  4 ++++
 10 files changed, 67 insertions(+), 6 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC,x86 v2 1/8] arch, x86, uaccess: Add nontemporal copy functions
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-12  8:57 ` [RFC,iov_iter v2 2/8] iov_iter: Introduce iter_copy_type Joe Damato
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel
  Cc: Joe Damato

Add a generic non-temporal wrapper to uaccess which can be overridden by
arches that support non-temporal copies.

An implementation is added for x86 which wraps an existing non-temporal
copy in the kernel.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 arch/x86/include/asm/uaccess_64.h | 6 ++++++
 include/linux/uaccess.h           | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 45697e0..ed41dba 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -65,6 +65,12 @@ extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned s
 extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
 			   size_t len);
 
+static inline unsigned long
+__copy_from_user_nocache(void *dst, const void __user *src, unsigned long size)
+{
+	return (unsigned long)__copy_user_nocache(dst, src, (unsigned int) size, 0);
+}
+
 static inline int
 __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
 				  unsigned size)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a328cf..6612c37 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -256,6 +256,12 @@ static inline size_t probe_subpage_writeable(char __user *uaddr, size_t size)
 #ifndef ARCH_HAS_NOCACHE_UACCESS
 
 static inline __must_check unsigned long
+__copy_from_user_nocache(void *to, const void __user *from, unsigned long n)
+{
+	return __copy_from_user(to, from, n);
+}
+
+static inline __must_check unsigned long
 __copy_from_user_inatomic_nocache(void *to, const void __user *from,
 				  unsigned long n)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,iov_iter v2 2/8] iov_iter: Introduce iter_copy_type
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
  2022-06-12  8:57 ` [RFC,x86 v2 1/8] arch, x86, uaccess: Add nontemporal copy functions Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-12  8:57 ` [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper Joe Damato
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: Alexander Viro, linux-kernel; +Cc: Joe Damato

struct iov_iter has a new member: iter_copy_type. This field holds a value
designating which type of copy to use: a regular temporal copy (ITER_COPY)
or a non-temporal copy (ITER_NOCACHE_COPY).

iov_iter initializers have been updated to set the default ITER_COPY
type.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/linux/uio.h | 17 +++++++++++++++++
 lib/iov_iter.c      |  6 ++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285f..59573ee 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -28,6 +28,12 @@ enum iter_type {
 	ITER_DISCARD,
 };
 
+enum iter_copy_type {
+	/* iter copy types */
+	ITER_COPY,
+	ITER_NOCACHE_COPY,
+};
+
 struct iov_iter_state {
 	size_t iov_offset;
 	size_t count;
@@ -35,6 +41,7 @@ struct iov_iter_state {
 };
 
 struct iov_iter {
+	u8 iter_copy_type;
 	u8 iter_type;
 	bool nofault;
 	bool data_source;
@@ -62,6 +69,11 @@ static inline enum iter_type iov_iter_type(const struct iov_iter *i)
 	return i->iter_type;
 }
 
+static inline enum iter_copy_type iov_iter_copy_type(const struct iov_iter *i)
+{
+	return i->iter_copy_type;
+}
+
 static inline void iov_iter_save_state(struct iov_iter *iter,
 				       struct iov_iter_state *state)
 {
@@ -95,6 +107,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_DISCARD;
 }
 
+static inline bool iov_iter_copy_is_nt(const struct iov_iter *i)
+{
+	return iov_iter_copy_type(i) == ITER_NOCACHE_COPY;
+}
+
 static inline bool iov_iter_is_xarray(const struct iov_iter *i)
 {
 	return iov_iter_type(i) == ITER_XARRAY;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330..d32d7e5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -511,6 +511,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 {
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
+		.iter_copy_type = ITER_COPY,
 		.iter_type = ITER_IOVEC,
 		.nofault = false,
 		.data_source = direction,
@@ -1175,6 +1176,7 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
 {
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
+		.iter_copy_type = ITER_COPY,
 		.iter_type = ITER_KVEC,
 		.data_source = direction,
 		.kvec = kvec,
@@ -1191,6 +1193,7 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 {
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
+		.iter_copy_type = ITER_COPY,
 		.iter_type = ITER_BVEC,
 		.data_source = direction,
 		.bvec = bvec,
@@ -1208,6 +1211,7 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
 	BUG_ON(direction != READ);
 	WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size));
 	*i = (struct iov_iter){
+		.iter_copy_type = ITER_COPY,
 		.iter_type = ITER_PIPE,
 		.data_source = false,
 		.pipe = pipe,
@@ -1237,6 +1241,7 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
 {
 	BUG_ON(direction & ~1);
 	*i = (struct iov_iter) {
+		.iter_copy_type = ITER_COPY,
 		.iter_type = ITER_XARRAY,
 		.data_source = direction,
 		.xarray = xarray,
@@ -1260,6 +1265,7 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 {
 	BUG_ON(direction != READ);
 	*i = (struct iov_iter){
+		.iter_copy_type = ITER_COPY,
 		.iter_type = ITER_DISCARD,
 		.data_source = false,
 		.count = count,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
  2022-06-12  8:57 ` [RFC,x86 v2 1/8] arch, x86, uaccess: Add nontemporal copy functions Joe Damato
  2022-06-12  8:57 ` [RFC,iov_iter v2 2/8] iov_iter: Introduce iter_copy_type Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-13  4:25   ` Al Viro
  2022-06-13  7:53   ` David Laight
  2022-06-12  8:57 ` [RFC,net-next v2 4/8] net: Add MSG_NTCOPY sendmsg flag Joe Damato
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: Alexander Viro, linux-kernel; +Cc: Joe Damato

copyin_iovec is a helper which wraps copyin and selects the right copy
method based on the iter_copy_type.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 lib/iov_iter.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d32d7e5..6720cb2 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -168,6 +168,15 @@ static int copyin(void *to, const void __user *from, size_t n)
 	return n;
 }
 
+static int copyin_iovec(void *to, const void __user *from, size_t n,
+			struct iov_iter *i)
+{
+	if (unlikely(iov_iter_copy_is_nt(i)))
+		return __copy_from_user_nocache(to, from, n);
+	else
+		return copyin(to, from, n);
+}
+
 static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
@@ -278,7 +287,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
 		to = kaddr + offset;
 
 		/* first chunk, usually the only one */
-		left = copyin(to, buf, copy);
+		left = copyin_iovec(to, buf, copy, i);
 		copy -= left;
 		skip += copy;
 		to += copy;
@@ -288,7 +297,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
 			iov++;
 			buf = iov->iov_base;
 			copy = min(bytes, iov->iov_len);
-			left = copyin(to, buf, copy);
+			left = copyin_iovec(to, buf, copy, i);
 			copy -= left;
 			skip = copy;
 			to += copy;
@@ -307,7 +316,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
 
 	kaddr = kmap(page);
 	to = kaddr + offset;
-	left = copyin(to, buf, copy);
+	left = copyin_iovec(to, buf, copy, i);
 	copy -= left;
 	skip += copy;
 	to += copy;
@@ -316,7 +325,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
 		iov++;
 		buf = iov->iov_base;
 		copy = min(bytes, iov->iov_len);
-		left = copyin(to, buf, copy);
+		left = copyin_iovec(to, buf, copy, i);
 		copy -= left;
 		skip = copy;
 		to += copy;
@@ -766,7 +775,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 	if (iter_is_iovec(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
-		copyin(addr + off, base, len),
+		copyin_iovec(addr + off, base, len, i),
 		memcpy(addr + off, base, len)
 	)
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,net-next v2 4/8] net: Add MSG_NTCOPY sendmsg flag
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
                   ` (2 preceding siblings ...)
  2022-06-12  8:57 ` [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-12  8:57 ` [RFC,net-next v2 5/8] net: unix: Support MSG_NTCOPY Joe Damato
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joe Damato

Add MSG_NTCOPY so that user applications can ask the kernel for a
non-temporal copy when copying data into the kernel for TX.

A simple helper is provided to set the iovec iterator copy type if
MSG_NTCOPY is set.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/linux/socket.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 17311ad..98cb735 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -318,6 +318,7 @@ struct ucred {
 					  * plain text and require encryption
 					  */
 
+#define MSG_NTCOPY	0x2000000	/* Use a non-temporal copy */
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exec for file
@@ -378,6 +379,14 @@ struct ucred {
 extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
 extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
 
+static inline void msg_set_iter_copy_type(struct msghdr *msg)
+{
+	if (msg->msg_flags & MSG_NTCOPY)
+		msg->msg_iter.iter_copy_type = ITER_NOCACHE_COPY;
+	else
+		msg->msg_iter.iter_copy_type = ITER_COPY;
+}
+
 struct timespec64;
 struct __kernel_timespec;
 struct old_timespec32;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,net-next v2 5/8] net: unix: Support MSG_NTCOPY
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
                   ` (3 preceding siblings ...)
  2022-06-12  8:57 ` [RFC,net-next v2 4/8] net: Add MSG_NTCOPY sendmsg flag Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-12  8:57 ` [RFC,net-next v2 6/8] net: ip: " Joe Damato
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel
  Cc: Joe Damato

Add support for MSG_NTCOPY to unix sockets. The helper function
msg_set_iter_copy_type is used to set the copy flag on the iovec iterator
correctly.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 net/unix/af_unix.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3453e00..b8f522d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1905,6 +1905,8 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	skb_put(skb, len - data_len);
 	skb->data_len = data_len;
 	skb->len = len;
+
+	msg_set_iter_copy_type(msg);
 	err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, len);
 	if (err)
 		goto out_free;
@@ -2165,6 +2167,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		skb_put(skb, size - data_len);
 		skb->data_len = data_len;
 		skb->len = size;
+
+		msg_set_iter_copy_type(msg);
 		err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
 		if (err) {
 			kfree_skb(skb);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,net-next v2 6/8] net: ip: Support MSG_NTCOPY
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
                   ` (4 preceding siblings ...)
  2022-06-12  8:57 ` [RFC,net-next v2 5/8] net: unix: Support MSG_NTCOPY Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-12  8:57 ` [RFC,net-next v2 7/8] net: udplite: " Joe Damato
  2022-06-12  8:57 ` [RFC,net-next v2 8/8] net: tcp: " Joe Damato
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: David S. Miller, Hideaki YOSHIFUJI, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
  Cc: Joe Damato

Support nontemporal copies of IP data when applications set the
MSG_NTCOPY flag and checksumming is offloaded.

ip_generic_getfrag is used by UDP and raw sockets, so this change
effectively enables MSG_NTCOPY support for both.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 net/ipv4/ip_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 00b4bf2..75c8627 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -933,6 +933,7 @@ ip_generic_getfrag(void *from, char *to, int offset, int len, int odd, struct sk
 	struct msghdr *msg = from;
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		msg_set_iter_copy_type(msg);
 		if (!copy_from_iter_full(to, len, &msg->msg_iter))
 			return -EFAULT;
 	} else {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,net-next v2 7/8] net: udplite: Support MSG_NTCOPY
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
                   ` (5 preceding siblings ...)
  2022-06-12  8:57 ` [RFC,net-next v2 6/8] net: ip: " Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  2022-06-12  8:57 ` [RFC,net-next v2 8/8] net: tcp: " Joe Damato
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel
  Cc: Joe Damato

Support non-temporal copies of udp-lite data when applications set the
MSG_NTCOPY.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/udplite.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/udplite.h b/include/net/udplite.h
index a3c5311..000677c5 100644
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -21,6 +21,7 @@ static __inline__ int udplite_getfrag(void *from, char *to, int  offset,
 				      int len, int odd, struct sk_buff *skb)
 {
 	struct msghdr *msg = from;
+	msg_set_iter_copy_type(msg);
 	return copy_from_iter_full(to, len, &msg->msg_iter) ? 0 : -EFAULT;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC,net-next v2 8/8] net: tcp: Support MSG_NTCOPY
  2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
                   ` (6 preceding siblings ...)
  2022-06-12  8:57 ` [RFC,net-next v2 7/8] net: udplite: " Joe Damato
@ 2022-06-12  8:57 ` Joe Damato
  7 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-12  8:57 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, netdev, linux-kernel
  Cc: Joe Damato

Support non-temporal copies in the TCP sendmsg path. Previously, the only
way to enable non-temporal copies was to enable them for the entire
interface (via ethtool).

This change allows user programs to request non-temporal copies for
specific sendmsg calls.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/sock.h | 2 +-
 net/ipv4/tcp.c     | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 0063e84..b666ecd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2200,7 +2200,7 @@ static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb,
 		if (!csum_and_copy_from_iter_full(to, copy, &csum, from))
 			return -EFAULT;
 		skb->csum = csum_block_add(skb->csum, csum, offset);
-	} else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY) {
+	} else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY || iov_iter_copy_is_nt(from)) {
 		if (!copy_from_iter_full_nocache(to, copy, from))
 			return -EFAULT;
 	} else if (!copy_from_iter_full(to, copy, from))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 14ebb4e..5b36e00 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1201,6 +1201,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 	flags = msg->msg_flags;
 
+	msg_set_iter_copy_type(msg);
+
 	if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
 		skb = tcp_write_queue_tail(sk);
 		uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper
  2022-06-12  8:57 ` [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper Joe Damato
@ 2022-06-13  4:25   ` Al Viro
  2022-06-13  6:32     ` Joe Damato
  2022-06-13  7:53   ` David Laight
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2022-06-13  4:25 UTC (permalink / raw)
  To: Joe Damato; +Cc: linux-kernel

On Sun, Jun 12, 2022 at 01:57:52AM -0700, Joe Damato wrote:
> copyin_iovec is a helper which wraps copyin and selects the right copy
> method based on the iter_copy_type.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  lib/iov_iter.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index d32d7e5..6720cb2 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -168,6 +168,15 @@ static int copyin(void *to, const void __user *from, size_t n)
>  	return n;
>  }
>  
> +static int copyin_iovec(void *to, const void __user *from, size_t n,
> +			struct iov_iter *i)
> +{
> +	if (unlikely(iov_iter_copy_is_nt(i)))
> +		return __copy_from_user_nocache(to, from, n);
> +	else
> +		return copyin(to, from, n);
> +}

Just a sanity check - your testing is *not* with KASAN/KCSAN, right?

And BTW, why is that only on the userland side?  If you are doing
that at all, it would make sense to cover the memcpy() side as
well...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper
  2022-06-13  4:25   ` Al Viro
@ 2022-06-13  6:32     ` Joe Damato
  0 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2022-06-13  6:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel

On Mon, Jun 13, 2022 at 04:25:39AM +0000, Al Viro wrote:
> On Sun, Jun 12, 2022 at 01:57:52AM -0700, Joe Damato wrote:
> > copyin_iovec is a helper which wraps copyin and selects the right copy
> > method based on the iter_copy_type.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  lib/iov_iter.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index d32d7e5..6720cb2 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -168,6 +168,15 @@ static int copyin(void *to, const void __user *from, size_t n)
> >  	return n;
> >  }
> >  
> > +static int copyin_iovec(void *to, const void __user *from, size_t n,
> > +			struct iov_iter *i)
> > +{
> > +	if (unlikely(iov_iter_copy_is_nt(i)))
> > +		return __copy_from_user_nocache(to, from, n);
> > +	else
> > +		return copyin(to, from, n);
> > +}
> 
> Just a sanity check - your testing is *not* with KASAN/KCSAN, right?

Yes, that is correct.

> And BTW, why is that only on the userland side?  If you are doing
> that at all, it would make sense to cover the memcpy() side as
> well...

I assume here you mean the memcpy() in the splice() path? I do have a
separate change I've been testing which does this, but I thought that can
be sent separately.

This RFC basically takes an existing kernel feature (tx-nocache-copy) and
makes it applicable to more protocols and more fine grained so that it does
not need to be enabled interface-wide. The memcpy() change you mention is,
in my mind, a separate change which adds a new feature and can be sent if
this change is accepted upstream.

Let me know if that makes sense and if there are any issues you think I
should address before I send a v1 for consideration.

Thanks for taking a look!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper
  2022-06-12  8:57 ` [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper Joe Damato
  2022-06-13  4:25   ` Al Viro
@ 2022-06-13  7:53   ` David Laight
  2022-06-13 14:42     ` Joe Damato
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2022-06-13  7:53 UTC (permalink / raw)
  To: 'Joe Damato', Alexander Viro, linux-kernel

From: Joe Damato
> Sent: 12 June 2022 09:58
> 
> copyin_iovec is a helper which wraps copyin and selects the right copy
> method based on the iter_copy_type.

A pretty bad description.

> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  lib/iov_iter.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index d32d7e5..6720cb2 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -168,6 +168,15 @@ static int copyin(void *to, const void __user *from, size_t n)
>  	return n;
>  }
> 
> +static int copyin_iovec(void *to, const void __user *from, size_t n,
> +			struct iov_iter *i)
> +{
> +	if (unlikely(iov_iter_copy_is_nt(i)))
> +		return __copy_from_user_nocache(to, from, n);
> +	else
> +		return copyin(to, from, n);
> +}

Isn't this extra conditional going to have a measurable impact
on all the normal copy paths?

The additional costs of all the 'iovec' types is bad enough
already.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper
  2022-06-13  7:53   ` David Laight
@ 2022-06-13 14:42     ` Joe Damato
  2022-06-13 15:31       ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2022-06-13 14:42 UTC (permalink / raw)
  To: David Laight; +Cc: Alexander Viro, linux-kernel

On Mon, Jun 13, 2022 at 07:53:19AM +0000, David Laight wrote:
> From: Joe Damato
> > Sent: 12 June 2022 09:58
> > 
> > copyin_iovec is a helper which wraps copyin and selects the right copy
> > method based on the iter_copy_type.
> 
> A pretty bad description.

Thanks, David. I'll be sure to fix the commit description in the next
revision.

> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  lib/iov_iter.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index d32d7e5..6720cb2 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -168,6 +168,15 @@ static int copyin(void *to, const void __user *from, size_t n)
> >  	return n;
> >  }
> > 
> > +static int copyin_iovec(void *to, const void __user *from, size_t n,
> > +			struct iov_iter *i)
> > +{
> > +	if (unlikely(iov_iter_copy_is_nt(i)))
> > +		return __copy_from_user_nocache(to, from, n);
> > +	else
> > +		return copyin(to, from, n);
> > +}
> 
> Isn't this extra conditional going to have a measurable impact
> on all the normal copy paths?

The kernel already does a conditional for tx-nocache-copy on TCP sockets
when copying skbs to check for the NETIF_F_NOCACHE_COPY bit, but I hear
what you are saying.

I suppose I could push the NT copy check logic out of iov_iter, but to do
that I think I'd probably have to significantly refactor the iov code to
break apart copy_page_from_iter_iovec.

I'll spend a bit more time thinking through this, but I'm open to
suggestions if you have one; the benefit of supporting non-temporal copies
in this path is pretty significant, so I hope a path forward can be found.

> The additional costs of all the 'iovec' types is bad enough
> already.

Do you have data you can share on this?

Thanks for taking a look!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper
  2022-06-13 14:42     ` Joe Damato
@ 2022-06-13 15:31       ` David Laight
  0 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2022-06-13 15:31 UTC (permalink / raw)
  To: 'Joe Damato'; +Cc: Alexander Viro, linux-kernel

From: Joe Damato
> Sent: 13 June 2022 15:43
...
> > The additional costs of all the 'iovec' types is bad enough
> > already.
> 
> Do you have data you can share on this?

I was thinking of the performance drop noticed when (IIRC)
reads/writes of /dev/null were pushed through the iovec code.

But there is a lot of overhead code for the usual case
of a single user buffer being copied.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-13 18:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-12  8:57 [RFC,net-next,x86 v2 0/8] Nontemporal copies in sendmsg path Joe Damato
2022-06-12  8:57 ` [RFC,x86 v2 1/8] arch, x86, uaccess: Add nontemporal copy functions Joe Damato
2022-06-12  8:57 ` [RFC,iov_iter v2 2/8] iov_iter: Introduce iter_copy_type Joe Damato
2022-06-12  8:57 ` [RFC,iov_iter v2 3/8] iov_iter: add copyin_iovec helper Joe Damato
2022-06-13  4:25   ` Al Viro
2022-06-13  6:32     ` Joe Damato
2022-06-13  7:53   ` David Laight
2022-06-13 14:42     ` Joe Damato
2022-06-13 15:31       ` David Laight
2022-06-12  8:57 ` [RFC,net-next v2 4/8] net: Add MSG_NTCOPY sendmsg flag Joe Damato
2022-06-12  8:57 ` [RFC,net-next v2 5/8] net: unix: Support MSG_NTCOPY Joe Damato
2022-06-12  8:57 ` [RFC,net-next v2 6/8] net: ip: " Joe Damato
2022-06-12  8:57 ` [RFC,net-next v2 7/8] net: udplite: " Joe Damato
2022-06-12  8:57 ` [RFC,net-next v2 8/8] net: tcp: " Joe Damato

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).