All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12 v2 RFC]  Generic zcopy_* functions
@ 2020-12-22  0:09 Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

This is set of cleanup patches for zerocopy which are intended
to allow a introduction of a different zerocopy implementation.

The top level API will use the skb_zcopy_*() functions, while
the current TCP specific zerocopy ends up using msg_zerocopy_*()
calls.

There should be no functional changes from these patches.

v1->v2:
 Break changes to skb_zcopy_put into 3 patches, in order to
 make it easier to follow the changes.  Add Willem's suggestion
 about renaming sock_zerocopy_

Patch 1:
  Move zerocopy bits from tx_flags into zc_flags for clarity.
  These bits will be used in the RX path in the future.
Patch 2: remove dead function
Patch 3: simplify sock_zerocopy_put
Patch 4: push status/refcounts into sock_zerocopy_callback
Patch 5: replace sock_zerocopy_put with skb_zcopy_put
Patch 6: rename sock_zerocopy_get
Patch 7:
  Add an optional skb parameter to callback, allowing access to
  the attached skb from the callback.
Patch 8:
  Add skb_zcopy_put_abort, and move zerocopy logic into the
  callback function.  There unfortunately is still a check
  against the callback type here.
Patch 9:
  Set the skb zc_flags from the ubuf being attached, instead
  of a fixed value, allowing different initialization types.
Patch 10: Replace open-coded assignments
Patch 11: Relocate skb_zcopy_clear() in skb_release_data()
Patch 12: rename sock_zerocopy_ to msg_zerocpy_

Jonathan Lemon (12):
  net: group skb_shinfo zerocopy related bits together.
  skbuff: remove unused skb_zcopy_abort function
  skbuff: simplify sock_zerocopy_put
  skbuff: Push status and refcounts into sock_zerocopy_callback
  skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  skbuff: replace sock_zerocopy_get with skb_zcopy_get
  skbuff: Add skb parameter to the ubuf zerocopy callback
  skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort
  skbuff: add zc_flags to ubuf_info for ubuf setup
  tap/tun: use skb_zcopy_set() instead of open coded assignment
  skbuff: Call skb_zcopy_clear() before unref'ing fragments
  skbuff: rename sock_zerocopy_* to msg_zerocopy_*

 drivers/net/tap.c                   |  6 +-
 drivers/net/tun.c                   |  6 +-
 drivers/net/xen-netback/common.h    |  3 +-
 drivers/net/xen-netback/interface.c |  4 +-
 drivers/net/xen-netback/netback.c   |  7 ++-
 drivers/vhost/net.c                 |  4 +-
 include/linux/skbuff.h              | 95 +++++++++++++++--------------
 net/core/skbuff.c                   | 66 ++++++++++----------
 net/ipv4/ip_output.c                |  5 +-
 net/ipv4/tcp.c                      |  8 +--
 net/ipv6/ip6_output.c               |  5 +-
 net/kcm/kcmsock.c                   |  4 +-
 12 files changed, 106 insertions(+), 107 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22  0:09 ` [PATCH 02/12 v2 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

In preparation for expanded zerocopy (TX and RX), move
the ZC related bits out of tx_flags into their own flag
word.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/tap.c                   |  3 +--
 drivers/net/tun.c                   |  3 +--
 drivers/net/xen-netback/interface.c |  4 ++--
 include/linux/skbuff.h              | 33 ++++++++++++++++-------------
 net/core/skbuff.c                   | 10 ++++-----
 net/ipv4/tcp.c                      |  2 +-
 net/kcm/kcmsock.c                   |  4 ++--
 7 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 1f4bdd94407a..3e9fb753ce88 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -723,8 +723,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
 		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(uarg, false);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fbed05ae7b0f..80cb3bef3afd 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1815,8 +1815,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
 		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(uarg, false);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index acb786d8b1d8..ec790df75be3 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -47,7 +47,7 @@
 /* Number of bytes allowed on the internal guest Rx queue. */
 #define XENVIF_RX_QUEUE_BYTES (XEN_NETIF_RX_RING_SIZE/2 * PAGE_SIZE)
 
-/* This function is used to set SKBTX_DEV_ZEROCOPY as well as
+/* This function is used to set SKBZC_ENABLE as well as
  * increasing the inflight counter. We need to increase the inflight
  * counter because core driver calls into xenvif_zerocopy_callback
  * which calls xenvif_skb_zerocopy_complete.
@@ -55,7 +55,7 @@
 void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue,
 				 struct sk_buff *skb)
 {
-	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+	skb_shinfo(skb)->zc_flags |= SKBZC_ENABLE;
 	atomic_inc(&queue->inflight_packets);
 }
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 333bcdc39635..69588b304f83 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -430,24 +430,27 @@ enum {
 	/* device driver is going to provide hardware time stamp */
 	SKBTX_IN_PROGRESS = 1 << 2,
 
-	/* device driver supports TX zero-copy buffers */
-	SKBTX_DEV_ZEROCOPY = 1 << 3,
-
 	/* generate wifi status information (where possible) */
 	SKBTX_WIFI_STATUS = 1 << 4,
 
+	/* generate software time stamp when entering packet scheduling */
+	SKBTX_SCHED_TSTAMP = 1 << 6,
+};
+
+/* Definitions for zc_flags in struct skb_shared_info */
+enum {
+	/* use zcopy routines */
+	SKBZC_ENABLE = BIT(0),
+
 	/* This indicates at least one fragment might be overwritten
 	 * (as in vmsplice(), sendfile() ...)
 	 * If we need to compute a TX checksum, we'll need to copy
 	 * all frags to avoid possible bad checksum
 	 */
-	SKBTX_SHARED_FRAG = 1 << 5,
-
-	/* generate software time stamp when entering packet scheduling */
-	SKBTX_SCHED_TSTAMP = 1 << 6,
+	SKBZC_SHARED_FRAG = BIT(1),
 };
 
-#define SKBTX_ZEROCOPY_FRAG	(SKBTX_DEV_ZEROCOPY | SKBTX_SHARED_FRAG)
+#define SKBZC_FRAGMENTS		(SKBZC_ENABLE | SKBZC_SHARED_FRAG)
 #define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
 				 SKBTX_SCHED_TSTAMP)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
@@ -510,7 +513,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
-	__u8		__unused;
+	__u8		zc_flags;
 	__u8		meta_len;
 	__u8		nr_frags;
 	__u8		tx_flags;
@@ -1437,7 +1440,7 @@ static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
 
 static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb)
 {
-	bool is_zcopy = skb && skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY;
+	bool is_zcopy = skb && skb_shinfo(skb)->zc_flags & SKBZC_ENABLE;
 
 	return is_zcopy ? skb_uarg(skb) : NULL;
 }
@@ -1451,14 +1454,14 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 		else
 			sock_zerocopy_get(uarg);
 		skb_shinfo(skb)->destructor_arg = uarg;
-		skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	}
 }
 
 static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val)
 {
 	skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL);
-	skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
+	skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 }
 
 static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb)
@@ -1486,7 +1489,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 			uarg->callback(uarg, zerocopy);
 		}
 
-		skb_shinfo(skb)->tx_flags &= ~SKBTX_ZEROCOPY_FRAG;
+		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
 }
 
@@ -1497,7 +1500,7 @@ static inline void skb_zcopy_abort(struct sk_buff *skb)
 
 	if (uarg) {
 		sock_zerocopy_put_abort(uarg, false);
-		skb_shinfo(skb)->tx_flags &= ~SKBTX_ZEROCOPY_FRAG;
+		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
 }
 
@@ -3323,7 +3326,7 @@ static inline int skb_linearize(struct sk_buff *skb)
 static inline bool skb_has_shared_frag(const struct sk_buff *skb)
 {
 	return skb_is_nonlinear(skb) &&
-	       skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG;
+	       skb_shinfo(skb)->zc_flags & SKBZC_SHARED_FRAG;
 }
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f62cae3f75d8..327ee8938f78 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1330,7 +1330,7 @@ static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig,
  *	@skb: the skb to modify
  *	@gfp_mask: allocation priority
  *
- *	This must be called on SKBTX_DEV_ZEROCOPY skb.
+ *	This must be called on SKBZC_ENABLE skb.
  *	It will copy all frags into kernel and drop the reference
  *	to userspace pages.
  *
@@ -3267,8 +3267,8 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len)
 {
 	int pos = skb_headlen(skb);
 
-	skb_shinfo(skb1)->tx_flags |= skb_shinfo(skb)->tx_flags &
-				      SKBTX_SHARED_FRAG;
+	skb_shinfo(skb1)->zc_flags |= skb_shinfo(skb)->zc_flags &
+				      SKBZC_SHARED_FRAG;
 	skb_zerocopy_clone(skb1, skb, 0);
 	if (len < pos)	/* Split line is inside header. */
 		skb_split_inside_header(skb, skb1, len, pos);
@@ -3957,8 +3957,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		skb_copy_from_linear_data_offset(head_skb, offset,
 						 skb_put(nskb, hsize), hsize);
 
-		skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
-					      SKBTX_SHARED_FRAG;
+		skb_shinfo(nskb)->zc_flags |= skb_shinfo(head_skb)->zc_flags &
+					      SKBZC_SHARED_FRAG;
 
 		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
 		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ed42d2193c5c..fea9bae370e4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1010,7 +1010,7 @@ struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
 	}
 
 	if (!(flags & MSG_NO_SHARED_FRAGS))
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_SHARED_FRAG;
 
 	skb->len += copy;
 	skb->data_len += copy;
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 56dad9565bc9..55c04d8c659a 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -786,7 +786,7 @@ static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
 
 		if (skb_can_coalesce(skb, i, page, offset)) {
 			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
-			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+			skb_shinfo(skb)->zc_flags |= SKBZC_SHARED_FRAG;
 			goto coalesced;
 		}
 
@@ -834,7 +834,7 @@ static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
 
 	get_page(page);
 	skb_fill_page_desc(skb, i, page, offset, size);
-	skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+	skb_shinfo(skb)->zc_flags |= SKBZC_SHARED_FRAG;
 
 coalesced:
 	skb->len += size;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/12 v2 RFC] skbuff: remove unused skb_zcopy_abort function
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 15:00   ` Willem de Bruijn
  2020-12-22  0:09 ` [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put Jonathan Lemon
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

skb_zcopy_abort() has no in-tree consumers, remove it.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 69588b304f83..fb6dd6af0f82 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1493,17 +1493,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 	}
 }
 
-/* Abort a zerocopy operation and revert zckey on error in send syscall */
-static inline void skb_zcopy_abort(struct sk_buff *skb)
-{
-	struct ubuf_info *uarg = skb_zcopy(skb);
-
-	if (uarg) {
-		sock_zerocopy_put_abort(uarg, false);
-		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
-	}
-}
-
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
 	skb->next = NULL;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 02/12 v2 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22 16:52   ` David Ahern
  2020-12-22  0:09 ` [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback Jonathan Lemon
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

All 'struct ubuf_info' users should have a callback defined.
Remove the dead code path to consume_skb(), which makes
unwarranted assumptions about how the structure was allocated.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/core/skbuff.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 327ee8938f78..ea32b3414ad6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1245,12 +1245,8 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
 void sock_zerocopy_put(struct ubuf_info *uarg)
 {
-	if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
-		if (uarg->callback)
-			uarg->callback(uarg, uarg->zerocopy);
-		else
-			consume_skb(skb_from_uarg(uarg));
-	}
+	if (uarg && refcount_dec_and_test(&uarg->refcnt))
+		uarg->callback(uarg, uarg->zerocopy);
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (2 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22  0:09 ` [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Before this change, the caller of sock_zerocopy_callback would
need to save the zerocopy status, decrement and check the refcount,
and then call the callback function - the callback was only invoked
when the refcount reached zero.

Now, the caller just passes the status into the callback function,
which saves the status and handles its own refcounts.

This makes the behavior of the sock_zerocopy_callback identical
to the tpacket and vhost callbacks.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h |  3 ---
 net/core/skbuff.c      | 14 +++++++++++---
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fb6dd6af0f82..c9d7de9d624d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1482,9 +1482,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 	if (uarg) {
 		if (skb_zcopy_is_nouarg(skb)) {
 			/* no notification callback */
-		} else if (uarg->callback == sock_zerocopy_callback) {
-			uarg->zerocopy = uarg->zerocopy && zerocopy;
-			sock_zerocopy_put(uarg);
 		} else {
 			uarg->callback(uarg, zerocopy);
 		}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ea32b3414ad6..73699dbdc4a1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
 	return true;
 }
 
-void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
+static void __sock_zerocopy_callback(struct ubuf_info *uarg)
 {
 	struct sk_buff *tail, *skb = skb_from_uarg(uarg);
 	struct sock_exterr_skb *serr;
@@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
 	serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
 	serr->ee.ee_data = hi;
 	serr->ee.ee_info = lo;
-	if (!success)
+	if (!uarg->zerocopy)
 		serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
 
 	q = &sk->sk_error_queue;
@@ -1241,11 +1241,19 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
 	consume_skb(skb);
 	sock_put(sk);
 }
+
+void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
+{
+	uarg->zerocopy = uarg->zerocopy & success;
+
+	if (refcount_dec_and_test(&uarg->refcnt))
+		__sock_zerocopy_callback(uarg);
+}
 EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
 void sock_zerocopy_put(struct ubuf_info *uarg)
 {
-	if (uarg && refcount_dec_and_test(&uarg->refcnt))
+	if (uarg)
 		uarg->callback(uarg, uarg->zerocopy);
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (3 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 14:42   ` Willem de Bruijn
  2020-12-22  0:09 ` [PATCH 06/12 v2 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get Jonathan Lemon
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Replace sock_zerocopy_put with the generic skb_zcopy_put()
function.  Pass 'true' as the success argument, as this
is identical to no change.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 7 ++++++-
 net/core/skbuff.c      | 9 +--------
 net/ipv4/tcp.c         | 2 +-
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c9d7de9d624d..238166b522f7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -499,7 +499,6 @@ static inline void sock_zerocopy_get(struct ubuf_info *uarg)
 	refcount_inc(&uarg->refcnt);
 }
 
-void sock_zerocopy_put(struct ubuf_info *uarg);
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
 void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
@@ -1474,6 +1473,12 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 	return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
 }
 
+static inline void skb_zcopy_put(struct ubuf_info *uarg)
+{
+	if (uarg)
+		uarg->callback(uarg, true);
+}
+
 /* Release a reference on a zerocopy structure */
 static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 73699dbdc4a1..984760dd670b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1251,13 +1251,6 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
-void sock_zerocopy_put(struct ubuf_info *uarg)
-{
-	if (uarg)
-		uarg->callback(uarg, uarg->zerocopy);
-}
-EXPORT_SYMBOL_GPL(sock_zerocopy_put);
-
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	if (uarg) {
@@ -1267,7 +1260,7 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 		uarg->len--;
 
 		if (have_uref)
-			sock_zerocopy_put(uarg);
+			skb_zcopy_put(uarg);
 	}
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index fea9bae370e4..5c38080df13f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1429,7 +1429,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
 	}
 out_nopush:
-	sock_zerocopy_put(uarg);
+	skb_zcopy_put(uarg);
 	return copied + copied_syn;
 
 do_error:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/12 v2 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (4 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 07/12 v2 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback Jonathan Lemon
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Rename the get routines for consistency.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 12 ++++++------
 net/core/skbuff.c      |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 238166b522f7..42e8dbde9504 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -494,11 +494,6 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size);
 struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
 					struct ubuf_info *uarg);
 
-static inline void sock_zerocopy_get(struct ubuf_info *uarg)
-{
-	refcount_inc(&uarg->refcnt);
-}
-
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
 void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
@@ -1444,6 +1439,11 @@ static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb)
 	return is_zcopy ? skb_uarg(skb) : NULL;
 }
 
+static inline void skb_zcopy_get(struct ubuf_info *uarg)
+{
+	refcount_inc(&uarg->refcnt);
+}
+
 static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 				 bool *have_ref)
 {
@@ -1451,7 +1451,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 		if (unlikely(have_ref && *have_ref))
 			*have_ref = false;
 		else
-			sock_zerocopy_get(uarg);
+			skb_zcopy_get(uarg);
 		skb_shinfo(skb)->destructor_arg = uarg;
 		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 984760dd670b..fbf0a145467a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1163,7 +1163,7 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
 
 			/* no extra ref when appending to datagram (MSG_MORE) */
 			if (sk->sk_type == SOCK_STREAM)
-				sock_zerocopy_get(uarg);
+				skb_zcopy_get(uarg);
 
 			return uarg;
 		}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/12 v2 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (5 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 06/12 v2 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 08/12 v2 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort Jonathan Lemon
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear().  This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.

Make the name of the the zerocopy success pararmeter consistent
for all callbacks.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/tap.c                 |  2 +-
 drivers/net/tun.c                 |  2 +-
 drivers/net/xen-netback/common.h  |  3 ++-
 drivers/net/xen-netback/netback.c |  7 ++++---
 drivers/vhost/net.c               |  3 ++-
 include/linux/skbuff.h            | 20 +++++++++-----------
 net/core/skbuff.c                 |  3 ++-
 7 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3e9fb753ce88..c2bcbf9218dc 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -726,7 +726,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(uarg, false);
+		uarg->callback(NULL, uarg, false);
 	}
 
 	if (tap) {
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 80cb3bef3afd..bad4b0229584 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1818,7 +1818,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(uarg, false);
+		uarg->callback(NULL, uarg, false);
 	}
 
 	skb_reset_network_header(skb);
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8ee24e351bdc..4df001e960ca 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -399,7 +399,8 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 void xenvif_carrier_on(struct xenvif *vif);
 
 /* Callback from stack when TX packet can be released */
-void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
+void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
+			      bool success);
 
 /* Unmap a pending page and release it back to the guest */
 void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index bc3421d14576..49288ae2c4dd 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1091,7 +1091,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
 	uarg = skb_shinfo(skb)->destructor_arg;
 	/* increase inflight counter to offset decrement in callback */
 	atomic_inc(&queue->inflight_packets);
-	uarg->callback(uarg, true);
+	uarg->callback(NULL, uarg, true);
 	skb_shinfo(skb)->destructor_arg = NULL;
 
 	/* Fill the skb with the new (local) frags. */
@@ -1228,7 +1228,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 	return work_done;
 }
 
-void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
+void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
+			      bool success)
 {
 	unsigned long flags;
 	pending_ring_idx_t index;
@@ -1253,7 +1254,7 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 	} while (ubuf);
 	spin_unlock_irqrestore(&queue->callback_lock, flags);
 
-	if (likely(zerocopy_success))
+	if (likely(success))
 		queue->stats.tx_zerocopy_success++;
 	else
 		queue->stats.tx_zerocopy_fail++;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 531a00d703cd..bf28d0b75c1b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -381,7 +381,8 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 }
 
-static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
+static void vhost_zerocopy_callback(struct sk_buff *skb,
+				    struct ubuf_info *ubuf, bool success)
 {
 	struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 42e8dbde9504..8403ea509cee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -458,13 +458,13 @@ enum {
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
- * The zerocopy_success argument is true if zero copy transmit occurred,
- * false on data copy or out of memory error caused by data copy attempt.
+ * The success argument is true if zero copy transmit occurred, false on
+ * data copy or out of memory error caused by data copy attempt.
  * The ctx field is used to track device context.
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-	void (*callback)(struct ubuf_info *, bool zerocopy_success);
+	void (*callback)(struct sk_buff *, struct ubuf_info *, bool success);
 	union {
 		struct {
 			unsigned long desc;
@@ -496,7 +496,8 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
 
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
-void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
+void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+			    bool success);
 
 int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len);
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
@@ -1476,20 +1477,17 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 static inline void skb_zcopy_put(struct ubuf_info *uarg)
 {
 	if (uarg)
-		uarg->callback(uarg, true);
+		uarg->callback(NULL, uarg, true);
 }
 
 /* Release a reference on a zerocopy structure */
-static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
+static inline void skb_zcopy_clear(struct sk_buff *skb, bool success)
 {
 	struct ubuf_info *uarg = skb_zcopy(skb);
 
 	if (uarg) {
-		if (skb_zcopy_is_nouarg(skb)) {
-			/* no notification callback */
-		} else {
-			uarg->callback(uarg, zerocopy);
-		}
+		if (!skb_zcopy_is_nouarg(skb))
+			uarg->callback(skb, uarg, success);
 
 		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fbf0a145467a..328385cd141e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1242,7 +1242,8 @@ static void __sock_zerocopy_callback(struct ubuf_info *uarg)
 	sock_put(sk);
 }
 
-void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
+void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+			    bool success)
 {
 	uarg->zerocopy = uarg->zerocopy & success;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/12 v2 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (6 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 07/12 v2 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

The sock_zerocopy_put_abort function contains logic which is
specific to the current zerocopy implementation.  Add a wrapper
which checks the callback and dispatches apppropriately.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 10 ++++++++++
 net/core/skbuff.c      | 12 +++++-------
 net/ipv4/ip_output.c   |  3 +--
 net/ipv4/tcp.c         |  2 +-
 net/ipv6/ip6_output.c  |  3 +--
 5 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8403ea509cee..da0c1dddd0da 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1480,6 +1480,16 @@ static inline void skb_zcopy_put(struct ubuf_info *uarg)
 		uarg->callback(NULL, uarg, true);
 }
 
+static inline void skb_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
+{
+	if (uarg) {
+		if (uarg->callback == sock_zerocopy_callback)
+			sock_zerocopy_put_abort(uarg, have_uref);
+		else if (have_uref)
+			skb_zcopy_put(uarg);
+	}
+}
+
 /* Release a reference on a zerocopy structure */
 static inline void skb_zcopy_clear(struct sk_buff *skb, bool success)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 328385cd141e..8352da29f052 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1254,15 +1254,13 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
-	if (uarg) {
-		struct sock *sk = skb_from_uarg(uarg)->sk;
+	struct sock *sk = skb_from_uarg(uarg)->sk;
 
-		atomic_dec(&sk->sk_zckey);
-		uarg->len--;
+	atomic_dec(&sk->sk_zckey);
+	uarg->len--;
 
-		if (have_uref)
-			skb_zcopy_put(uarg);
-	}
+	if (have_uref)
+		sock_zerocopy_callback(NULL, uarg, true);
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 879b76ae4435..65f2299fd682 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1230,8 +1230,7 @@ static int __ip_append_data(struct sock *sk,
 error_efault:
 	err = -EFAULT;
 error:
-	if (uarg)
-		sock_zerocopy_put_abort(uarg, extra_uref);
+	skb_zcopy_put_abort(uarg, extra_uref);
 	cork->length -= length;
 	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
 	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c38080df13f..900b6bb7b280 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1440,7 +1440,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	if (copied + copied_syn)
 		goto out;
 out_err:
-	sock_zerocopy_put_abort(uarg, true);
+	skb_zcopy_put_abort(uarg, true);
 	err = sk_stream_error(sk, flags, err);
 	/* make sure we wake any epoll edge trigger waiter */
 	if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 749ad72386b2..c8c87891533a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1715,8 +1715,7 @@ static int __ip6_append_data(struct sock *sk,
 error_efault:
 	err = -EFAULT;
 error:
-	if (uarg)
-		sock_zerocopy_put_abort(uarg, extra_uref);
+	skb_zcopy_put_abort(uarg, extra_uref);
 	cork->length -= length;
 	IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
 	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (7 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 08/12 v2 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 15:00   ` Willem de Bruijn
  2020-12-22  0:09 ` [PATCH 10/12 v2 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment Jonathan Lemon
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Currently, an ubuf is attached to a new skb, the skb zc_flags
is initialized to a fixed value.  Instead of doing this, set
the default zc_flags in the ubuf, and have new skb's inherit
from this default.

This is needed when setting up different zerocopy types.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 3 ++-
 net/core/skbuff.c      | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da0c1dddd0da..b90be4b0b2de 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -478,6 +478,7 @@ struct ubuf_info {
 		};
 	};
 	refcount_t refcnt;
+	u8 zc_flags;
 
 	struct mmpin {
 		struct user_struct *user;
@@ -1454,7 +1455,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 		else
 			skb_zcopy_get(uarg);
 		skb_shinfo(skb)->destructor_arg = uarg;
-		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
+		skb_shinfo(skb)->zc_flags |= uarg->zc_flags;
 	}
 }
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8352da29f052..463078ba663f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1118,6 +1118,7 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
+	uarg->zc_flags = SKBZC_FRAGMENTS;
 	refcount_set(&uarg->refcnt, 1);
 	sock_hold(sk);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/12 v2 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (8 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 11/12 v2 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 12/12 v2 RFC] skbuff: rename sock_zerocopy_* to msg_zerocopy_* Jonathan Lemon
  11 siblings, 0 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Replace direct assignments with skb_zcopy_set() for clarity.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/tap.c   | 3 +--
 drivers/net/tun.c   | 3 +--
 drivers/vhost/net.c | 1 +
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index c2bcbf9218dc..7e7a4c7ca891 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -722,8 +722,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	tap = rcu_dereference(q->tap);
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
+		skb_zcopy_set(skb, msg_control, NULL);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(NULL, uarg, false);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bad4b0229584..0844da91e2ed 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1814,8 +1814,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
+		skb_zcopy_set(skb, msg_control, NULL);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(NULL, uarg, false);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index bf28d0b75c1b..174c05c90872 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -904,6 +904,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
+			ubuf->zc_flags = SKBZC_FRAGMENTS;
 			refcount_set(&ubuf->refcnt, 1);
 			msg.msg_control = &ctl;
 			ctl.type = TUN_MSG_UBUF;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/12 v2 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (9 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 10/12 v2 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22  0:09 ` [PATCH 12/12 v2 RFC] skbuff: rename sock_zerocopy_* to msg_zerocopy_* Jonathan Lemon
  11 siblings, 0 replies; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

RX zerocopy fragment pages which are not allocated from the
system page pool require special handling.  Give the callback
in skb_zcopy_clear() a chance to process them first.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/core/skbuff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 463078ba663f..ee75279c7c78 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -605,13 +605,14 @@ static void skb_release_data(struct sk_buff *skb)
 			      &shinfo->dataref))
 		return;
 
+	skb_zcopy_clear(skb, true);
+
 	for (i = 0; i < shinfo->nr_frags; i++)
 		__skb_frag_unref(&shinfo->frags[i]);
 
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);
 
-	skb_zcopy_clear(skb, true);
 	skb_free_head(skb);
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/12 v2 RFC] skbuff: rename sock_zerocopy_* to msg_zerocopy_*
  2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (10 preceding siblings ...)
  2020-12-22  0:09 ` [PATCH 11/12 v2 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments Jonathan Lemon
@ 2020-12-22  0:09 ` Jonathan Lemon
  2020-12-22 14:55   ` Willem de Bruijn
  11 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:09 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

At Willem's suggestion, rename the sock_zerocopy_* functions
so that they match the MSG_ZEROCOPY flag, which makes it clear
they are specific to this zerocopy implementation.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 16 ++++++++--------
 net/core/skbuff.c      | 28 ++++++++++++++--------------
 net/ipv4/ip_output.c   |  2 +-
 net/ipv4/tcp.c         |  2 +-
 net/ipv6/ip6_output.c  |  2 +-
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b90be4b0b2de..23a5d2637635 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -491,13 +491,13 @@ struct ubuf_info {
 int mm_account_pinned_pages(struct mmpin *mmp, size_t size);
 void mm_unaccount_pinned_pages(struct mmpin *mmp);
 
-struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size);
-struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
-					struct ubuf_info *uarg);
+struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size);
+struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
+				       struct ubuf_info *uarg);
 
-void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
+void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
-void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 			    bool success);
 
 int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len);
@@ -1484,8 +1484,8 @@ static inline void skb_zcopy_put(struct ubuf_info *uarg)
 static inline void skb_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	if (uarg) {
-		if (uarg->callback == sock_zerocopy_callback)
-			sock_zerocopy_put_abort(uarg, have_uref);
+		if (uarg->callback == msg_zerocopy_callback)
+			msg_zerocopy_put_abort(uarg, have_uref);
 		else if (have_uref)
 			skb_zcopy_put(uarg);
 	}
@@ -2779,7 +2779,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 	if (likely(!skb_zcopy(skb)))
 		return 0;
 	if (!skb_zcopy_is_nouarg(skb) &&
-	    skb_uarg(skb)->callback == sock_zerocopy_callback)
+	    skb_uarg(skb)->callback == msg_zerocopy_callback)
 		return 0;
 	return skb_copy_ubufs(skb, gfp_mask);
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ee75279c7c78..a5c419c69aad 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1094,7 +1094,7 @@ void mm_unaccount_pinned_pages(struct mmpin *mmp)
 }
 EXPORT_SYMBOL_GPL(mm_unaccount_pinned_pages);
 
-struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size)
+struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 {
 	struct ubuf_info *uarg;
 	struct sk_buff *skb;
@@ -1114,7 +1114,7 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size)
 		return NULL;
 	}
 
-	uarg->callback = sock_zerocopy_callback;
+	uarg->callback = msg_zerocopy_callback;
 	uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1;
 	uarg->len = 1;
 	uarg->bytelen = size;
@@ -1125,15 +1125,15 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size)
 
 	return uarg;
 }
-EXPORT_SYMBOL_GPL(sock_zerocopy_alloc);
+EXPORT_SYMBOL_GPL(msg_zerocopy_alloc);
 
 static inline struct sk_buff *skb_from_uarg(struct ubuf_info *uarg)
 {
 	return container_of((void *)uarg, struct sk_buff, cb);
 }
 
-struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
-					struct ubuf_info *uarg)
+struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
+				       struct ubuf_info *uarg)
 {
 	if (uarg) {
 		const u32 byte_limit = 1 << 19;		/* limit to a few TSO */
@@ -1172,9 +1172,9 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
 	}
 
 new_alloc:
-	return sock_zerocopy_alloc(sk, size);
+	return msg_zerocopy_alloc(sk, size);
 }
-EXPORT_SYMBOL_GPL(sock_zerocopy_realloc);
+EXPORT_SYMBOL_GPL(msg_zerocopy_realloc);
 
 static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
 {
@@ -1196,7 +1196,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
 	return true;
 }
 
-static void __sock_zerocopy_callback(struct ubuf_info *uarg)
+static void __msg_zerocopy_callback(struct ubuf_info *uarg)
 {
 	struct sk_buff *tail, *skb = skb_from_uarg(uarg);
 	struct sock_exterr_skb *serr;
@@ -1244,17 +1244,17 @@ static void __sock_zerocopy_callback(struct ubuf_info *uarg)
 	sock_put(sk);
 }
 
-void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 			    bool success)
 {
 	uarg->zerocopy = uarg->zerocopy & success;
 
 	if (refcount_dec_and_test(&uarg->refcnt))
-		__sock_zerocopy_callback(uarg);
+		__msg_zerocopy_callback(uarg);
 }
-EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
+EXPORT_SYMBOL_GPL(msg_zerocopy_callback);
 
-void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
+void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	struct sock *sk = skb_from_uarg(uarg)->sk;
 
@@ -1262,9 +1262,9 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 	uarg->len--;
 
 	if (have_uref)
-		sock_zerocopy_callback(NULL, uarg, true);
+		msg_zerocopy_callback(NULL, uarg, true);
 }
-EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
+EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort);
 
 int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len)
 {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 65f2299fd682..d8eb8b827794 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1018,7 +1018,7 @@ static int __ip_append_data(struct sock *sk,
 		csummode = CHECKSUM_PARTIAL;
 
 	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = sock_zerocopy_realloc(sk, length, skb_zcopy(skb));
+		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
 		if (!uarg)
 			return -ENOBUFS;
 		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 900b6bb7b280..7a6ffa64f129 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1217,7 +1217,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 	if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
 		skb = tcp_write_queue_tail(sk);
-		uarg = sock_zerocopy_realloc(sk, size, skb_zcopy(skb));
+		uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
 		if (!uarg) {
 			err = -ENOBUFS;
 			goto out_err;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c8c87891533a..f59cfa39686a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1471,7 +1471,7 @@ static int __ip6_append_data(struct sock *sk,
 		csummode = CHECKSUM_PARTIAL;
 
 	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = sock_zerocopy_realloc(sk, length, skb_zcopy(skb));
+		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
 		if (!uarg)
 			return -ENOBUFS;
 		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-22  0:09 ` [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
@ 2020-12-22 14:42   ` Willem de Bruijn
  2020-12-22 17:19     ` Jonathan Lemon
  0 siblings, 1 reply; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 14:42 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> Replace sock_zerocopy_put with the generic skb_zcopy_put()
> function.  Pass 'true' as the success argument, as this
> is identical to no change.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

uarg->zerocopy may be false if sock_zerocopy_put_abort is called from
tcp_sendmsg_locked

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-22  0:09 ` [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
@ 2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22 17:21     ` Jonathan Lemon
  0 siblings, 1 reply; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 14:43 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> In preparation for expanded zerocopy (TX and RX), move
> the ZC related bits out of tx_flags into their own flag
> word.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

I think it's better to expand tx_flags to a u16 and add the two new
flags that you need.

That allows the additional 7 bits to be used for arbitrary flags, not
stranding 8 bits exactly only for zerocopy features.

Moving around a few u8's in the same cacheline won't be a problem.

I also prefer not to rename flags that some of us are familiar with,
if it's not needed.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put
  2020-12-22  0:09 ` [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put Jonathan Lemon
@ 2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22 16:52   ` David Ahern
  1 sibling, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 14:43 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> All 'struct ubuf_info' users should have a callback defined.
> Remove the dead code path to consume_skb(), which makes
> unwarranted assumptions about how the structure was allocated.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

Please link to the commit I shared that made the consume_skb path
obsolete. Before that this would have been unsafe.

should have a callback defined -> have a callback defined as of commit
0a4a060bb204 ("sock: fix
zerocopy_success regression with msg_zerocopy").

With that explanation why this is correct

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback
  2020-12-22  0:09 ` [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback Jonathan Lemon
@ 2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22 16:49     ` David Ahern
  2020-12-22 17:48     ` Jonathan Lemon
  0 siblings, 2 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 14:43 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> Before this change, the caller of sock_zerocopy_callback would
> need to save the zerocopy status, decrement and check the refcount,
> and then call the callback function - the callback was only invoked
> when the refcount reached zero.
>
> Now, the caller just passes the status into the callback function,
> which saves the status and handles its own refcounts.
>
> This makes the behavior of the sock_zerocopy_callback identical
> to the tpacket and vhost callbacks.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  include/linux/skbuff.h |  3 ---
>  net/core/skbuff.c      | 14 +++++++++++---
>  2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index fb6dd6af0f82..c9d7de9d624d 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1482,9 +1482,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
>         if (uarg) {
>                 if (skb_zcopy_is_nouarg(skb)) {
>                         /* no notification callback */
> -               } else if (uarg->callback == sock_zerocopy_callback) {
> -                       uarg->zerocopy = uarg->zerocopy && zerocopy;
> -                       sock_zerocopy_put(uarg);
>                 } else {
>                         uarg->callback(uarg, zerocopy);
>                 }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ea32b3414ad6..73699dbdc4a1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
>         return true;
>  }
>
> -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> +static void __sock_zerocopy_callback(struct ubuf_info *uarg)
>  {
>         struct sk_buff *tail, *skb = skb_from_uarg(uarg);
>         struct sock_exterr_skb *serr;
> @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
>         serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
>         serr->ee.ee_data = hi;
>         serr->ee.ee_info = lo;
> -       if (!success)
> +       if (!uarg->zerocopy)
>                 serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
>
>         q = &sk->sk_error_queue;
> @@ -1241,11 +1241,19 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
>         consume_skb(skb);
>         sock_put(sk);
>  }
> +
> +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> +{
> +       uarg->zerocopy = uarg->zerocopy & success;
> +
> +       if (refcount_dec_and_test(&uarg->refcnt))
> +               __sock_zerocopy_callback(uarg);
> +}
>  EXPORT_SYMBOL_GPL(sock_zerocopy_callback);

I still think this helper is unnecessary. Just return immediately in
existing sock_zerocopy_callback if refcount is not zero.

>  void sock_zerocopy_put(struct ubuf_info *uarg)
>  {
> -       if (uarg && refcount_dec_and_test(&uarg->refcnt))
> +       if (uarg)
>                 uarg->callback(uarg, uarg->zerocopy);
>  }
>  EXPORT_SYMBOL_GPL(sock_zerocopy_put);

This does increase the number of indirect function calls. Which are
not cheap post spectre.

In the common case for msg_zerocopy we only have two clones, one sent
out and one on the retransmit queue. So I guess the cost will be
acceptable.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 12/12 v2 RFC] skbuff: rename sock_zerocopy_* to msg_zerocopy_*
  2020-12-22  0:09 ` [PATCH 12/12 v2 RFC] skbuff: rename sock_zerocopy_* to msg_zerocopy_* Jonathan Lemon
@ 2020-12-22 14:55   ` Willem de Bruijn
  0 siblings, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 14:55 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> At Willem's suggestion, rename the sock_zerocopy_* functions
> so that they match the MSG_ZEROCOPY flag, which makes it clear
> they are specific to this zerocopy implementation.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup
  2020-12-22  0:09 ` [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
@ 2020-12-22 15:00   ` Willem de Bruijn
  2020-12-22 18:17     ` Jonathan Lemon
  0 siblings, 1 reply; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 15:00 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> Currently, an ubuf is attached to a new skb, the skb zc_flags
> is initialized to a fixed value.  Instead of doing this, set
> the default zc_flags in the ubuf, and have new skb's inherit
> from this default.
>
> This is needed when setting up different zerocopy types.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  include/linux/skbuff.h | 3 ++-
>  net/core/skbuff.c      | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index da0c1dddd0da..b90be4b0b2de 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -478,6 +478,7 @@ struct ubuf_info {
>                 };
>         };
>         refcount_t refcnt;
> +       u8 zc_flags;
>
>         struct mmpin {
>                 struct user_struct *user;

When allocating ubuf_info for msg_zerocopy, we actually allocate the
notification skb, to be sure that notifications won't be dropped due
to memory pressure at notification time. We actually allocate the skb
and place ubuf_info in skb->cb[].

The struct is exactly 48 bytes on 64-bit platforms, filling all of cb.
This new field fills a 4B hole, so it should still be fine.

Just being very explicit, as this is a fragile bit of code. Come to
think of it, this probably deserves a BUILD_BUG_ON.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/12 v2 RFC] skbuff: remove unused skb_zcopy_abort function
  2020-12-22  0:09 ` [PATCH 02/12 v2 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
@ 2020-12-22 15:00   ` Willem de Bruijn
  0 siblings, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 15:00 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> skb_zcopy_abort() has no in-tree consumers, remove it.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

Indeed unused. This was used in the packet and raw zerocopy patches in
the original patchset. But we never merged those, for lack of copy
avoidance benefit at small packet sizes.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback
  2020-12-22 14:43   ` Willem de Bruijn
@ 2020-12-22 16:49     ` David Ahern
  2020-12-22 17:48     ` Jonathan Lemon
  1 sibling, 0 replies; 33+ messages in thread
From: David Ahern @ 2020-12-22 16:49 UTC (permalink / raw)
  To: Willem de Bruijn, Jonathan Lemon
  Cc: Network Development, Eric Dumazet, Kernel Team

On 12/22/20 7:43 AM, Willem de Bruijn wrote:
> 
>>  void sock_zerocopy_put(struct ubuf_info *uarg)
>>  {
>> -       if (uarg && refcount_dec_and_test(&uarg->refcnt))
>> +       if (uarg)
>>                 uarg->callback(uarg, uarg->zerocopy);
>>  }
>>  EXPORT_SYMBOL_GPL(sock_zerocopy_put);
> 
> This does increase the number of indirect function calls. Which are
> not cheap post spectre.
> 
> In the common case for msg_zerocopy we only have two clones, one sent
> out and one on the retransmit queue. So I guess the cost will be
> acceptable.
> 


sock_zerocopy_callback seems to be the only one at the moment - or the
most dominant if I goofed my search. Could use the INDIRECT_CALL macros
for it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put
  2020-12-22  0:09 ` [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put Jonathan Lemon
  2020-12-22 14:43   ` Willem de Bruijn
@ 2020-12-22 16:52   ` David Ahern
  2020-12-22 16:56     ` David Ahern
  1 sibling, 1 reply; 33+ messages in thread
From: David Ahern @ 2020-12-22 16:52 UTC (permalink / raw)
  To: Jonathan Lemon, netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

On 12/21/20 5:09 PM, Jonathan Lemon wrote:
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 327ee8938f78..ea32b3414ad6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1245,12 +1245,8 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
>  
>  void sock_zerocopy_put(struct ubuf_info *uarg)
>  {
> -	if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
> -		if (uarg->callback)
> -			uarg->callback(uarg, uarg->zerocopy);
> -		else
> -			consume_skb(skb_from_uarg(uarg));
> -	}
> +	if (uarg && refcount_dec_and_test(&uarg->refcnt))
> +		uarg->callback(uarg, uarg->zerocopy);
>  }
>  EXPORT_SYMBOL_GPL(sock_zerocopy_put);
>  
> 

since it is down to 2 lines, move to skbuff.h as an inline?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put
  2020-12-22 16:52   ` David Ahern
@ 2020-12-22 16:56     ` David Ahern
  0 siblings, 0 replies; 33+ messages in thread
From: David Ahern @ 2020-12-22 16:56 UTC (permalink / raw)
  To: Jonathan Lemon, netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

On 12/22/20 9:52 AM, David Ahern wrote:
> On 12/21/20 5:09 PM, Jonathan Lemon wrote:
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 327ee8938f78..ea32b3414ad6 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -1245,12 +1245,8 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
>>  
>>  void sock_zerocopy_put(struct ubuf_info *uarg)
>>  {
>> -	if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
>> -		if (uarg->callback)
>> -			uarg->callback(uarg, uarg->zerocopy);
>> -		else
>> -			consume_skb(skb_from_uarg(uarg));
>> -	}
>> +	if (uarg && refcount_dec_and_test(&uarg->refcnt))
>> +		uarg->callback(uarg, uarg->zerocopy);
>>  }
>>  EXPORT_SYMBOL_GPL(sock_zerocopy_put);
>>  
>>
> 
> since it is down to 2 lines, move to skbuff.h as an inline?
> 

nm. that is done in patch 5.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-22 14:42   ` Willem de Bruijn
@ 2020-12-22 17:19     ` Jonathan Lemon
  2020-12-22 22:36       ` Willem de Bruijn
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22 17:19 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 09:42:40AM -0500, Willem de Bruijn wrote:
> On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > From: Jonathan Lemon <bsd@fb.com>
> >
> > Replace sock_zerocopy_put with the generic skb_zcopy_put()
> > function.  Pass 'true' as the success argument, as this
> > is identical to no change.
> >
> > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> 
> uarg->zerocopy may be false if sock_zerocopy_put_abort is called from
> tcp_sendmsg_locked

Yes, it may well be false.  The original logic goes:

   sock_zerocopy_put_abort()
   sock_zerocopy_put()
   sock_zerocopy_callback(..., success = uarg->zerocopy)
     if (success)

The new logic is now:

   sock_zerocopy_put_abort()
   sock_zerocopy_callback(..., success = true)
     uarg->zerocopy = uarg->zerocopy & success
     if (uarg->zerocopy)

The success value ls latched into uarg->zerocopy, and any failure
is persistent.  Hence my comment about passing 'true' not changing
the logic.
-- 
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-22 14:43   ` Willem de Bruijn
@ 2020-12-22 17:21     ` Jonathan Lemon
  2020-12-22 22:26       ` Willem de Bruijn
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22 17:21 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 09:43:15AM -0500, Willem de Bruijn wrote:
> On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > From: Jonathan Lemon <bsd@fb.com>
> >
> > In preparation for expanded zerocopy (TX and RX), move
> > the ZC related bits out of tx_flags into their own flag
> > word.
> >
> > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> 
> I think it's better to expand tx_flags to a u16 and add the two new
> flags that you need.

Okay, but in that case, tx_flags is now wrong, since some of these flags
will also be checked on the rx path.


> That allows the additional 7 bits to be used for arbitrary flags, not
> stranding 8 bits exactly only for zerocopy features.
> 
> Moving around a few u8's in the same cacheline won't be a problem.

How about rearranging them to form 16 bits, and just calling it 'flags'?

 
> I also prefer not to rename flags that some of us are familiar with,
> if it's not needed.

Ack.
--
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback
  2020-12-22 14:43   ` Willem de Bruijn
  2020-12-22 16:49     ` David Ahern
@ 2020-12-22 17:48     ` Jonathan Lemon
  2020-12-22 22:38       ` Willem de Bruijn
  1 sibling, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22 17:48 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 09:43:39AM -0500, Willem de Bruijn wrote:
> On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > From: Jonathan Lemon <bsd@fb.com>
> >
> > Before this change, the caller of sock_zerocopy_callback would
> > need to save the zerocopy status, decrement and check the refcount,
> > and then call the callback function - the callback was only invoked
> > when the refcount reached zero.
> >
> > Now, the caller just passes the status into the callback function,
> > which saves the status and handles its own refcounts.
> >
> > This makes the behavior of the sock_zerocopy_callback identical
> > to the tpacket and vhost callbacks.
> >
> > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > ---
> >  include/linux/skbuff.h |  3 ---
> >  net/core/skbuff.c      | 14 +++++++++++---
> >  2 files changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index fb6dd6af0f82..c9d7de9d624d 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1482,9 +1482,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
> >         if (uarg) {
> >                 if (skb_zcopy_is_nouarg(skb)) {
> >                         /* no notification callback */
> > -               } else if (uarg->callback == sock_zerocopy_callback) {
> > -                       uarg->zerocopy = uarg->zerocopy && zerocopy;
> > -                       sock_zerocopy_put(uarg);
> >                 } else {
> >                         uarg->callback(uarg, zerocopy);
> >                 }
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index ea32b3414ad6..73699dbdc4a1 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
> >         return true;
> >  }
> >
> > -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > +static void __sock_zerocopy_callback(struct ubuf_info *uarg)
> >  {
> >         struct sk_buff *tail, *skb = skb_from_uarg(uarg);
> >         struct sock_exterr_skb *serr;
> > @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> >         serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
> >         serr->ee.ee_data = hi;
> >         serr->ee.ee_info = lo;
> > -       if (!success)
> > +       if (!uarg->zerocopy)
> >                 serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
> >
> >         q = &sk->sk_error_queue;
> > @@ -1241,11 +1241,19 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> >         consume_skb(skb);
> >         sock_put(sk);
> >  }
> > +
> > +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > +{
> > +       uarg->zerocopy = uarg->zerocopy & success;
> > +
> > +       if (refcount_dec_and_test(&uarg->refcnt))
> > +               __sock_zerocopy_callback(uarg);
> > +}
> >  EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
> 
> I still think this helper is unnecessary. Just return immediately in
> existing sock_zerocopy_callback if refcount is not zero.

I think the helper makes the logic clearer, and prevents misuse of
the success variable in the main function (use of last value vs the
latched value).  If you really feel that strongly about it, I'll 
fold it into one function.


> >  void sock_zerocopy_put(struct ubuf_info *uarg)
> >  {
> > -       if (uarg && refcount_dec_and_test(&uarg->refcnt))
> > +       if (uarg)
> >                 uarg->callback(uarg, uarg->zerocopy);
> >  }
> >  EXPORT_SYMBOL_GPL(sock_zerocopy_put);
> 
> This does increase the number of indirect function calls. Which are
> not cheap post spectre.
> 
> In the common case for msg_zerocopy we only have two clones, one sent
> out and one on the retransmit queue. So I guess the cost will be
> acceptable.

Yes, this was the source of my original comment about this being 
a slight pessimization - moving the refcount into the function.

I briefly considered adding a flag like 'use_refcnt', so the logic
becomes:

    if (uarg && (!uarg->use_refcnt || refcount_dec_and_test(&uarg->refcnt)))

But thought this might be too much micro-optimization.  But if 
the call overhead is significant, I can put this back in.
-- 
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup
  2020-12-22 15:00   ` Willem de Bruijn
@ 2020-12-22 18:17     ` Jonathan Lemon
  2020-12-22 18:29       ` Willem de Bruijn
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22 18:17 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 10:00:37AM -0500, Willem de Bruijn wrote:
> On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > From: Jonathan Lemon <bsd@fb.com>
> >
> > Currently, an ubuf is attached to a new skb, the skb zc_flags
> > is initialized to a fixed value.  Instead of doing this, set
> > the default zc_flags in the ubuf, and have new skb's inherit
> > from this default.
> >
> > This is needed when setting up different zerocopy types.
> >
> > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > ---
> >  include/linux/skbuff.h | 3 ++-
> >  net/core/skbuff.c      | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index da0c1dddd0da..b90be4b0b2de 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -478,6 +478,7 @@ struct ubuf_info {
> >                 };
> >         };
> >         refcount_t refcnt;
> > +       u8 zc_flags;
> >
> >         struct mmpin {
> >                 struct user_struct *user;
> 
> When allocating ubuf_info for msg_zerocopy, we actually allocate the
> notification skb, to be sure that notifications won't be dropped due
> to memory pressure at notification time. We actually allocate the skb
> and place ubuf_info in skb->cb[].
> 
> The struct is exactly 48 bytes on 64-bit platforms, filling all of cb.
> This new field fills a 4B hole, so it should still be fine.

Yes, I was careful not to increase the size.  I have future changes
for this structure, moving 'struct mmpin' into a union.   Making the
flags 16 bits shouldn't be a problem either.


> Just being very explicit, as this is a fragile bit of code. Come to
> think of it, this probably deserves a BUILD_BUG_ON.

You mean like the one which exists in sock_zerocopy_alloc()?

        BUILD_BUG_ON(sizeof(*uarg) > sizeof(skb->cb));

-- 
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup
  2020-12-22 18:17     ` Jonathan Lemon
@ 2020-12-22 18:29       ` Willem de Bruijn
  0 siblings, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 18:29 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 1:17 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 10:00:37AM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > >
> > > From: Jonathan Lemon <bsd@fb.com>
> > >
> > > Currently, an ubuf is attached to a new skb, the skb zc_flags
> > > is initialized to a fixed value.  Instead of doing this, set
> > > the default zc_flags in the ubuf, and have new skb's inherit
> > > from this default.
> > >
> > > This is needed when setting up different zerocopy types.
> > >
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > > ---
> > >  include/linux/skbuff.h | 3 ++-
> > >  net/core/skbuff.c      | 1 +
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index da0c1dddd0da..b90be4b0b2de 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -478,6 +478,7 @@ struct ubuf_info {
> > >                 };
> > >         };
> > >         refcount_t refcnt;
> > > +       u8 zc_flags;
> > >
> > >         struct mmpin {
> > >                 struct user_struct *user;
> >
> > When allocating ubuf_info for msg_zerocopy, we actually allocate the
> > notification skb, to be sure that notifications won't be dropped due
> > to memory pressure at notification time. We actually allocate the skb
> > and place ubuf_info in skb->cb[].
> >
> > The struct is exactly 48 bytes on 64-bit platforms, filling all of cb.
> > This new field fills a 4B hole, so it should still be fine.
>
> Yes, I was careful not to increase the size.  I have future changes
> for this structure, moving 'struct mmpin' into a union.   Making the
> flags 16 bits shouldn't be a problem either.
>
>
> > Just being very explicit, as this is a fragile bit of code. Come to
> > think of it, this probably deserves a BUILD_BUG_ON.
>
> You mean like the one which exists in sock_zerocopy_alloc()?
>
>         BUILD_BUG_ON(sizeof(*uarg) > sizeof(skb->cb));

Ah good. Yes. I should have remembered that was there.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-22 17:21     ` Jonathan Lemon
@ 2020-12-22 22:26       ` Willem de Bruijn
  2020-12-22 22:40         ` Jonathan Lemon
  0 siblings, 1 reply; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 22:26 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 12:22 PM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 09:43:15AM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > >
> > > From: Jonathan Lemon <bsd@fb.com>
> > >
> > > In preparation for expanded zerocopy (TX and RX), move
> > > the ZC related bits out of tx_flags into their own flag
> > > word.
> > >
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> >
> > I think it's better to expand tx_flags to a u16 and add the two new
> > flags that you need.
>
> Okay, but in that case, tx_flags is now wrong, since some of these flags
> will also be checked on the rx path.
>
>
> > That allows the additional 7 bits to be used for arbitrary flags, not
> > stranding 8 bits exactly only for zerocopy features.
> >
> > Moving around a few u8's in the same cacheline won't be a problem.
>
> How about rearranging them to form 16 bits, and just calling it 'flags'?

I thought that would be a good idea, but tx_flags is used in a lot
more places than I expected. That will be a lot of renaming. Let's
just either keep the name or add a new field.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-22 17:19     ` Jonathan Lemon
@ 2020-12-22 22:36       ` Willem de Bruijn
  0 siblings, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 22:36 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 12:19 PM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 09:42:40AM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > >
> > > From: Jonathan Lemon <bsd@fb.com>
> > >
> > > Replace sock_zerocopy_put with the generic skb_zcopy_put()
> > > function.  Pass 'true' as the success argument, as this
> > > is identical to no change.
> > >
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> >
> > uarg->zerocopy may be false if sock_zerocopy_put_abort is called from
> > tcp_sendmsg_locked
>
> Yes, it may well be false.  The original logic goes:
>
>    sock_zerocopy_put_abort()
>    sock_zerocopy_put()
>    sock_zerocopy_callback(..., success = uarg->zerocopy)
>      if (success)
>
> The new logic is now:
>
>    sock_zerocopy_put_abort()
>    sock_zerocopy_callback(..., success = true)
>      uarg->zerocopy = uarg->zerocopy & success
>      if (uarg->zerocopy)
>
> The success value ls latched into uarg->zerocopy, and any failure
> is persistent.

Good point.

It's not entirely obvious, and a bit unintuitive to pass success in an
abort statement. But it works.

Abort does not schedule a notification if no skb associated with the
uarg was sent. And if it was, the zerocopy state will reflect the & of
all those packets. On which note, if renaming the currently
inconsistent name, maybe renaming to zerocopy_success is the more
descriptive one, then.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback
  2020-12-22 17:48     ` Jonathan Lemon
@ 2020-12-22 22:38       ` Willem de Bruijn
  0 siblings, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 22:38 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 12:48 PM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 09:43:39AM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > >
> > > From: Jonathan Lemon <bsd@fb.com>
> > >
> > > Before this change, the caller of sock_zerocopy_callback would
> > > need to save the zerocopy status, decrement and check the refcount,
> > > and then call the callback function - the callback was only invoked
> > > when the refcount reached zero.
> > >
> > > Now, the caller just passes the status into the callback function,
> > > which saves the status and handles its own refcounts.
> > >
> > > This makes the behavior of the sock_zerocopy_callback identical
> > > to the tpacket and vhost callbacks.
> > >
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > > ---
> > >  include/linux/skbuff.h |  3 ---
> > >  net/core/skbuff.c      | 14 +++++++++++---
> > >  2 files changed, 11 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index fb6dd6af0f82..c9d7de9d624d 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -1482,9 +1482,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
> > >         if (uarg) {
> > >                 if (skb_zcopy_is_nouarg(skb)) {
> > >                         /* no notification callback */
> > > -               } else if (uarg->callback == sock_zerocopy_callback) {
> > > -                       uarg->zerocopy = uarg->zerocopy && zerocopy;
> > > -                       sock_zerocopy_put(uarg);
> > >                 } else {
> > >                         uarg->callback(uarg, zerocopy);
> > >                 }
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index ea32b3414ad6..73699dbdc4a1 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
> > >         return true;
> > >  }
> > >
> > > -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > > +static void __sock_zerocopy_callback(struct ubuf_info *uarg)
> > >  {
> > >         struct sk_buff *tail, *skb = skb_from_uarg(uarg);
> > >         struct sock_exterr_skb *serr;
> > > @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > >         serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
> > >         serr->ee.ee_data = hi;
> > >         serr->ee.ee_info = lo;
> > > -       if (!success)
> > > +       if (!uarg->zerocopy)
> > >                 serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
> > >
> > >         q = &sk->sk_error_queue;
> > > @@ -1241,11 +1241,19 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > >         consume_skb(skb);
> > >         sock_put(sk);
> > >  }
> > > +
> > > +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > > +{
> > > +       uarg->zerocopy = uarg->zerocopy & success;
> > > +
> > > +       if (refcount_dec_and_test(&uarg->refcnt))
> > > +               __sock_zerocopy_callback(uarg);
> > > +}
> > >  EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
> >
> > I still think this helper is unnecessary. Just return immediately in
> > existing sock_zerocopy_callback if refcount is not zero.
>
> I think the helper makes the logic clearer, and prevents misuse of
> the success variable in the main function (use of last value vs the
> latched value).  If you really feel that strongly about it, I'll
> fold it into one function.

Ok. Thanks, no, it's fine.

>
> > >  void sock_zerocopy_put(struct ubuf_info *uarg)
> > >  {
> > > -       if (uarg && refcount_dec_and_test(&uarg->refcnt))
> > > +       if (uarg)
> > >                 uarg->callback(uarg, uarg->zerocopy);
> > >  }
> > >  EXPORT_SYMBOL_GPL(sock_zerocopy_put);
> >
> > This does increase the number of indirect function calls. Which are
> > not cheap post spectre.
> >
> > In the common case for msg_zerocopy we only have two clones, one sent
> > out and one on the retransmit queue. So I guess the cost will be
> > acceptable.
>
> Yes, this was the source of my original comment about this being
> a slight pessimization - moving the refcount into the function.
>
> I briefly considered adding a flag like 'use_refcnt', so the logic
> becomes:
>
>     if (uarg && (!uarg->use_refcnt || refcount_dec_and_test(&uarg->refcnt)))
>
> But thought this might be too much micro-optimization.  But if
> the call overhead is significant, I can put this back in.

Agreed on the premature optimization. Let's find out :)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-22 22:26       ` Willem de Bruijn
@ 2020-12-22 22:40         ` Jonathan Lemon
  2020-12-22 22:56           ` Willem de Bruijn
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Lemon @ 2020-12-22 22:40 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 05:26:15PM -0500, Willem de Bruijn wrote:
> On Tue, Dec 22, 2020 at 12:22 PM Jonathan Lemon
> <jonathan.lemon@gmail.com> wrote:
> >
> > On Tue, Dec 22, 2020 at 09:43:15AM -0500, Willem de Bruijn wrote:
> > > On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > > >
> > > > From: Jonathan Lemon <bsd@fb.com>
> > > >
> > > > In preparation for expanded zerocopy (TX and RX), move
> > > > the ZC related bits out of tx_flags into their own flag
> > > > word.
> > > >
> > > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > >
> > > I think it's better to expand tx_flags to a u16 and add the two new
> > > flags that you need.
> >
> > Okay, but in that case, tx_flags is now wrong, since some of these flags
> > will also be checked on the rx path.
> >
> >
> > > That allows the additional 7 bits to be used for arbitrary flags, not
> > > stranding 8 bits exactly only for zerocopy features.
> > >
> > > Moving around a few u8's in the same cacheline won't be a problem.
> >
> > How about rearranging them to form 16 bits, and just calling it 'flags'?
> 
> I thought that would be a good idea, but tx_flags is used in a lot
> more places than I expected. That will be a lot of renaming. Let's
> just either keep the name or add a new field.

Hmm, which?  I went with "add new field" = zc_flags.  I can rename this
to something else if that's too specific.  Suggestions?
-- 
Jonathan   "naming is hard!"

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-22 22:40         ` Jonathan Lemon
@ 2020-12-22 22:56           ` Willem de Bruijn
  0 siblings, 0 replies; 33+ messages in thread
From: Willem de Bruijn @ 2020-12-22 22:56 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

On Tue, Dec 22, 2020 at 5:40 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> On Tue, Dec 22, 2020 at 05:26:15PM -0500, Willem de Bruijn wrote:
> > On Tue, Dec 22, 2020 at 12:22 PM Jonathan Lemon
> > <jonathan.lemon@gmail.com> wrote:
> > >
> > > On Tue, Dec 22, 2020 at 09:43:15AM -0500, Willem de Bruijn wrote:
> > > > On Mon, Dec 21, 2020 at 7:09 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > > > >
> > > > > From: Jonathan Lemon <bsd@fb.com>
> > > > >
> > > > > In preparation for expanded zerocopy (TX and RX), move
> > > > > the ZC related bits out of tx_flags into their own flag
> > > > > word.
> > > > >
> > > > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > > >
> > > > I think it's better to expand tx_flags to a u16 and add the two new
> > > > flags that you need.
> > >
> > > Okay, but in that case, tx_flags is now wrong, since some of these flags
> > > will also be checked on the rx path.
> > >
> > >
> > > > That allows the additional 7 bits to be used for arbitrary flags, not
> > > > stranding 8 bits exactly only for zerocopy features.
> > > >
> > > > Moving around a few u8's in the same cacheline won't be a problem.
> > >
> > > How about rearranging them to form 16 bits, and just calling it 'flags'?
> >
> > I thought that would be a good idea, but tx_flags is used in a lot
> > more places than I expected. That will be a lot of renaming. Let's
> > just either keep the name or add a new field.
>
> Hmm, which?  I went with "add new field" = zc_flags.  I can rename this
> to something else if that's too specific.  Suggestions?

Just flags, for now? High order bit is to not move and rename all the
existing flags for the sake of it. Name itself is less important.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-12-22 22:57 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-22  0:09 [PATCH 00/12 v2 RFC] Generic zcopy_* functions Jonathan Lemon
2020-12-22  0:09 ` [PATCH 01/12 v2 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
2020-12-22 14:43   ` Willem de Bruijn
2020-12-22 17:21     ` Jonathan Lemon
2020-12-22 22:26       ` Willem de Bruijn
2020-12-22 22:40         ` Jonathan Lemon
2020-12-22 22:56           ` Willem de Bruijn
2020-12-22  0:09 ` [PATCH 02/12 v2 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
2020-12-22 15:00   ` Willem de Bruijn
2020-12-22  0:09 ` [PATCH 03/12 v2 RFC] skbuff: simplify sock_zerocopy_put Jonathan Lemon
2020-12-22 14:43   ` Willem de Bruijn
2020-12-22 16:52   ` David Ahern
2020-12-22 16:56     ` David Ahern
2020-12-22  0:09 ` [PATCH 04/12 v2 RFC] skbuff: Push status and refcounts into sock_zerocopy_callback Jonathan Lemon
2020-12-22 14:43   ` Willem de Bruijn
2020-12-22 16:49     ` David Ahern
2020-12-22 17:48     ` Jonathan Lemon
2020-12-22 22:38       ` Willem de Bruijn
2020-12-22  0:09 ` [PATCH 05/12 v2 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
2020-12-22 14:42   ` Willem de Bruijn
2020-12-22 17:19     ` Jonathan Lemon
2020-12-22 22:36       ` Willem de Bruijn
2020-12-22  0:09 ` [PATCH 06/12 v2 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get Jonathan Lemon
2020-12-22  0:09 ` [PATCH 07/12 v2 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback Jonathan Lemon
2020-12-22  0:09 ` [PATCH 08/12 v2 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort Jonathan Lemon
2020-12-22  0:09 ` [PATCH 09/12 v2 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
2020-12-22 15:00   ` Willem de Bruijn
2020-12-22 18:17     ` Jonathan Lemon
2020-12-22 18:29       ` Willem de Bruijn
2020-12-22  0:09 ` [PATCH 10/12 v2 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment Jonathan Lemon
2020-12-22  0:09 ` [PATCH 11/12 v2 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments Jonathan Lemon
2020-12-22  0:09 ` [PATCH 12/12 v2 RFC] skbuff: rename sock_zerocopy_* to msg_zerocopy_* Jonathan Lemon
2020-12-22 14:55   ` Willem de Bruijn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.