All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9 v1 RFC] Generic zcopy_* functions
@ 2020-12-18 20:16 Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 1/9 v1 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

This is set of cleanup patches for zerocopy which are intended
to allow a introduction of a different zerocopy implementation.

The top level api will use the skb_zcopy_*() functions, while
the current TCP specific zerocopy would use the sock_zerocopy_*()
calls.

There should be no functional changes from these patches.

Patch 1:
  Move zerocopy bits from tx_flags into zc_flags for clarity.
  These bits will be used in the RX path in the future.
Patch 2: remove dead function
Patch 3:
  Replace sock_zerocopy_put() with skb_zcopy_put(), moving
  the zerocopy logic into sock_zerocopy_callback().  Push the
  refcounting into the callback, since not all implemenetations
  will have a refcount.
Patch 4: rename sock_zerocopy_get for consistency.
Patch 5:
  Add an optional skb parameter to callback, allowing access to
  the attached skb from the callback.
Patch 6:
  Add skb_zcopy_put_abort, and move zerocopy logic into the 
  callback function.  There unfortunately is still a check 
  against the callback type here.
Patch 7:
  Set the skb zc_flags from the ubuf being attached, instead
  of a fixed value, allowing different initialization types.
Patch 8: Replace open-coded assignments
Patch 9: Relocate skb_zcopy_clear() in skb_release_data()

Jonathan Lemon (9):
  net: group skb_shinfo zerocopy related bits together.
  skbuff: remove unused skb_zcopy_abort function
  skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  skbuff: replace sock_zerocopy_get with skb_zcopy_get
  skbuff: Add skb parameter to the ubuf zerocopy callback
  skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort
  skbuff: add zc_flags to ubuf_info for ubuf setup
  tap/tun: use skb_zcopy_set() instead of open coded assignment
  skbuff: Call skb_zcopy_clear() before unref'ing fragments

 drivers/net/tap.c                   |  6 +--
 drivers/net/tun.c                   |  6 +--
 drivers/net/xen-netback/common.h    |  3 +-
 drivers/net/xen-netback/interface.c |  4 +-
 drivers/net/xen-netback/netback.c   |  7 +--
 drivers/vhost/net.c                 |  4 +-
 include/linux/skbuff.h              | 77 +++++++++++++++--------------
 net/core/skbuff.c                   | 48 +++++++++---------
 net/ipv4/ip_output.c                |  3 +-
 net/ipv4/tcp.c                      |  6 +--
 net/ipv6/ip6_output.c               |  3 +-
 net/kcm/kcmsock.c                   |  4 +-
 12 files changed, 85 insertions(+), 86 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9 v1 RFC] net: group skb_shinfo zerocopy related bits together.
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 2/9 v1 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

In preparation for expanded zerocopy (TX and RX), move
the ZC related bits out of tx_flags into their own flag
word.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/tap.c                   |  3 +--
 drivers/net/tun.c                   |  3 +--
 drivers/net/xen-netback/interface.c |  4 ++--
 include/linux/skbuff.h              | 33 ++++++++++++++++-------------
 net/core/skbuff.c                   | 10 ++++-----
 net/ipv4/tcp.c                      |  2 +-
 net/kcm/kcmsock.c                   |  4 ++--
 7 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 1f4bdd94407a..3e9fb753ce88 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -723,8 +723,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
 		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(uarg, false);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fbed05ae7b0f..80cb3bef3afd 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1815,8 +1815,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
 		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(uarg, false);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index acb786d8b1d8..ec790df75be3 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -47,7 +47,7 @@
 /* Number of bytes allowed on the internal guest Rx queue. */
 #define XENVIF_RX_QUEUE_BYTES (XEN_NETIF_RX_RING_SIZE/2 * PAGE_SIZE)
 
-/* This function is used to set SKBTX_DEV_ZEROCOPY as well as
+/* This function is used to set SKBZC_ENABLE as well as
  * increasing the inflight counter. We need to increase the inflight
  * counter because core driver calls into xenvif_zerocopy_callback
  * which calls xenvif_skb_zerocopy_complete.
@@ -55,7 +55,7 @@
 void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue,
 				 struct sk_buff *skb)
 {
-	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+	skb_shinfo(skb)->zc_flags |= SKBZC_ENABLE;
 	atomic_inc(&queue->inflight_packets);
 }
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 333bcdc39635..69588b304f83 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -430,24 +430,27 @@ enum {
 	/* device driver is going to provide hardware time stamp */
 	SKBTX_IN_PROGRESS = 1 << 2,
 
-	/* device driver supports TX zero-copy buffers */
-	SKBTX_DEV_ZEROCOPY = 1 << 3,
-
 	/* generate wifi status information (where possible) */
 	SKBTX_WIFI_STATUS = 1 << 4,
 
+	/* generate software time stamp when entering packet scheduling */
+	SKBTX_SCHED_TSTAMP = 1 << 6,
+};
+
+/* Definitions for zc_flags in struct skb_shared_info */
+enum {
+	/* use zcopy routines */
+	SKBZC_ENABLE = BIT(0),
+
 	/* This indicates at least one fragment might be overwritten
 	 * (as in vmsplice(), sendfile() ...)
 	 * If we need to compute a TX checksum, we'll need to copy
 	 * all frags to avoid possible bad checksum
 	 */
-	SKBTX_SHARED_FRAG = 1 << 5,
-
-	/* generate software time stamp when entering packet scheduling */
-	SKBTX_SCHED_TSTAMP = 1 << 6,
+	SKBZC_SHARED_FRAG = BIT(1),
 };
 
-#define SKBTX_ZEROCOPY_FRAG	(SKBTX_DEV_ZEROCOPY | SKBTX_SHARED_FRAG)
+#define SKBZC_FRAGMENTS		(SKBZC_ENABLE | SKBZC_SHARED_FRAG)
 #define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
 				 SKBTX_SCHED_TSTAMP)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
@@ -510,7 +513,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
-	__u8		__unused;
+	__u8		zc_flags;
 	__u8		meta_len;
 	__u8		nr_frags;
 	__u8		tx_flags;
@@ -1437,7 +1440,7 @@ static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
 
 static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb)
 {
-	bool is_zcopy = skb && skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY;
+	bool is_zcopy = skb && skb_shinfo(skb)->zc_flags & SKBZC_ENABLE;
 
 	return is_zcopy ? skb_uarg(skb) : NULL;
 }
@@ -1451,14 +1454,14 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 		else
 			sock_zerocopy_get(uarg);
 		skb_shinfo(skb)->destructor_arg = uarg;
-		skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	}
 }
 
 static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val)
 {
 	skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL);
-	skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
+	skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 }
 
 static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb)
@@ -1486,7 +1489,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 			uarg->callback(uarg, zerocopy);
 		}
 
-		skb_shinfo(skb)->tx_flags &= ~SKBTX_ZEROCOPY_FRAG;
+		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
 }
 
@@ -1497,7 +1500,7 @@ static inline void skb_zcopy_abort(struct sk_buff *skb)
 
 	if (uarg) {
 		sock_zerocopy_put_abort(uarg, false);
-		skb_shinfo(skb)->tx_flags &= ~SKBTX_ZEROCOPY_FRAG;
+		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
 }
 
@@ -3323,7 +3326,7 @@ static inline int skb_linearize(struct sk_buff *skb)
 static inline bool skb_has_shared_frag(const struct sk_buff *skb)
 {
 	return skb_is_nonlinear(skb) &&
-	       skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG;
+	       skb_shinfo(skb)->zc_flags & SKBZC_SHARED_FRAG;
 }
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f62cae3f75d8..327ee8938f78 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1330,7 +1330,7 @@ static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig,
  *	@skb: the skb to modify
  *	@gfp_mask: allocation priority
  *
- *	This must be called on SKBTX_DEV_ZEROCOPY skb.
+ *	This must be called on SKBZC_ENABLE skb.
  *	It will copy all frags into kernel and drop the reference
  *	to userspace pages.
  *
@@ -3267,8 +3267,8 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len)
 {
 	int pos = skb_headlen(skb);
 
-	skb_shinfo(skb1)->tx_flags |= skb_shinfo(skb)->tx_flags &
-				      SKBTX_SHARED_FRAG;
+	skb_shinfo(skb1)->zc_flags |= skb_shinfo(skb)->zc_flags &
+				      SKBZC_SHARED_FRAG;
 	skb_zerocopy_clone(skb1, skb, 0);
 	if (len < pos)	/* Split line is inside header. */
 		skb_split_inside_header(skb, skb1, len, pos);
@@ -3957,8 +3957,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		skb_copy_from_linear_data_offset(head_skb, offset,
 						 skb_put(nskb, hsize), hsize);
 
-		skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
-					      SKBTX_SHARED_FRAG;
+		skb_shinfo(nskb)->zc_flags |= skb_shinfo(head_skb)->zc_flags &
+					      SKBZC_SHARED_FRAG;
 
 		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
 		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ed42d2193c5c..fea9bae370e4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1010,7 +1010,7 @@ struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
 	}
 
 	if (!(flags & MSG_NO_SHARED_FRAGS))
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+		skb_shinfo(skb)->zc_flags |= SKBZC_SHARED_FRAG;
 
 	skb->len += copy;
 	skb->data_len += copy;
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 56dad9565bc9..55c04d8c659a 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -786,7 +786,7 @@ static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
 
 		if (skb_can_coalesce(skb, i, page, offset)) {
 			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
-			skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+			skb_shinfo(skb)->zc_flags |= SKBZC_SHARED_FRAG;
 			goto coalesced;
 		}
 
@@ -834,7 +834,7 @@ static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
 
 	get_page(page);
 	skb_fill_page_desc(skb, i, page, offset, size);
-	skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+	skb_shinfo(skb)->zc_flags |= SKBZC_SHARED_FRAG;
 
 coalesced:
 	skb->len += size;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/9 v1 RFC] skbuff: remove unused skb_zcopy_abort function
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 1/9 v1 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

skb_zcopy_abort() has no in-tree consumers, remove it.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 69588b304f83..fb6dd6af0f82 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1493,17 +1493,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 	}
 }
 
-/* Abort a zerocopy operation and revert zckey on error in send syscall */
-static inline void skb_zcopy_abort(struct sk_buff *skb)
-{
-	struct ubuf_info *uarg = skb_zcopy(skb);
-
-	if (uarg) {
-		sock_zerocopy_put_abort(uarg, false);
-		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
-	}
-}
-
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
 	skb->next = NULL;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 1/9 v1 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 2/9 v1 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-19 18:46   ` Willem de Bruijn
  2020-12-18 20:16 ` [PATCH 4/9 v1 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get Jonathan Lemon
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

In preparation for further work, the zcopy* routines will
become basic building blocks, while the zerocopy* ones will
be specific for the existing zerocopy implementation.

All uargs should have a callback function, (unless nouarg
is set), so push all special case logic handling down into
the callbacks.  This slightly pessimizes the refcounted cases,
but makes the skb_zcopy_*() routines clearer.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 19 +++++++++----------
 net/core/skbuff.c      | 21 +++++++++------------
 net/ipv4/tcp.c         |  2 +-
 3 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fb6dd6af0f82..df98d61e8c51 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -499,7 +499,6 @@ static inline void sock_zerocopy_get(struct ubuf_info *uarg)
 	refcount_inc(&uarg->refcnt);
 }
 
-void sock_zerocopy_put(struct ubuf_info *uarg);
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
 void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
@@ -1474,20 +1473,20 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 	return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
 }
 
+static inline void skb_zcopy_put(struct ubuf_info *uarg)
+{
+	if (uarg)
+		uarg->callback(uarg, true);
+}
+
 /* Release a reference on a zerocopy structure */
-static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
+static inline void skb_zcopy_clear(struct sk_buff *skb, bool succsss)
 {
 	struct ubuf_info *uarg = skb_zcopy(skb);
 
 	if (uarg) {
-		if (skb_zcopy_is_nouarg(skb)) {
-			/* no notification callback */
-		} else if (uarg->callback == sock_zerocopy_callback) {
-			uarg->zerocopy = uarg->zerocopy && zerocopy;
-			sock_zerocopy_put(uarg);
-		} else {
-			uarg->callback(uarg, zerocopy);
-		}
+		if (!skb_zcopy_is_nouarg(skb))
+			uarg->callback(uarg, succsss);
 
 		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 327ee8938f78..984760dd670b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
 	return true;
 }
 
-void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
+static void __sock_zerocopy_callback(struct ubuf_info *uarg)
 {
 	struct sk_buff *tail, *skb = skb_from_uarg(uarg);
 	struct sock_exterr_skb *serr;
@@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
 	serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
 	serr->ee.ee_data = hi;
 	serr->ee.ee_info = lo;
-	if (!success)
+	if (!uarg->zerocopy)
 		serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
 
 	q = &sk->sk_error_queue;
@@ -1241,18 +1241,15 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
 	consume_skb(skb);
 	sock_put(sk);
 }
-EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
-void sock_zerocopy_put(struct ubuf_info *uarg)
+void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
 {
-	if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
-		if (uarg->callback)
-			uarg->callback(uarg, uarg->zerocopy);
-		else
-			consume_skb(skb_from_uarg(uarg));
-	}
+	uarg->zerocopy = uarg->zerocopy & success;
+
+	if (refcount_dec_and_test(&uarg->refcnt))
+		__sock_zerocopy_callback(uarg);
 }
-EXPORT_SYMBOL_GPL(sock_zerocopy_put);
+EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
@@ -1263,7 +1260,7 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 		uarg->len--;
 
 		if (have_uref)
-			sock_zerocopy_put(uarg);
+			skb_zcopy_put(uarg);
 	}
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index fea9bae370e4..5c38080df13f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1429,7 +1429,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
 	}
 out_nopush:
-	sock_zerocopy_put(uarg);
+	skb_zcopy_put(uarg);
 	return copied + copied_syn;
 
 do_error:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/9 v1 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (2 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 5/9 v1 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback Jonathan Lemon
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Rename the get routines for consistency.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 4 ++--
 net/core/skbuff.c      | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df98d61e8c51..638feaf98f17 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -494,7 +494,7 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size);
 struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
 					struct ubuf_info *uarg);
 
-static inline void sock_zerocopy_get(struct ubuf_info *uarg)
+static inline void skb_zcopy_get(struct ubuf_info *uarg)
 {
 	refcount_inc(&uarg->refcnt);
 }
@@ -1451,7 +1451,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 		if (unlikely(have_ref && *have_ref))
 			*have_ref = false;
 		else
-			sock_zerocopy_get(uarg);
+			skb_zcopy_get(uarg);
 		skb_shinfo(skb)->destructor_arg = uarg;
 		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 984760dd670b..fbf0a145467a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1163,7 +1163,7 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size,
 
 			/* no extra ref when appending to datagram (MSG_MORE) */
 			if (sk->sk_type == SOCK_STREAM)
-				sock_zerocopy_get(uarg);
+				skb_zcopy_get(uarg);
 
 			return uarg;
 		}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/9 v1 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (3 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 4/9 v1 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 6/9 v1 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort Jonathan Lemon
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear().  This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/tap.c                 |  2 +-
 drivers/net/tun.c                 |  2 +-
 drivers/net/xen-netback/common.h  |  3 ++-
 drivers/net/xen-netback/netback.c |  7 ++++---
 drivers/vhost/net.c               |  3 ++-
 include/linux/skbuff.h            | 13 +++++++------
 net/core/skbuff.c                 |  3 ++-
 7 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3e9fb753ce88..c2bcbf9218dc 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -726,7 +726,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(uarg, false);
+		uarg->callback(NULL, uarg, false);
 	}
 
 	if (tap) {
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 80cb3bef3afd..bad4b0229584 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1818,7 +1818,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(uarg, false);
+		uarg->callback(NULL, uarg, false);
 	}
 
 	skb_reset_network_header(skb);
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8ee24e351bdc..4df001e960ca 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -399,7 +399,8 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 void xenvif_carrier_on(struct xenvif *vif);
 
 /* Callback from stack when TX packet can be released */
-void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
+void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
+			      bool success);
 
 /* Unmap a pending page and release it back to the guest */
 void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index bc3421d14576..49288ae2c4dd 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1091,7 +1091,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
 	uarg = skb_shinfo(skb)->destructor_arg;
 	/* increase inflight counter to offset decrement in callback */
 	atomic_inc(&queue->inflight_packets);
-	uarg->callback(uarg, true);
+	uarg->callback(NULL, uarg, true);
 	skb_shinfo(skb)->destructor_arg = NULL;
 
 	/* Fill the skb with the new (local) frags. */
@@ -1228,7 +1228,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 	return work_done;
 }
 
-void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
+void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
+			      bool success)
 {
 	unsigned long flags;
 	pending_ring_idx_t index;
@@ -1253,7 +1254,7 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 	} while (ubuf);
 	spin_unlock_irqrestore(&queue->callback_lock, flags);
 
-	if (likely(zerocopy_success))
+	if (likely(success))
 		queue->stats.tx_zerocopy_success++;
 	else
 		queue->stats.tx_zerocopy_fail++;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 531a00d703cd..bf28d0b75c1b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -381,7 +381,8 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 }
 
-static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
+static void vhost_zerocopy_callback(struct sk_buff *skb,
+				    struct ubuf_info *ubuf, bool success)
 {
 	struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 638feaf98f17..64ae6f3adcd5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -458,13 +458,13 @@ enum {
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
- * The zerocopy_success argument is true if zero copy transmit occurred,
- * false on data copy or out of memory error caused by data copy attempt.
+ * The success argument is true if zero copy transmit occurred, false on
+ * data copy or out of memory error caused by data copy attempt.
  * The ctx field is used to track device context.
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-	void (*callback)(struct ubuf_info *, bool zerocopy_success);
+	void (*callback)(struct sk_buff *, struct ubuf_info *, bool success);
 	union {
 		struct {
 			unsigned long desc;
@@ -501,7 +501,8 @@ static inline void skb_zcopy_get(struct ubuf_info *uarg)
 
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
-void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
+void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+			    bool success);
 
 int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len);
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
@@ -1476,7 +1477,7 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 static inline void skb_zcopy_put(struct ubuf_info *uarg)
 {
 	if (uarg)
-		uarg->callback(uarg, true);
+		uarg->callback(NULL, uarg, true);
 }
 
 /* Release a reference on a zerocopy structure */
@@ -1486,7 +1487,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool succsss)
 
 	if (uarg) {
 		if (!skb_zcopy_is_nouarg(skb))
-			uarg->callback(uarg, succsss);
+			uarg->callback(skb, uarg, succsss);
 
 		skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fbf0a145467a..328385cd141e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1242,7 +1242,8 @@ static void __sock_zerocopy_callback(struct ubuf_info *uarg)
 	sock_put(sk);
 }
 
-void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
+void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+			    bool success)
 {
 	uarg->zerocopy = uarg->zerocopy & success;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/9 v1 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (4 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 5/9 v1 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 7/9 v1 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

The sock_zerocopy_put_abort function contains logic which is
specific to the current zerocopy implementation.  Add a wrapper
which checks the callback and dispatches apppropriately.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 10 ++++++++++
 net/core/skbuff.c      | 12 +++++-------
 net/ipv4/ip_output.c   |  3 +--
 net/ipv4/tcp.c         |  2 +-
 net/ipv6/ip6_output.c  |  3 +--
 5 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 64ae6f3adcd5..a50d52b796a7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1480,6 +1480,16 @@ static inline void skb_zcopy_put(struct ubuf_info *uarg)
 		uarg->callback(NULL, uarg, true);
 }
 
+static inline void skb_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
+{
+	if (uarg) {
+		if (uarg->callback == sock_zerocopy_callback)
+			sock_zerocopy_put_abort(uarg, have_uref);
+		else if (have_uref)
+			skb_zcopy_put(uarg);
+	}
+}
+
 /* Release a reference on a zerocopy structure */
 static inline void skb_zcopy_clear(struct sk_buff *skb, bool succsss)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 328385cd141e..8352da29f052 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1254,15 +1254,13 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
 
 void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
-	if (uarg) {
-		struct sock *sk = skb_from_uarg(uarg)->sk;
+	struct sock *sk = skb_from_uarg(uarg)->sk;
 
-		atomic_dec(&sk->sk_zckey);
-		uarg->len--;
+	atomic_dec(&sk->sk_zckey);
+	uarg->len--;
 
-		if (have_uref)
-			skb_zcopy_put(uarg);
-	}
+	if (have_uref)
+		sock_zerocopy_callback(NULL, uarg, true);
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 879b76ae4435..65f2299fd682 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1230,8 +1230,7 @@ static int __ip_append_data(struct sock *sk,
 error_efault:
 	err = -EFAULT;
 error:
-	if (uarg)
-		sock_zerocopy_put_abort(uarg, extra_uref);
+	skb_zcopy_put_abort(uarg, extra_uref);
 	cork->length -= length;
 	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
 	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5c38080df13f..900b6bb7b280 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1440,7 +1440,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	if (copied + copied_syn)
 		goto out;
 out_err:
-	sock_zerocopy_put_abort(uarg, true);
+	skb_zcopy_put_abort(uarg, true);
 	err = sk_stream_error(sk, flags, err);
 	/* make sure we wake any epoll edge trigger waiter */
 	if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 749ad72386b2..c8c87891533a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1715,8 +1715,7 @@ static int __ip6_append_data(struct sock *sk,
 error_efault:
 	err = -EFAULT;
 error:
-	if (uarg)
-		sock_zerocopy_put_abort(uarg, extra_uref);
+	skb_zcopy_put_abort(uarg, extra_uref);
 	cork->length -= length;
 	IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
 	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/9 v1 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (5 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 6/9 v1 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 8/9 v1 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment Jonathan Lemon
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Currently, an ubuf is attached to a new skb, the skb zc_flags
is initialized to a fixed value.  Instead of doing this, set
the default zc_flags in the ubuf, and have new skb's inherit
from this default.

This is needed when setting up different zerocopy types.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 3 ++-
 net/core/skbuff.c      | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a50d52b796a7..65ef46b02f65 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -478,6 +478,7 @@ struct ubuf_info {
 		};
 	};
 	refcount_t refcnt;
+	u8 zc_flags;
 
 	struct mmpin {
 		struct user_struct *user;
@@ -1454,7 +1455,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 		else
 			skb_zcopy_get(uarg);
 		skb_shinfo(skb)->destructor_arg = uarg;
-		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
+		skb_shinfo(skb)->zc_flags |= uarg->zc_flags;
 	}
 }
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8352da29f052..463078ba663f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1118,6 +1118,7 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
+	uarg->zc_flags = SKBZC_FRAGMENTS;
 	refcount_set(&uarg->refcnt, 1);
 	sock_hold(sk);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/9 v1 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (6 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 7/9 v1 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:16 ` [PATCH 9/9 v1 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments Jonathan Lemon
  2020-12-18 20:49 ` [PATCH 0/9 v1 RFC] Generic zcopy_* functions Willem de Bruijn
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

Replace direct assignments with skb_zcopy_set() for clarity.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/tap.c   | 3 +--
 drivers/net/tun.c   | 3 +--
 drivers/vhost/net.c | 1 +
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index c2bcbf9218dc..7e7a4c7ca891 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -722,8 +722,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 	tap = rcu_dereference(q->tap);
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
+		skb_zcopy_set(skb, msg_control, NULL);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(NULL, uarg, false);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bad4b0229584..0844da91e2ed 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1814,8 +1814,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->zc_flags |= SKBZC_FRAGMENTS;
+		skb_zcopy_set(skb, msg_control, NULL);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
 		uarg->callback(NULL, uarg, false);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index bf28d0b75c1b..174c05c90872 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -904,6 +904,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
+			ubuf->zc_flags = SKBZC_FRAGMENTS;
 			refcount_set(&ubuf->refcnt, 1);
 			msg.msg_control = &ctl;
 			ctl.type = TUN_MSG_UBUF;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 9/9 v1 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (7 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 8/9 v1 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment Jonathan Lemon
@ 2020-12-18 20:16 ` Jonathan Lemon
  2020-12-18 20:49 ` [PATCH 0/9 v1 RFC] Generic zcopy_* functions Willem de Bruijn
  9 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 20:16 UTC (permalink / raw)
  To: netdev, edumazet, willemdebruijn.kernel; +Cc: kernel-team

From: Jonathan Lemon <bsd@fb.com>

RX zerocopy fragment pages which are not allocated from the
system page pool require special handling.  Give the callback
in skb_zcopy_clear() a chance to process them first.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/core/skbuff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 463078ba663f..ee75279c7c78 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -605,13 +605,14 @@ static void skb_release_data(struct sk_buff *skb)
 			      &shinfo->dataref))
 		return;
 
+	skb_zcopy_clear(skb, true);
+
 	for (i = 0; i < shinfo->nr_frags; i++)
 		__skb_frag_unref(&shinfo->frags[i]);
 
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);
 
-	skb_zcopy_clear(skb, true);
 	skb_free_head(skb);
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9 v1 RFC] Generic zcopy_* functions
  2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
                   ` (8 preceding siblings ...)
  2020-12-18 20:16 ` [PATCH 9/9 v1 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments Jonathan Lemon
@ 2020-12-18 20:49 ` Willem de Bruijn
  2020-12-18 21:16   ` Jonathan Lemon
  9 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2020-12-18 20:49 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Network Development, Eric Dumazet, Willem de Bruijn, Kernel Team

On Fri, Dec 18, 2020 at 3:23 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> This is set of cleanup patches for zerocopy which are intended
> to allow a introduction of a different zerocopy implementation.

Can you describe in more detail what exactly is lacking in the current
zerocopy interface for this this different implementation? Or point to
a github tree with the feature patches attached, perhaps.

I think it's good to split into multiple smaller patchsets, starting
with core stack support. But find it hard to understand which of these
changes are truly needed to support a new use case.

If anything, eating up the last 8 bits in skb_shared_info should be last resort.

I'll take a look at the individual patches in more detail later.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9 v1 RFC] Generic zcopy_* functions
  2020-12-18 20:49 ` [PATCH 0/9 v1 RFC] Generic zcopy_* functions Willem de Bruijn
@ 2020-12-18 21:16   ` Jonathan Lemon
  2020-12-19 19:00     ` Willem de Bruijn
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-18 21:16 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Fri, Dec 18, 2020 at 03:49:44PM -0500, Willem de Bruijn wrote:
> On Fri, Dec 18, 2020 at 3:23 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > From: Jonathan Lemon <bsd@fb.com>
> >
> > This is set of cleanup patches for zerocopy which are intended
> > to allow a introduction of a different zerocopy implementation.
> 
> Can you describe in more detail what exactly is lacking in the current
> zerocopy interface for this this different implementation? Or point to
> a github tree with the feature patches attached, perhaps.

I'll get the zctap features up into a github tree.

Essentially, I need different behavior from ubuf_info:
  - no refcounts on RX packets (static ubuf)
  - access to the skb on RX skb free (for page handling)
  - no page pinning on TX/tx completion
  - marking the skb data as inaccessible so skb_condense()
    and skb_zeroocopy_clone() leave it alone.

> I think it's good to split into multiple smaller patchsets, starting
> with core stack support. But find it hard to understand which of these
> changes are truly needed to support a new use case.

Agree - kind of hard to see why this is done without a use case.
These patches are purely restructuring, and don't introduce any
new features.


> If anything, eating up the last 8 bits in skb_shared_info should be last resort.

I would like to add 2 more bits in the future, which is why I
moved them.  Is there a compelling reason to leave the bits alone?
--
Jonathan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-18 20:16 ` [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
@ 2020-12-19 18:46   ` Willem de Bruijn
  2020-12-21 19:18     ` Jonathan Lemon
  0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2020-12-19 18:46 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Network Development, Eric Dumazet, Willem de Bruijn, Kernel Team

On Fri, Dec 18, 2020 at 3:20 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> From: Jonathan Lemon <bsd@fb.com>
>
> In preparation for further work, the zcopy* routines will
> become basic building blocks, while the zerocopy* ones will
> be specific for the existing zerocopy implementation.

Plural. There already are multiple disjoint zerocopy implementations:
msg_zerocopy, tpacket and vhost_net.

Which API is each intended to use? After this patch
tcp_sendmsg_locked() calls both skb_zcopy_put and
sock_zerocopy_put_abort, so I don't think that that is simplifying the
situation.

This is tricky code. Perhaps best to change only what is needed
instead of targeting a larger cleanup. It's hard to reason that this
patch is safe in all three existing cases, for instance.

> All uargs should have a callback function, (unless nouarg
> is set), so push all special case logic handling down into
> the callbacks.  This slightly pessimizes the refcounted cases,

What does this mean?

> but makes the skb_zcopy_*() routines clearer.
>
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  include/linux/skbuff.h | 19 +++++++++----------
>  net/core/skbuff.c      | 21 +++++++++------------
>  net/ipv4/tcp.c         |  2 +-
>  3 files changed, 19 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index fb6dd6af0f82..df98d61e8c51 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -499,7 +499,6 @@ static inline void sock_zerocopy_get(struct ubuf_info *uarg)
>         refcount_inc(&uarg->refcnt);
>  }
>
> -void sock_zerocopy_put(struct ubuf_info *uarg);
>  void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);

The rename of sock_zerocopy_put without rename of
sock_zerocopy_put_abort makes the API less consistent, I believe. See
how the calls are close together in tcp_sendmsg_locked.

>  void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
> @@ -1474,20 +1473,20 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
>         return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
>  }
>
> +static inline void skb_zcopy_put(struct ubuf_info *uarg)
> +{
> +       if (uarg)
> +               uarg->callback(uarg, true);
> +}
> +

Can we just use skb_zcopy_clear?

>  /* Release a reference on a zerocopy structure */
> -static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
> +static inline void skb_zcopy_clear(struct sk_buff *skb, bool succsss)

succsss -> success. More importantly, why change the argument name?

>  {
>         struct ubuf_info *uarg = skb_zcopy(skb);
>
>         if (uarg) {
> -               if (skb_zcopy_is_nouarg(skb)) {
> -                       /* no notification callback */
> -               } else if (uarg->callback == sock_zerocopy_callback) {
> -                       uarg->zerocopy = uarg->zerocopy && zerocopy;
> -                       sock_zerocopy_put(uarg);
> -               } else {
> -                       uarg->callback(uarg, zerocopy);
> -               }
> +               if (!skb_zcopy_is_nouarg(skb))
> +                       uarg->callback(uarg, succsss);
>
>                 skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
>         }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 327ee8938f78..984760dd670b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
>         return true;
>  }
>
> -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> +static void __sock_zerocopy_callback(struct ubuf_info *uarg)
>  {
>         struct sk_buff *tail, *skb = skb_from_uarg(uarg);
>         struct sock_exterr_skb *serr;
> @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
>         serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
>         serr->ee.ee_data = hi;
>         serr->ee.ee_info = lo;
> -       if (!success)
> +       if (!uarg->zerocopy)
>                 serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
>
>         q = &sk->sk_error_queue;
> @@ -1241,18 +1241,15 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
>         consume_skb(skb);
>         sock_put(sk);
>  }
> -EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
>
> -void sock_zerocopy_put(struct ubuf_info *uarg)
> +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
>  {
> -       if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
> -               if (uarg->callback)
> -                       uarg->callback(uarg, uarg->zerocopy);
> -               else
> -                       consume_skb(skb_from_uarg(uarg));

I suppose this can be removed after commit 0a4a060bb204 ("sock: fix
zerocopy_success regression with msg_zerocopy"). Cleaning that up
would better be a separate patch that explains why the removal is
safe.

It's also fine to bundle with moving refcount_dec_and_test into
sock_zerocopy_callback, which indeed follows from it.

> -       }
> +       uarg->zerocopy = uarg->zerocopy & success;
> +
> +       if (refcount_dec_and_test(&uarg->refcnt))
> +               __sock_zerocopy_callback(uarg);

This can be wrapped in existing sock_zerocopy_callback. No need for a
__sock_zerocopy_callback.

If you do want a separate API for existing msg_zerocopy distinct from
existing skb_zcopy, then maybe rename the functions only used by
msg_zerocopy to have prefix msg_zerocopy_ instead of sock_zerocopy_

>  }
> -EXPORT_SYMBOL_GPL(sock_zerocopy_put);
> +EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
>
>  void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
>  {
> @@ -1263,7 +1260,7 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
>                 uarg->len--;
>
>                 if (have_uref)
> -                       sock_zerocopy_put(uarg);
> +                       skb_zcopy_put(uarg);
>         }
>  }
>  EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index fea9bae370e4..5c38080df13f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1429,7 +1429,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                 tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>         }
>  out_nopush:
> -       sock_zerocopy_put(uarg);
> +       skb_zcopy_put(uarg);
>         return copied + copied_syn;
>
>  do_error:
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9 v1 RFC] Generic zcopy_* functions
  2020-12-18 21:16   ` Jonathan Lemon
@ 2020-12-19 19:00     ` Willem de Bruijn
  2020-12-21 19:50       ` Jonathan Lemon
  0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2020-12-19 19:00 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Willem de Bruijn, Network Development, Eric Dumazet, Kernel Team

On Fri, Dec 18, 2020 at 4:27 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 03:49:44PM -0500, Willem de Bruijn wrote:
> > On Fri, Dec 18, 2020 at 3:23 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > >
> > > From: Jonathan Lemon <bsd@fb.com>
> > >
> > > This is set of cleanup patches for zerocopy which are intended
> > > to allow a introduction of a different zerocopy implementation.
> >
> > Can you describe in more detail what exactly is lacking in the current
> > zerocopy interface for this this different implementation? Or point to
> > a github tree with the feature patches attached, perhaps.
>
> I'll get the zctap features up into a github tree.
>
> Essentially, I need different behavior from ubuf_info:
>   - no refcounts on RX packets (static ubuf)

That is already the case for vhost and tpacket zerocopy use cases.

>   - access to the skb on RX skb free (for page handling)

To refers only to patch 9, moving the callback earlier in
skb_release_data, right?

>   - no page pinning on TX/tx completion

That is not part of the skb zerocopy infrastructure?

>   - marking the skb data as inaccessible so skb_condense()
>     and skb_zeroocopy_clone() leave it alone.

Yep. Skipping content access on the Rx path will be interesting. I
wonder if that should be a separate opaque skb feature, independent
from whether the data is owned by userspace, peripheral memory, the
page cache or anything else.

> > I think it's good to split into multiple smaller patchsets, starting
> > with core stack support. But find it hard to understand which of these
> > changes are truly needed to support a new use case.
>
> Agree - kind of hard to see why this is done without a use case.
> These patches are purely restructuring, and don't introduce any
> new features.
>
>
> > If anything, eating up the last 8 bits in skb_shared_info should be last resort.
>
> I would like to add 2 more bits in the future, which is why I
> moved them.  Is there a compelling reason to leave the bits alone?

Opportunity cost.

We cannot grow skb_shared_info due to colocation with MTU sized linear
skbuff's in half a page.

It took me quite some effort to free up a few bytes in commit
4d276eb6a478 ("net: remove deprecated syststamp timestamp").

If we are very frugal, we could shadow some bits to have different
meaning in different paths. SKBTX_IN_PROGRESS is transmit only, I
think. But otherwise we'll have to just dedicate the byte to more
flags. Yours are likely not to be the last anyway.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-19 18:46   ` Willem de Bruijn
@ 2020-12-21 19:18     ` Jonathan Lemon
  2020-12-21 22:49       ` Willem de Bruijn
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-21 19:18 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Sat, Dec 19, 2020 at 01:46:13PM -0500, Willem de Bruijn wrote:
> On Fri, Dec 18, 2020 at 3:20 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > From: Jonathan Lemon <bsd@fb.com>
> >
> > In preparation for further work, the zcopy* routines will
> > become basic building blocks, while the zerocopy* ones will
> > be specific for the existing zerocopy implementation.
> 
> Plural. There already are multiple disjoint zerocopy implementations:
> msg_zerocopy, tpacket and vhost_net.

Yes - I'm having to take all of those into account when adding
zctap as well.


> Which API is each intended to use? After this patch
> tcp_sendmsg_locked() calls both skb_zcopy_put and
> sock_zerocopy_put_abort, so I don't think that that is simplifying the
> situation.

I'm trying to make the skb_zcopy_ routines the top level API, and 
the sock_zerocopy_ routines specific to msg_zerocopy().  Patch 6 adds
the skb_zcopy_put_abort() function, which unfortunately still uses
the uarg->callback for switching.


> This is tricky code. Perhaps best to change only what is needed
> instead of targeting a larger cleanup. It's hard to reason that this
> patch is safe in all three existing cases, for instance.

Exactly.  I'm trying to simplify things here so it's easier to reason
through all the cases.


> > All uargs should have a callback function, (unless nouarg
> > is set), so push all special case logic handling down into
> > the callbacks.  This slightly pessimizes the refcounted cases,
> 
> What does this mean?

The current zerocopy_put() code does:
  1) if uarg, dec refcount, if refcount == 0:
     if callback, run callback, else consume skb.

This is called from the main TCP/UDP send path.  These would be called
for the zctap case as well, so it should be made generic - not specific
to the current zerocopy implementation.  The patch changes this into:

  1) if uarg, run callback.

Then, the msg_zerocopy code does:

  1) save state,
  2) dec refcount, run rest of callback on 0.

Which is the same as before.  The !uarg case is never handled here.
The zctap cases switch to their own callbacks.


The current zerocopy clear code does:
  1) if no_uarg, skip 
  2) if msg_zerocopy, save state, dec refcount, run callback when 0.
  3) otherwise just run callback.
  4) clear flags

I would like to remove the msg_zerocopy specific logic from the function,
so this becomes:

  1) if uarg, run callback.
  2) clear flags



> > but makes the skb_zcopy_*() routines clearer.
> >
> > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > ---
> >  include/linux/skbuff.h | 19 +++++++++----------
> >  net/core/skbuff.c      | 21 +++++++++------------
> >  net/ipv4/tcp.c         |  2 +-
> >  3 files changed, 19 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index fb6dd6af0f82..df98d61e8c51 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -499,7 +499,6 @@ static inline void sock_zerocopy_get(struct ubuf_info *uarg)
> >         refcount_inc(&uarg->refcnt);
> >  }
> >
> > -void sock_zerocopy_put(struct ubuf_info *uarg);
> >  void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
> 
> The rename of sock_zerocopy_put without rename of
> sock_zerocopy_put_abort makes the API less consistent, I believe. See
> how the calls are close together in tcp_sendmsg_locked.

See patch 6.


> >  void sock_zerocopy_callback(struct ubuf_info *uarg, bool success);
> > @@ -1474,20 +1473,20 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
> >         return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
> >  }
> >
> > +static inline void skb_zcopy_put(struct ubuf_info *uarg)
> > +{
> > +       if (uarg)
> > +               uarg->callback(uarg, true);
> > +}
> > +
> 
> Can we just use skb_zcopy_clear?

skb_zcopy_clear also clears the flags, so no.

 
> >  /* Release a reference on a zerocopy structure */
> > -static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
> > +static inline void skb_zcopy_clear(struct sk_buff *skb, bool succsss)
> 
> succsss -> success. More importantly, why change the argument name?

It is already named inconsistently:
   struct ubuf_info {
        void (*callback)(struct ubuf_info *, bool zerocopy_success);

   void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
   static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)

I picked one and made them the same.




> 
> >  {
> >         struct ubuf_info *uarg = skb_zcopy(skb);
> >
> >         if (uarg) {
> > -               if (skb_zcopy_is_nouarg(skb)) {
> > -                       /* no notification callback */
> > -               } else if (uarg->callback == sock_zerocopy_callback) {
> > -                       uarg->zerocopy = uarg->zerocopy && zerocopy;
> > -                       sock_zerocopy_put(uarg);
> > -               } else {
> > -                       uarg->callback(uarg, zerocopy);
> > -               }
> > +               if (!skb_zcopy_is_nouarg(skb))
> > +                       uarg->callback(uarg, succsss);
> >
> >                 skb_shinfo(skb)->zc_flags &= ~SKBZC_FRAGMENTS;
> >         }
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 327ee8938f78..984760dd670b 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
> >         return true;
> >  }
> >
> > -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > +static void __sock_zerocopy_callback(struct ubuf_info *uarg)
> >  {
> >         struct sk_buff *tail, *skb = skb_from_uarg(uarg);
> >         struct sock_exterr_skb *serr;
> > @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> >         serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
> >         serr->ee.ee_data = hi;
> >         serr->ee.ee_info = lo;
> > -       if (!success)
> > +       if (!uarg->zerocopy)
> >                 serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
> >
> >         q = &sk->sk_error_queue;
> > @@ -1241,18 +1241,15 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> >         consume_skb(skb);
> >         sock_put(sk);
> >  }
> > -EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
> >
> > -void sock_zerocopy_put(struct ubuf_info *uarg)
> > +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> >  {
> > -       if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
> > -               if (uarg->callback)
> > -                       uarg->callback(uarg, uarg->zerocopy);
> > -               else
> > -                       consume_skb(skb_from_uarg(uarg));
> 
> I suppose this can be removed after commit 0a4a060bb204 ("sock: fix
> zerocopy_success regression with msg_zerocopy"). Cleaning that up
> would better be a separate patch that explains why the removal is
> safe.

I'll split the patches out.


> It's also fine to bundle with moving refcount_dec_and_test into
> sock_zerocopy_callback, which indeed follows from it.
> 
> > -       }
> > +       uarg->zerocopy = uarg->zerocopy & success;
> > +
> > +       if (refcount_dec_and_test(&uarg->refcnt))
> > +               __sock_zerocopy_callback(uarg);
> 
> This can be wrapped in existing sock_zerocopy_callback. No need for a
> __sock_zerocopy_callback.

The compiler will inline the helper anyway, since it's a single
callsite.


> 
> If you do want a separate API for existing msg_zerocopy distinct from
> existing skb_zcopy, then maybe rename the functions only used by
> msg_zerocopy to have prefix msg_zerocopy_ instead of sock_zerocopy_

That's a good suggestion, thanks!

 
> >  }
> > -EXPORT_SYMBOL_GPL(sock_zerocopy_put);
> > +EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
> >
> >  void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
> >  {
> > @@ -1263,7 +1260,7 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
> >                 uarg->len--;
> >
> >                 if (have_uref)
> > -                       sock_zerocopy_put(uarg);
> > +                       skb_zcopy_put(uarg);
> >         }
> >  }
> >  EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index fea9bae370e4..5c38080df13f 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1429,7 +1429,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >                 tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >         }
> >  out_nopush:
> > -       sock_zerocopy_put(uarg);
> > +       skb_zcopy_put(uarg);
> >         return copied + copied_syn;
> >
> >  do_error:
> > --
> > 2.24.1
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9 v1 RFC] Generic zcopy_* functions
  2020-12-19 19:00     ` Willem de Bruijn
@ 2020-12-21 19:50       ` Jonathan Lemon
  2020-12-21 22:52         ` Willem de Bruijn
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-21 19:50 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Sat, Dec 19, 2020 at 02:00:55PM -0500, Willem de Bruijn wrote:
> On Fri, Dec 18, 2020 at 4:27 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > On Fri, Dec 18, 2020 at 03:49:44PM -0500, Willem de Bruijn wrote:
> > > On Fri, Dec 18, 2020 at 3:23 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> > > >
> > > > From: Jonathan Lemon <bsd@fb.com>
> > > >
> > > > This is set of cleanup patches for zerocopy which are intended
> > > > to allow a introduction of a different zerocopy implementation.
> > >
> > > Can you describe in more detail what exactly is lacking in the current
> > > zerocopy interface for this this different implementation? Or point to
> > > a github tree with the feature patches attached, perhaps.
> >
> > I'll get the zctap features up into a github tree.
> >
> > Essentially, I need different behavior from ubuf_info:
> >   - no refcounts on RX packets (static ubuf)
> 
> That is already the case for vhost and tpacket zerocopy use cases.
> 
> >   - access to the skb on RX skb free (for page handling)
> 
> To refers only to patch 9, moving the callback earlier in
> skb_release_data, right?

Yes.


> >   - no page pinning on TX/tx completion
> 
> That is not part of the skb zerocopy infrastructure?

That's specific to msg_zerocopy.  zctap uses the same network stack
paths, but pins the pages during setup, not during each each system call.


> >   - marking the skb data as inaccessible so skb_condense()
> >     and skb_zeroocopy_clone() leave it alone.
> 
> Yep. Skipping content access on the Rx path will be interesting. I
> wonder if that should be a separate opaque skb feature, independent
> from whether the data is owned by userspace, peripheral memory, the
> page cache or anything else.

Would that be indicated by a bit on the skb (like pfmemalloc), or 
a bit in the skb_shared structure, as I'm leaning towards doing here?


> > > I think it's good to split into multiple smaller patchsets, starting
> > > with core stack support. But find it hard to understand which of these
> > > changes are truly needed to support a new use case.
> >
> > Agree - kind of hard to see why this is done without a use case.
> > These patches are purely restructuring, and don't introduce any
> > new features.
> >
> >
> > > If anything, eating up the last 8 bits in skb_shared_info should be last resort.
> >
> > I would like to add 2 more bits in the future, which is why I
> > moved them.  Is there a compelling reason to leave the bits alone?
> 
> Opportunity cost.
> 
> We cannot grow skb_shared_info due to colocation with MTU sized linear
> skbuff's in half a page.
> 
> It took me quite some effort to free up a few bytes in commit
> 4d276eb6a478 ("net: remove deprecated syststamp timestamp").
> 
> If we are very frugal, we could shadow some bits to have different
> meaning in different paths. SKBTX_IN_PROGRESS is transmit only, I
> think. But otherwise we'll have to just dedicate the byte to more
> flags. Yours are likely not to be the last anyway.

The zerocopy/enable flags could be encoded in one of the lower 3 bits
in the destructor_arg, (similar to nouarg) but that seems messy.
-- 
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
  2020-12-21 19:18     ` Jonathan Lemon
@ 2020-12-21 22:49       ` Willem de Bruijn
  0 siblings, 0 replies; 19+ messages in thread
From: Willem de Bruijn @ 2020-12-21 22:49 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

> > > All uargs should have a callback function, (unless nouarg
> > > is set), so push all special case logic handling down into
> > > the callbacks.  This slightly pessimizes the refcounted cases,
> >
> > What does this mean?
>
> The current zerocopy_put() code does:
>   1) if uarg, dec refcount, if refcount == 0:
>      if callback, run callback, else consume skb.
>
> This is called from the main TCP/UDP send path.  These would be called
> for the zctap case as well, so it should be made generic - not specific
> to the current zerocopy implementation.  The patch changes this into:
>
>   1) if uarg, run callback.
>
> Then, the msg_zerocopy code does:
>
>   1) save state,
>   2) dec refcount, run rest of callback on 0.
>
> Which is the same as before.  The !uarg case is never handled here.
> The zctap cases switch to their own callbacks.
>
>
> The current zerocopy clear code does:
>   1) if no_uarg, skip
>   2) if msg_zerocopy, save state, dec refcount, run callback when 0.
>   3) otherwise just run callback.
>   4) clear flags
>
> I would like to remove the msg_zerocopy specific logic from the function,
> so this becomes:
>
>   1) if uarg, run callback.
>   2) clear flags

That sounds fine. Especially since we can simplify the logic after the
commit I mentioned. I just didn't understand what you meant by
pessimize.

> > > -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > > +static void __sock_zerocopy_callback(struct ubuf_info *uarg)
> > >  {
> > >         struct sk_buff *tail, *skb = skb_from_uarg(uarg);
> > >         struct sock_exterr_skb *serr;
> > > @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > >         serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY;
> > >         serr->ee.ee_data = hi;
> > >         serr->ee.ee_info = lo;
> > > -       if (!success)
> > > +       if (!uarg->zerocopy)
> > >                 serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED;
> > >
> > >         q = &sk->sk_error_queue;
> > > @@ -1241,18 +1241,15 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > >         consume_skb(skb);
> > >         sock_put(sk);
> > >  }
> > > -EXPORT_SYMBOL_GPL(sock_zerocopy_callback);
> > >
> > > -void sock_zerocopy_put(struct ubuf_info *uarg)
> > > +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success)
> > >  {
> > > -       if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
> > > -               if (uarg->callback)
> > > -                       uarg->callback(uarg, uarg->zerocopy);
> > > -               else
> > > -                       consume_skb(skb_from_uarg(uarg));
> >
> > I suppose this can be removed after commit 0a4a060bb204 ("sock: fix
> > zerocopy_success regression with msg_zerocopy"). Cleaning that up
> > would better be a separate patch that explains why the removal is
> > safe.
>
> I'll split the patches out.

Thanks. Yes, splitting that patch in two will help (me) follow it better.
>
> > It's also fine to bundle with moving refcount_dec_and_test into
> > sock_zerocopy_callback, which indeed follows from it.
> >
> > > -       }
> > > +       uarg->zerocopy = uarg->zerocopy & success;
> > > +
> > > +       if (refcount_dec_and_test(&uarg->refcnt))
> > > +               __sock_zerocopy_callback(uarg);
> >
> > This can be wrapped in existing sock_zerocopy_callback. No need for a
> > __sock_zerocopy_callback.
>
> The compiler will inline the helper anyway, since it's a single
> callsite.

True. I just don't think the wrapper adds much value here.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9 v1 RFC] Generic zcopy_* functions
  2020-12-21 19:50       ` Jonathan Lemon
@ 2020-12-21 22:52         ` Willem de Bruijn
  2020-12-22  0:07           ` Jonathan Lemon
  0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2020-12-21 22:52 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: Network Development, Eric Dumazet, Kernel Team

> > >   - marking the skb data as inaccessible so skb_condense()
> > >     and skb_zeroocopy_clone() leave it alone.
> >
> > Yep. Skipping content access on the Rx path will be interesting. I
> > wonder if that should be a separate opaque skb feature, independent
> > from whether the data is owned by userspace, peripheral memory, the
> > page cache or anything else.
>
> Would that be indicated by a bit on the skb (like pfmemalloc), or
> a bit in the skb_shared structure, as I'm leaning towards doing here?

I would guide it in part by avoiding cold cacheline accesses. That
might be hard if using skb_shinfo. OTOH, you don't have to worry about
copying the bit during clone operations.

> > > > If anything, eating up the last 8 bits in skb_shared_info should be last resort.
> > >
> > > I would like to add 2 more bits in the future, which is why I
> > > moved them.  Is there a compelling reason to leave the bits alone?
> >
> > Opportunity cost.
> >
> > We cannot grow skb_shared_info due to colocation with MTU sized linear
> > skbuff's in half a page.
> >
> > It took me quite some effort to free up a few bytes in commit
> > 4d276eb6a478 ("net: remove deprecated syststamp timestamp").
> >
> > If we are very frugal, we could shadow some bits to have different
> > meaning in different paths. SKBTX_IN_PROGRESS is transmit only, I
> > think. But otherwise we'll have to just dedicate the byte to more
> > flags. Yours are likely not to be the last anyway.
>
> The zerocopy/enable flags could be encoded in one of the lower 3 bits
> in the destructor_arg, (similar to nouarg) but that seems messy.

Agreed :)

Let's just expand the flags for now. It may be better to have one
general purpose 16 bit flags bitmap, rather than reserving 8 bits
specifically to zerocopy features.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/9 v1 RFC] Generic zcopy_* functions
  2020-12-21 22:52         ` Willem de Bruijn
@ 2020-12-22  0:07           ` Jonathan Lemon
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Lemon @ 2020-12-22  0:07 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, Eric Dumazet, Kernel Team

On Mon, Dec 21, 2020 at 05:52:08PM -0500, Willem de Bruijn wrote:
> > > >   - marking the skb data as inaccessible so skb_condense()
> > > >     and skb_zeroocopy_clone() leave it alone.
> > >
> > > Yep. Skipping content access on the Rx path will be interesting. I
> > > wonder if that should be a separate opaque skb feature, independent
> > > from whether the data is owned by userspace, peripheral memory, the
> > > page cache or anything else.
> >
> > Would that be indicated by a bit on the skb (like pfmemalloc), or
> > a bit in the skb_shared structure, as I'm leaning towards doing here?
> 
> I would guide it in part by avoiding cold cacheline accesses. That
> might be hard if using skb_shinfo. OTOH, you don't have to worry about
> copying the bit during clone operations.
> 
> > > > > If anything, eating up the last 8 bits in skb_shared_info should be last resort.
> > > >
> > > > I would like to add 2 more bits in the future, which is why I
> > > > moved them.  Is there a compelling reason to leave the bits alone?
> > >
> > > Opportunity cost.
> > >
> > > We cannot grow skb_shared_info due to colocation with MTU sized linear
> > > skbuff's in half a page.
> > >
> > > It took me quite some effort to free up a few bytes in commit
> > > 4d276eb6a478 ("net: remove deprecated syststamp timestamp").
> > >
> > > If we are very frugal, we could shadow some bits to have different
> > > meaning in different paths. SKBTX_IN_PROGRESS is transmit only, I
> > > think. But otherwise we'll have to just dedicate the byte to more
> > > flags. Yours are likely not to be the last anyway.
> >
> > The zerocopy/enable flags could be encoded in one of the lower 3 bits
> > in the destructor_arg, (similar to nouarg) but that seems messy.
> 
> Agreed :)
> 
> Let's just expand the flags for now. It may be better to have one
> general purpose 16 bit flags bitmap, rather than reserving 8 bits
> specifically to zerocopy features.

I was considering doing that also, but that would need to rearrange
the flags in skb_shared_info.  Then I realized that there are currently
only TX flags and ZC flags, so went with that.  I have no objections
to doing it either way.

My motivation here is when MSG_ZCTAP is added to tcp_sendmsg_locked(),
it the returned uarg is self-contained for the rest of the function.
-- 
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-12-22  0:08 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-18 20:16 [PATCH 0/9 v1 RFC] Generic zcopy_* functions Jonathan Lemon
2020-12-18 20:16 ` [PATCH 1/9 v1 RFC] net: group skb_shinfo zerocopy related bits together Jonathan Lemon
2020-12-18 20:16 ` [PATCH 2/9 v1 RFC] skbuff: remove unused skb_zcopy_abort function Jonathan Lemon
2020-12-18 20:16 ` [PATCH 3/9 v1 RFC] skbuff: replace sock_zerocopy_put() with skb_zcopy_put() Jonathan Lemon
2020-12-19 18:46   ` Willem de Bruijn
2020-12-21 19:18     ` Jonathan Lemon
2020-12-21 22:49       ` Willem de Bruijn
2020-12-18 20:16 ` [PATCH 4/9 v1 RFC] skbuff: replace sock_zerocopy_get with skb_zcopy_get Jonathan Lemon
2020-12-18 20:16 ` [PATCH 5/9 v1 RFC] skbuff: Add skb parameter to the ubuf zerocopy callback Jonathan Lemon
2020-12-18 20:16 ` [PATCH 6/9 v1 RFC] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort Jonathan Lemon
2020-12-18 20:16 ` [PATCH 7/9 v1 RFC] skbuff: add zc_flags to ubuf_info for ubuf setup Jonathan Lemon
2020-12-18 20:16 ` [PATCH 8/9 v1 RFC] tap/tun: use skb_zcopy_set() instead of open coded assignment Jonathan Lemon
2020-12-18 20:16 ` [PATCH 9/9 v1 RFC] skbuff: Call skb_zcopy_clear() before unref'ing fragments Jonathan Lemon
2020-12-18 20:49 ` [PATCH 0/9 v1 RFC] Generic zcopy_* functions Willem de Bruijn
2020-12-18 21:16   ` Jonathan Lemon
2020-12-19 19:00     ` Willem de Bruijn
2020-12-21 19:50       ` Jonathan Lemon
2020-12-21 22:52         ` Willem de Bruijn
2020-12-22  0:07           ` Jonathan Lemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.