* [RFC 0/6] implement io_uring notification (ubuf_info) stacking
@ 2024-04-12 12:55 Pavel Begunkov
  2024-04-12 12:55 ` [RFC 1/6] net: extend ubuf_info callback to ops structure Pavel Begunkov
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

io_uring allocates a ubuf_info per zerocopy send request. That is
convenient for userspace, but it means that the TCP stack has to
allocate a new skb for every request instead of amending data into a
previous one. Unless sends are large enough, that creates lots of
small skbs straining the stack and hurting performance.

The patchset implements stacking of notifications, i.e. io_uring's
extension of ubuf_info. It links ubuf_info's into a list, and the
entire chain is put down together once all references are gone.
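
In sketch form, the linking looks like this (illustrative structures
only; the real fields are added to io_uring's notification data in the
last patch):

/* illustrative sketch, not the real types */
struct notif_sketch {
	struct notif_sketch	*next;	/* next notification in the chain */
	struct notif_sketch	*head;	/* points to itself for the head */
	int			refcnt;	/* held by skbs; each non-head entry
					 * also pins the head, so the whole
					 * chain is completed once the last
					 * reference anywhere in it is gone */
};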

Testing with liburing/examples/send-zerocopy and another custom-made
tool, with 4K bytes per send it improves performance ~6 times and
brings it level with MSG_ZEROCOPY. Without the patchset, much larger
sends are required to realise the full potential.

bytes  | before (Kqps) | after (Kqps)
100    | 283           | 936
1200   | 195           | 1023
4000   | 193           | 1386
8000   | 154           | 1058

Pavel Begunkov (6):
  net: extend ubuf_info callback to ops structure
  net: add callback for setting a ubuf_info to skb
  io_uring/notif: refactor io_tx_ubuf_complete()
  io_uring/notif: remove ctx var from io_notif_tw_complete
  io_uring/notif: simplify io_notif_flush()
  io_uring/notif: implement notification stacking

 drivers/net/tap.c      |  2 +-
 drivers/net/tun.c      |  2 +-
 drivers/vhost/net.c    |  8 +++-
 include/linux/skbuff.h | 21 ++++++----
 io_uring/notif.c       | 91 +++++++++++++++++++++++++++++++++++-------
 io_uring/notif.h       | 13 +++---
 net/core/skbuff.c      | 37 +++++++++++------
 7 files changed, 129 insertions(+), 45 deletions(-)

-- 
2.44.0



* [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
@ 2024-04-12 12:55 ` Pavel Begunkov
  2024-04-13 17:17   ` David Ahern
  2024-04-14 17:07   ` Willem de Bruijn
  2024-04-12 12:55 ` [RFC 2/6] net: add callback for setting a ubuf_info to skb Pavel Begunkov
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

We'll need to associate additional callbacks with ubuf_info, so
introduce a structure holding the ubuf_info callbacks. Apart from the
smarter io_uring notification management introduced in the next
patches, it can be used to generalise msg_zerocopy_put_abort() and also
to store ->sg_from_iter, which is currently passed in struct msghdr.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/tap.c      |  2 +-
 drivers/net/tun.c      |  2 +-
 drivers/vhost/net.c    |  8 ++++++--
 include/linux/skbuff.h | 19 +++++++++++--------
 io_uring/notif.c       |  8 ++++++--
 net/core/skbuff.c      | 17 +++++++++++------
 6 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 9f0495e8df4d..bfdd3875fe86 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -754,7 +754,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	dev_queue_xmit(skb);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0b3f21cba552..b7401d990680 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1906,7 +1906,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	skb_reset_network_header(skb);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c64ded183f8d..f16279351db5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -380,7 +380,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 }
 
-static void vhost_zerocopy_callback(struct sk_buff *skb,
+static void vhost_zerocopy_complete(struct sk_buff *skb,
 				    struct ubuf_info *ubuf_base, bool success)
 {
 	struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base);
@@ -408,6 +408,10 @@ static void vhost_zerocopy_callback(struct sk_buff *skb,
 	rcu_read_unlock_bh();
 }
 
+static const struct ubuf_info_ops vhost_ubuf_ops = {
+	.complete = vhost_zerocopy_complete,
+};
+
 static inline unsigned long busy_clock(void)
 {
 	return local_clock() >> 10;
@@ -879,7 +883,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
-			ubuf->ubuf.callback = vhost_zerocopy_callback;
+			ubuf->ubuf.ops = &vhost_ubuf_ops;
 			ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG;
 			refcount_set(&ubuf->ubuf.refcnt, 1);
 			msg.msg_control = &ctl;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9d24aec064e8..a110e97e074a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -527,6 +527,11 @@ enum {
 #define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
 				 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS)
 
+struct ubuf_info_ops {
+	void (*complete)(struct sk_buff *, struct ubuf_info *,
+			 bool zerocopy_success);
+};
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
@@ -536,8 +541,7 @@ enum {
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-	void (*callback)(struct sk_buff *, struct ubuf_info *,
-			 bool zerocopy_success);
+	const struct ubuf_info_ops *ops;
 	refcount_t refcnt;
 	u8 flags;
 };
@@ -1662,14 +1666,13 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset)
 }
 #endif
 
+extern const struct ubuf_info_ops msg_zerocopy_ubuf_ops;
+
 struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 				       struct ubuf_info *uarg);
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
-			   bool success);
-
 int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length);
@@ -1757,13 +1760,13 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 static inline void net_zcopy_put(struct ubuf_info *uarg)
 {
 	if (uarg)
-		uarg->callback(NULL, uarg, true);
+		uarg->ops->complete(NULL, uarg, true);
 }
 
 static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	if (uarg) {
-		if (uarg->callback == msg_zerocopy_callback)
+		if (uarg->ops == &msg_zerocopy_ubuf_ops)
 			msg_zerocopy_put_abort(uarg, have_uref);
 		else if (have_uref)
 			net_zcopy_put(uarg);
@@ -1777,7 +1780,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 
 	if (uarg) {
 		if (!skb_zcopy_is_nouarg(skb))
-			uarg->callback(skb, uarg, zerocopy_success);
+			uarg->ops->complete(skb, uarg, zerocopy_success);
 
 		skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
 	}
diff --git a/io_uring/notif.c b/io_uring/notif.c
index b561bd763435..7caaebf94312 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -24,7 +24,7 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 	io_req_task_complete(notif, ts);
 }
 
-static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
 				bool success)
 {
 	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
@@ -43,6 +43,10 @@ static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 	}
 }
 
+static const struct ubuf_info_ops io_ubuf_ops = {
+	.complete = io_tx_ubuf_complete,
+};
+
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	__must_hold(&ctx->uring_lock)
 {
@@ -62,7 +66,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	nd->zc_report = false;
 	nd->account_pages = 0;
 	nd->uarg.flags = IO_NOTIF_UBUF_FLAGS;
-	nd->uarg.callback = io_tx_ubuf_callback;
+	nd->uarg.ops = &io_ubuf_ops;
 	refcount_set(&nd->uarg.refcnt, 1);
 	return notif;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b99127712e67..749abab23a67 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1708,7 +1708,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 		return NULL;
 	}
 
-	uarg->ubuf.callback = msg_zerocopy_callback;
+	uarg->ubuf.ops = &msg_zerocopy_ubuf_ops;
 	uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1;
 	uarg->len = 1;
 	uarg->bytelen = size;
@@ -1734,7 +1734,7 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 		u32 bytelen, next;
 
 		/* there might be non MSG_ZEROCOPY users */
-		if (uarg->callback != msg_zerocopy_callback)
+		if (uarg->ops != &msg_zerocopy_ubuf_ops)
 			return NULL;
 
 		/* realloc only when socket is locked (TCP, UDP cork),
@@ -1845,8 +1845,8 @@ static void __msg_zerocopy_callback(struct ubuf_info_msgzc *uarg)
 	sock_put(sk);
 }
 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
-			   bool success)
+static void msg_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+				  bool success)
 {
 	struct ubuf_info_msgzc *uarg_zc = uarg_to_msgzc(uarg);
 
@@ -1855,7 +1855,7 @@ void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 	if (refcount_dec_and_test(&uarg->refcnt))
 		__msg_zerocopy_callback(uarg_zc);
 }
-EXPORT_SYMBOL_GPL(msg_zerocopy_callback);
+
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
@@ -1865,10 +1865,15 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 	uarg_to_msgzc(uarg)->len--;
 
 	if (have_uref)
-		msg_zerocopy_callback(NULL, uarg, true);
+		msg_zerocopy_complete(NULL, uarg, true);
 }
 EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort);
 
+const struct ubuf_info_ops msg_zerocopy_ubuf_ops = {
+	.complete = msg_zerocopy_complete,
+};
+EXPORT_SYMBOL_GPL(msg_zerocopy_ubuf_ops);
+
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 			     struct msghdr *msg, int len,
 			     struct ubuf_info *uarg)
-- 
2.44.0



* [RFC 2/6] net: add callback for setting a ubuf_info to skb
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
  2024-04-12 12:55 ` [RFC 1/6] net: extend ubuf_info callback to ops structure Pavel Begunkov
@ 2024-04-12 12:55 ` Pavel Begunkov
  2024-04-13 17:18   ` David Ahern
  2024-04-12 12:55 ` [RFC 3/6] io_uring/notif: refactor io_tx_ubuf_complete() Pavel Begunkov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

At the moment an skb can only have one ubuf_info associated with it,
which might be a performance problem for zerocopy sends in cases like
TCP via io_uring. Add a callback for assigning a ubuf_info to an skb;
this way we can implement smarter assignment later, such as linking
ubuf_info's together.

Note that it's an optional callback, which has to be compatible with
skb_zcopy_set(); that's because the net stack might potentially decide
to clone an skb and take another reference to the ubuf_info whenever it
wishes. Also, a correct implementation should always be able to bind to
an skb without a prior ubuf_info, otherwise we could end up in a
situation where the send would not be able to progress.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h |  2 ++
 net/core/skbuff.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a110e97e074a..ced69f37977f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -530,6 +530,8 @@ enum {
 struct ubuf_info_ops {
 	void (*complete)(struct sk_buff *, struct ubuf_info *,
 			 bool zerocopy_success);
+	/* has to be compatible with skb_zcopy_set() */
+	int (*link_skb)(struct sk_buff *skb, struct ubuf_info *uarg);
 };
 
 /*
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 749abab23a67..1922e3d09c7f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1881,11 +1881,18 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 	struct ubuf_info *orig_uarg = skb_zcopy(skb);
 	int err, orig_len = skb->len;
 
-	/* An skb can only point to one uarg. This edge case happens when
-	 * TCP appends to an skb, but zerocopy_realloc triggered a new alloc.
-	 */
-	if (orig_uarg && uarg != orig_uarg)
-		return -EEXIST;
+	if (uarg->ops->link_skb) {
+		err = uarg->ops->link_skb(skb, uarg);
+		if (err)
+			return err;
+	} else {
+		/* An skb can only point to one uarg. This edge case happens
+		 * when TCP appends to an skb, but zerocopy_realloc triggered
+		 * a new alloc.
+		 */
+		if (orig_uarg && uarg != orig_uarg)
+			return -EEXIST;
+	}
 
 	err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len);
 	if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) {
@@ -1899,7 +1906,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 		return err;
 	}
 
-	skb_zcopy_set(skb, uarg, NULL);
+	if (!uarg->ops->link_skb)
+		skb_zcopy_set(skb, uarg, NULL);
 	return skb->len - orig_len;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
-- 
2.44.0



* [RFC 3/6] io_uring/notif: refactor io_tx_ubuf_complete()
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
  2024-04-12 12:55 ` [RFC 1/6] net: extend ubuf_info callback to ops structure Pavel Begunkov
  2024-04-12 12:55 ` [RFC 2/6] net: add callback for setting a ubuf_info to skb Pavel Begunkov
@ 2024-04-12 12:55 ` Pavel Begunkov
  2024-04-12 12:55 ` [RFC 4/6] io_uring/notif: remove ctx var from io_notif_tw_complete Pavel Begunkov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Flip the dec_and_test condition, so that when we add more code later
there is less churn.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/notif.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 7caaebf94312..5a8b2fdd67fd 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -37,10 +37,11 @@ static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
 			WRITE_ONCE(nd->zc_copied, true);
 	}
 
-	if (refcount_dec_and_test(&uarg->refcnt)) {
-		notif->io_task_work.func = io_notif_tw_complete;
-		__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
-	}
+	if (!refcount_dec_and_test(&uarg->refcnt))
+		return;
+
+	notif->io_task_work.func = io_notif_tw_complete;
+	__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
 }
 
 static const struct ubuf_info_ops io_ubuf_ops = {
-- 
2.44.0



* [RFC 4/6] io_uring/notif: remove ctx var from io_notif_tw_complete
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (2 preceding siblings ...)
  2024-04-12 12:55 ` [RFC 3/6] io_uring/notif: refactor io_tx_ubuf_complete() Pavel Begunkov
@ 2024-04-12 12:55 ` Pavel Begunkov
  2024-04-12 12:55 ` [RFC 5/6] io_uring/notif: simplify io_notif_flush() Pavel Begunkov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

We don't need ctx in the hottest path, i.e. registered buffers;
let's fetch it only when we need it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/notif.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 5a8b2fdd67fd..53532d78a947 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -12,13 +12,12 @@
 void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
-	struct io_ring_ctx *ctx = notif->ctx;
 
 	if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used))
 		notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED;
 
-	if (nd->account_pages && ctx->user) {
-		__io_unaccount_mem(ctx->user, nd->account_pages);
+	if (nd->account_pages && notif->ctx->user) {
+		__io_unaccount_mem(notif->ctx->user, nd->account_pages);
 		nd->account_pages = 0;
 	}
 	io_req_task_complete(notif, ts);
-- 
2.44.0



* [RFC 5/6] io_uring/notif: simplify io_notif_flush()
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (3 preceding siblings ...)
  2024-04-12 12:55 ` [RFC 4/6] io_uring/notif: remove ctx var from io_notif_tw_complete Pavel Begunkov
@ 2024-04-12 12:55 ` Pavel Begunkov
  2024-04-12 12:55 ` [RFC 6/6] io_uring/notif: implement notification stacking Pavel Begunkov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

io_notif_flush() partially duplicates io_tx_ubuf_complete(), so
instead of duplicating it, make the flush path call io_tx_ubuf_complete().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/notif.c | 6 +++---
 io_uring/notif.h | 9 +++------
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 53532d78a947..26680176335f 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -9,7 +9,7 @@
 #include "notif.h"
 #include "rsrc.h"
 
-void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
+static void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
@@ -23,8 +23,8 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 	io_req_task_complete(notif, ts);
 }
 
-static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
-				bool success)
+void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+			 bool success)
 {
 	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
 	struct io_kiocb *notif = cmd_to_io_kiocb(nd);
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 52e124a9957c..394e1d33daa6 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -20,7 +20,8 @@ struct io_notif_data {
 };
 
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx);
-void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts);
+void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+			 bool success);
 
 static inline struct io_notif_data *io_notif_to_data(struct io_kiocb *notif)
 {
@@ -32,11 +33,7 @@ static inline void io_notif_flush(struct io_kiocb *notif)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
-	/* drop slot's master ref */
-	if (refcount_dec_and_test(&nd->uarg.refcnt)) {
-		notif->io_task_work.func = io_notif_tw_complete;
-		__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
-	}
+	io_tx_ubuf_complete(NULL, &nd->uarg, true);
 }
 
 static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
-- 
2.44.0



* [RFC 6/6] io_uring/notif: implement notification stacking
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (4 preceding siblings ...)
  2024-04-12 12:55 ` [RFC 5/6] io_uring/notif: simplify io_notif_flush() Pavel Begunkov
@ 2024-04-12 12:55 ` Pavel Begunkov
  2024-04-14 17:10   ` Willem de Bruijn
  2024-04-12 13:44 ` [RFC 0/6] implement io_uring notification (ubuf_info) stacking Jens Axboe
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-12 12:55 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

The network stack allows only one ubuf_info per skb, and unlike
MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate
ubuf_info. That means that send requests can't reuse a previously
allocated skb and need to get one or more new ones. That's fine
for large sends, but otherwise it would spam the stack with lots of skbs
carrying just a little data each.

To help with that, implement linking notifications (i.e. io_uring's
wrappers around ubuf_info) into a list. Each is refcounted by skbs and
the stack as usual. Additionally, all non-head entries keep a reference
to the head, which they put down when their own refcount hits 0. When
the head has no more users, it'll efficiently put down all
notifications in a batch.

As mentioned previously about ->link_skb, the callback implementation
always allows binding to an skb without a prior ubuf_info.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/notif.c | 71 +++++++++++++++++++++++++++++++++++++++++++-----
 io_uring/notif.h |  4 +++
 2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 26680176335f..d58cdc01e691 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -9,18 +9,28 @@
 #include "notif.h"
 #include "rsrc.h"
 
+static const struct ubuf_info_ops io_ubuf_ops;
+
 static void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
-	if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used))
-		notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED;
+	do {
+		notif = cmd_to_io_kiocb(nd);
 
-	if (nd->account_pages && notif->ctx->user) {
-		__io_unaccount_mem(notif->ctx->user, nd->account_pages);
-		nd->account_pages = 0;
-	}
-	io_req_task_complete(notif, ts);
+		lockdep_assert(refcount_read(&nd->uarg.refcnt) == 0);
+
+		if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used))
+			notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED;
+
+		if (nd->account_pages && notif->ctx->user) {
+			__io_unaccount_mem(notif->ctx->user, nd->account_pages);
+			nd->account_pages = 0;
+		}
+
+		nd = nd->next;
+		io_req_task_complete(notif, ts);
+	} while (nd);
 }
 
 void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
@@ -39,12 +49,56 @@ void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
 	if (!refcount_dec_and_test(&uarg->refcnt))
 		return;
 
+	if (nd->head != nd) {
+		io_tx_ubuf_complete(skb, &nd->head->uarg, success);
+		return;
+	}
 	notif->io_task_work.func = io_notif_tw_complete;
 	__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
 }
 
+static int io_link_skb(struct sk_buff *skb, struct ubuf_info *uarg)
+{
+	struct io_notif_data *nd, *prev_nd;
+	struct io_kiocb *prev_notif, *notif;
+	struct ubuf_info *prev_uarg = skb_zcopy(skb);
+
+	nd = container_of(uarg, struct io_notif_data, uarg);
+	notif = cmd_to_io_kiocb(nd);
+
+	if (!prev_uarg) {
+		net_zcopy_get(&nd->uarg);
+		skb_zcopy_init(skb, &nd->uarg);
+		return 0;
+	}
+	/* handle it separately as we can't link a notif to itself */
+	if (unlikely(prev_uarg == &nd->uarg))
+		return 0;
+	/* we can't join two links together, just request a fresh skb */
+	if (unlikely(nd->head != nd || nd->next))
+		return -EEXIST;
+	/* don't mix zc providers */
+	if (unlikely(prev_uarg->ops != &io_ubuf_ops))
+		return -EEXIST;
+
+	prev_nd = container_of(prev_uarg, struct io_notif_data, uarg);
+	prev_notif = cmd_to_io_kiocb(prev_nd);
+
+	/* make sure all notifications can be finished in the same task_work */
+	if (unlikely(notif->ctx != prev_notif->ctx ||
+		     notif->task != prev_notif->task))
+		return -EEXIST;
+
+	nd->head = prev_nd->head;
+	nd->next = prev_nd->next;
+	prev_nd->next = nd;
+	net_zcopy_get(&nd->head->uarg);
+	return 0;
+}
+
 static const struct ubuf_info_ops io_ubuf_ops = {
 	.complete = io_tx_ubuf_complete,
+	.link_skb = io_link_skb,
 };
 
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
@@ -65,6 +119,9 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	nd = io_notif_to_data(notif);
 	nd->zc_report = false;
 	nd->account_pages = 0;
+	nd->next = NULL;
+	nd->head = nd;
+
 	nd->uarg.flags = IO_NOTIF_UBUF_FLAGS;
 	nd->uarg.ops = &io_ubuf_ops;
 	refcount_set(&nd->uarg.refcnt, 1);
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 394e1d33daa6..6d2e8b674b43 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -14,6 +14,10 @@ struct io_notif_data {
 	struct file		*file;
 	struct ubuf_info	uarg;
 	unsigned long		account_pages;
+
+	struct io_notif_data	*next;
+	struct io_notif_data	*head;
+
 	bool			zc_report;
 	bool			zc_used;
 	bool			zc_copied;
-- 
2.44.0



* Re: [RFC 0/6] implement io_uring notification (ubuf_info) stacking
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (5 preceding siblings ...)
  2024-04-12 12:55 ` [RFC 6/6] io_uring/notif: implement notification stacking Pavel Begunkov
@ 2024-04-12 13:44 ` Jens Axboe
  2024-04-12 14:52 ` Jens Axboe
  2024-04-13 17:17 ` David Ahern
  8 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2024-04-12 13:44 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet,
	Willem de Bruijn

On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> io_uring allocates a ubuf_info per zerocopy send request. That is
> convenient for userspace, but it means that the TCP stack has to
> allocate a new skb for every request instead of amending data into a
> previous one. Unless sends are large enough, that creates lots of
> small skbs straining the stack and hurting performance.
> 
> The patchset implements stacking of notifications, i.e. io_uring's
> extension of ubuf_info. It links ubuf_info's into a list, and the
> entire chain is put down together once all references are gone.

Excellent! I'll take a closer look, but I ran a quick test with my test
tool just to see the difference. This is on a 100G link.

Packet size	Before (Mbit)    After (Mbit)   Diff
====================================================
100		290		  1250		4.3x
200		560		  2460		4.4x
400		1190		  4900		4.1x
800		2300		  9700		4.2x
1600		4500		 19100		4.2x
3200		8900		 35000		3.9x

which are just rough numbers and the tool isn't that great, but
definitely encouraging. And it does have parity with sync MSG_ZEROCOPY,
which is what was really bugging me before.

-- 
Jens Axboe



* Re: [RFC 0/6] implement io_uring notification (ubuf_info) stacking
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (6 preceding siblings ...)
  2024-04-12 13:44 ` [RFC 0/6] implement io_uring notification (ubuf_info) stacking Jens Axboe
@ 2024-04-12 14:52 ` Jens Axboe
  2024-04-13 17:17 ` David Ahern
  8 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2024-04-12 14:52 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet,
	Willem de Bruijn

Reviewed the patch set, and I think this is nice and clean and the right
fix. For the series:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

If the net people agree, we'll have to coordinate staging of the first
two patches.

-- 
Jens Axboe




* Re: [RFC 0/6] implement io_uring notification (ubuf_info) stacking
  2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (7 preceding siblings ...)
  2024-04-12 14:52 ` Jens Axboe
@ 2024-04-13 17:17 ` David Ahern
  2024-04-15  0:08   ` Pavel Begunkov
  8 siblings, 1 reply; 25+ messages in thread
From: David Ahern @ 2024-04-13 17:17 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, Eric Dumazet,
	Willem de Bruijn

On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> io_uring allocates a ubuf_info per zerocopy send request. That is
> convenient for userspace, but it means that the TCP stack has to
> allocate a new skb for every request instead of amending data into a
> previous one. Unless sends are large enough, that creates lots of
> small skbs straining the stack and hurting performance.

The ubuf_info forces TCP segmentation at less-than-MTU boundaries,
which kills performance with small message sizes, as TCP is forced to send
small packets. This is an interesting solution to allow the byte stream
to flow yet maintain the segmentation boundaries for callbacks.

> 
> The patchset implements stacking of notifications, i.e. io_uring's
> extension of ubuf_info. It links ubuf_info's into a list, and the
> entire chain is put down together once all references are gone.
> 
> Testing with liburing/examples/send-zerocopy and another custom-made
> tool, with 4K bytes per send it improves performance ~6 times and
> brings it level with MSG_ZEROCOPY. Without the patchset, much larger
> sends are required to realise the full potential.
> 
> bytes  | before (Kqps) | after (Kqps)
> 100    | 283           | 936
> 1200   | 195           | 1023
> 4000   | 193           | 1386
> 8000   | 154           | 1058
> 
> Pavel Begunkov (6):
>   net: extend ubuf_info callback to ops structure
>   net: add callback for setting a ubuf_info to skb
>   io_uring/notif: refactor io_tx_ubuf_complete()
>   io_uring/notif: remove ctx var from io_notif_tw_complete
>   io_uring/notif: simplify io_notif_flush()
>   io_uring/notif: implement notification stacking
> 
>  drivers/net/tap.c      |  2 +-
>  drivers/net/tun.c      |  2 +-
>  drivers/vhost/net.c    |  8 +++-
>  include/linux/skbuff.h | 21 ++++++----
>  io_uring/notif.c       | 91 +++++++++++++++++++++++++++++++++++-------
>  io_uring/notif.h       | 13 +++---
>  net/core/skbuff.c      | 37 +++++++++++------
>  7 files changed, 129 insertions(+), 45 deletions(-)
> 



* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-12 12:55 ` [RFC 1/6] net: extend ubuf_info callback to ops structure Pavel Begunkov
@ 2024-04-13 17:17   ` David Ahern
  2024-04-14 17:07   ` Willem de Bruijn
  1 sibling, 0 replies; 25+ messages in thread
From: David Ahern @ 2024-04-13 17:17 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, Eric Dumazet,
	Willem de Bruijn

On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> We'll need to associate additional callbacks with ubuf_info, so
> introduce a structure holding the ubuf_info callbacks. Apart from the
> smarter io_uring notification management introduced in the next
> patches, it can be used to generalise msg_zerocopy_put_abort() and also
> to store ->sg_from_iter, which is currently passed in struct msghdr.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  drivers/net/tap.c      |  2 +-
>  drivers/net/tun.c      |  2 +-
>  drivers/vhost/net.c    |  8 ++++++--
>  include/linux/skbuff.h | 19 +++++++++++--------
>  io_uring/notif.c       |  8 ++++++--
>  net/core/skbuff.c      | 17 +++++++++++------
>  6 files changed, 36 insertions(+), 20 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>




* Re: [RFC 2/6] net: add callback for setting a ubuf_info to skb
  2024-04-12 12:55 ` [RFC 2/6] net: add callback for setting a ubuf_info to skb Pavel Begunkov
@ 2024-04-13 17:18   ` David Ahern
  0 siblings, 0 replies; 25+ messages in thread
From: David Ahern @ 2024-04-13 17:18 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, Eric Dumazet,
	Willem de Bruijn

On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> At the moment an skb can only have one ubuf_info associated with it,
> which might be a performance problem for zerocopy sends in cases like
> TCP via io_uring. Add a callback for assigning a ubuf_info to an skb;
> this way we can implement smarter assignment later, such as linking
> ubuf_info's together.
> 
> Note that it's an optional callback, which has to be compatible with
> skb_zcopy_set(); that's because the net stack might potentially decide
> to clone an skb and take another reference to the ubuf_info whenever it
> wishes. Also, a correct implementation should always be able to bind to
> an skb without a prior ubuf_info, otherwise we could end up in a
> situation where the send would not be able to progress.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/skbuff.h |  2 ++
>  net/core/skbuff.c      | 20 ++++++++++++++------
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>


* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-12 12:55 ` [RFC 1/6] net: extend ubuf_info callback to ops structure Pavel Begunkov
  2024-04-13 17:17   ` David Ahern
@ 2024-04-14 17:07   ` Willem de Bruijn
  2024-04-15  0:07     ` Pavel Begunkov
  1 sibling, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-04-14 17:07 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> We'll need to associate additional callbacks with ubuf_info, so
> introduce a structure holding the ubuf_info callbacks. Apart from the
> smarter io_uring notification management introduced in the next
> patches, it can be used to generalise msg_zerocopy_put_abort() and also
> to store ->sg_from_iter, which is currently passed in struct msghdr.

This adds an extra indirection for all other ubuf implementations.
Can that be avoided?


* Re: [RFC 6/6] io_uring/notif: implement notification stacking
  2024-04-12 12:55 ` [RFC 6/6] io_uring/notif: implement notification stacking Pavel Begunkov
@ 2024-04-14 17:10   ` Willem de Bruijn
  2024-04-14 23:55     ` Pavel Begunkov
  0 siblings, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-04-14 17:10 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> The network stack allows only one ubuf_info per skb, and unlike
> MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate
>> ubuf_info. That means that send requests can't reuse a previously
>> allocated skb and need to get one or more new ones. That's fine
> for large sends, but otherwise it would spam the stack with lots of skbs
> carrying just a little data each.

Can you give a little context why each send request has to be a
separate ubuf_info?

This patch series aims to make that model more efficient. Would it be
possible to just change the model instead? I assume you tried that and
it proved unworkable, but is it easy to explain what the fundamental
blocker is?

MSG_ZEROCOPY uses uarg->len to identify multiple consecutive send
operations that can be notified at once.
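
In sketch form (not the actual kernel code, the names are made up;
simplified from msg_zerocopy_realloc()):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* one uarg covers the consecutive sends [id, id + len), so a single
 * completion can notify all of them at once */
struct zc_uarg_sketch {
	uint32_t id;       /* zerocopy id of the first covered send */
	uint32_t len;      /* number of consecutive sends covered */
	uint32_t bytelen;  /* total bytes covered */
};

/* on a new send, try to extend the previous uarg instead of
 * allocating a fresh one */
static bool zc_uarg_extend(struct zc_uarg_sketch *uarg,
			   uint32_t next_id, size_t size)
{
	if (uarg->id + uarg->len != next_id)
		return false;	/* not consecutive, need a new uarg */
	uarg->len++;
	uarg->bytelen += (uint32_t)size;
	return true;
}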


* Re: [RFC 6/6] io_uring/notif: implement notification stacking
  2024-04-14 17:10   ` Willem de Bruijn
@ 2024-04-14 23:55     ` Pavel Begunkov
  2024-04-15 15:15       ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-14 23:55 UTC (permalink / raw)
  To: Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

On 4/14/24 18:10, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> The network stack allows only one ubuf_info per skb, and unlike
>> MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate
>> ubuf_info. That means that send requests can't reuse a previously
>> allocated skb and need to get one or more new ones. That's fine
>> for large sends, but otherwise it would spam the stack with lots of skbs
>> carrying just a little data each.
> 
> Can you give a little context why each send request has to be a
> separate ubuf_info?
> 
> This patch series aims to make that model more efficient. Would it be
> possible to just change the model instead? I assume you tried that and
> it proved unworkable, but is it easy to explain what the fundamental
> blocker is?

The uapi is such that you get a buffer completion (analogous to what you
get with recv(MSG_ERRQUEUE)) for each send request. With that, for an skb
to serve multiple send requests it'd need to store a list of completions
in some way. One could try to track sockets, have one "active" ubuf_info
per socket which all sends would use, and then eventually flush the
active ubuf so it can post completions and create a new one. But io_uring
wouldn't know when it needs to "flush", whereas in the net stack it
happens naturally when it pushes skbs from the queue. Not to mention
that socket tracking has its own complications.
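
For reference, the userspace side of that per-request contract looks
roughly like this (liburing sketch; the helper name is made up and
error handling is omitted):

#include <liburing.h>

static void send_one_zc(struct io_uring *ring, int sockfd,
			const void *buf, size_t len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;

	io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
	io_uring_submit(ring);

	/* first CQE: the send's result, IORING_CQE_F_MORE is set */
	io_uring_wait_cqe(ring, &cqe);
	io_uring_cqe_seen(ring, cqe);

	/* second CQE: the notification, IORING_CQE_F_NOTIF is set;
	 * only now may the buffer be reused */
	io_uring_wait_cqe(ring, &cqe);
	io_uring_cqe_seen(ring, cqe);
}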

As for uapi, in early versions of io_uring's SEND_ZC, ubuf_info and
requests weren't entangled; roughly speaking, the user could choose
that this request should use this ubuf_info (I can elaborate if
that's interesting). It wasn't too complex, but all feedback was
pointing out that it's much easier to use how it is now, and honestly
it does buy simplicity.

I'm not sure what a different model would give. We wouldn't win
in efficiency compared to this patch; I can go into details of
how there are no extra atomics/locks/kmalloc/etc. The only bit
is waking up waiting tasks, but that would still need to happen.
I can even optimise / amortise ubuf refcounting if that would
matter.

> MSG_ZEROCOPY uses uarg->len to identify multiple consecutive send
> operations that can be notified at once.

-- 
Pavel Begunkov


* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-14 17:07   ` Willem de Bruijn
@ 2024-04-15  0:07     ` Pavel Begunkov
  2024-04-15 15:06       ` Willem de Bruijn
  2024-04-16 14:50       ` David Ahern
  0 siblings, 2 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-15  0:07 UTC (permalink / raw)
  To: Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

On 4/14/24 18:07, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> We'll need to associate additional callbacks with ubuf_info, so
>> introduce a structure holding the ubuf_info callbacks. Apart from the
>> smarter io_uring notification management introduced in the next
>> patches, it can be used to generalise msg_zerocopy_put_abort() and also
>> to store ->sg_from_iter, which is currently passed in struct msghdr.
> 
> This adds an extra indirection for all other ubuf implementations.
> Can that be avoided?

It could be fitted directly into ubuf_info, but that doesn't feel
right. It should be hot, so does it even matter? On the bright side,
with the patch I'll also move ->sg_from_iter from msghdr into it, so it
doesn't have to be in the generic path.

I think it's the right approach, but if you have a strong opinion
I can fit it as a new field in ubuf_info.

-- 
Pavel Begunkov


* Re: [RFC 0/6] implement io_uring notification (ubuf_info) stacking
  2024-04-13 17:17 ` David Ahern
@ 2024-04-15  0:08   ` Pavel Begunkov
  0 siblings, 0 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-15  0:08 UTC (permalink / raw)
  To: David Ahern, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, Eric Dumazet,
	Willem de Bruijn

On 4/13/24 18:17, David Ahern wrote:
> On 4/12/24 6:55 AM, Pavel Begunkov wrote:
>> io_uring allocates a ubuf_info per zerocopy send request. That is
>> convenient for userspace, but it means that the TCP stack has to
>> allocate a new skb for every request instead of amending data into a
>> previous one. Unless sends are large enough, that creates lots of
>> small skbs straining the stack and hurting performance.
> 
> The ubuf_info forces TCP segmentation at less than MTU boundaries which
> kills performance with small message sizes as TCP is forced to send
> small packets. This is an interesting solution to allow the byte stream
> to flow yet maintain the segmentation boundaries for callbacks.

Thanks, I'll add your review tags if the patches survive in their
current form!


>> The patchset implements stacking of notifications, i.e. io_uring's
>> extension of ubuf_info. It links ubuf_info's into a list, and the
>> entire chain is put down together once all references are gone.
>>
>> Testing with liburing/examples/send-zerocopy and another custom-made
>> tool, with 4K bytes per send it improves performance ~6 times and
>> brings it level with MSG_ZEROCOPY. Without the patchset, much larger
>> sends are required to realise the full potential.
>>
>> bytes  | before (Kqps) | after (Kqps)
>> 100    | 283           | 936
>> 1200   | 195           | 1023
>> 4000   | 193           | 1386
>> 8000   | 154           | 1058
>>
>> Pavel Begunkov (6):
>>    net: extend ubuf_info callback to ops structure
>>    net: add callback for setting a ubuf_info to skb
>>    io_uring/notif: refactor io_tx_ubuf_complete()
>>    io_uring/notif: remove ctx var from io_notif_tw_complete
>>    io_uring/notif: simplify io_notif_flush()
>>    io_uring/notif: implement notification stacking
>>
>>   drivers/net/tap.c      |  2 +-
>>   drivers/net/tun.c      |  2 +-
>>   drivers/vhost/net.c    |  8 +++-
>>   include/linux/skbuff.h | 21 ++++++----
>>   io_uring/notif.c       | 91 +++++++++++++++++++++++++++++++++++-------
>>   io_uring/notif.h       | 13 +++---
>>   net/core/skbuff.c      | 37 +++++++++++------
>>   7 files changed, 129 insertions(+), 45 deletions(-)
>>
> 

-- 
Pavel Begunkov


* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-15  0:07     ` Pavel Begunkov
@ 2024-04-15 15:06       ` Willem de Bruijn
  2024-04-15 18:55         ` Pavel Begunkov
  2024-04-16 14:50       ` David Ahern
  1 sibling, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-04-15 15:06 UTC (permalink / raw)
  To: Pavel Begunkov, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

Pavel Begunkov wrote:
> On 4/14/24 18:07, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> We'll need to associate additional callbacks with ubuf_info, so
> >> introduce a structure holding the ubuf_info callbacks. Apart from the
> >> smarter io_uring notification management introduced in the next
> >> patches, it can be used to generalise msg_zerocopy_put_abort() and also
> >> to store ->sg_from_iter, which is currently passed in struct msghdr.
> > 
> > This adds an extra indirection for all other ubuf implementations.
> > Can that be avoided?
> 
> It could be fitted directly into ubuf_info, but that doesn't feel
> right. It should be hot, so does it even matter?

That depends on the workload (working set size)?

> On the bright side,
> with the patch I'll also move ->sg_from_iter from msghdr into it, so it
> doesn't have to be in the generic path.

I don't follow this: is this suggested future work?

> 
> I think it's the right approach, but if you have a strong opinion
> I can fit it as a new field in ubuf_info.

If there is a significant cost, I suppose we could use
INDIRECT_CALL or go one step further and demultiplex
based on the new ops

    if (uarg->ops == &msg_zerocopy_ubuf_ops)
        msg_zerocopy_callback(..);




* Re: [RFC 6/6] io_uring/notif: implement notification stacking
  2024-04-14 23:55     ` Pavel Begunkov
@ 2024-04-15 15:15       ` Willem de Bruijn
  2024-04-15 18:51         ` Pavel Begunkov
  0 siblings, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-04-15 15:15 UTC (permalink / raw)
  To: Pavel Begunkov, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

Pavel Begunkov wrote:
> On 4/14/24 18:10, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> The network stack allows only one ubuf_info per skb, and unlike
> >> MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate
> >> ubuf_info. That means that send requests can't reuse a previously
> >> allocated skb and need to get one or more new ones. That's fine
> >> for large sends, but otherwise it would spam the stack with lots of skbs
> >> carrying just a little data each.
> > 
> > Can you give a little context why each send request has to be a
> > separate ubuf_info?
> > 
> > This patch series aims to make that model more efficient. Would it be
> > possible to just change the model instead? I assume you tried that and
> > it proved unworkable, but is it easy to explain what the fundamental
> > blocker is?
> 
> > The uapi is such that you get a buffer completion (analogous to what you
> > get with recv(MSG_ERRQUEUE)) for each send request. With that, for an skb
> to serve multiple send requests it'd need to store a list of completions
> in some way. 

I probably don't know the io_uring implementation well enough yet, so
take this with a huge grain of salt.

MSG_ZEROCOPY can generate completions for multiple send calls from a
single uarg, by virtue of completions being incrementing IDs.

Is there a fundamental reason why io_uring needs a 1:1 mapping between
request slots in the API and uarg in the datapath? Or, put differently, is
there no trivial way to associate a range of completions with a single
uarg?

> One could try to track sockets, have one "active" ubuf_info
> per socket which all sends would use, and then eventually flush the
> active ubuf so it can post completions and create a new one.

This is basically what MSG_ZEROCOPY does for TCP. It signals POLLERR
as soon as one completion arrives. Then when a process gets around to
calling recvmsg(MSG_ERRQUEUE), it returns the range of completions that have
arrived in the meantime. A process can thus decide to postpone
completion handling to increase batching.
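
A sketch of that draining loop, close to the example in
Documentation/networking/msg_zerocopy.rst (error handling omitted):

#include <linux/errqueue.h>
#include <stdio.h>
#include <sys/socket.h>

static void drain_zc_notifications(int fd)
{
	struct sock_extended_err *serr;
	struct msghdr msg = { 0 };
	struct cmsghdr *cm;
	char control[128];

	for (;;) {
		msg.msg_control = control;
		msg.msg_controllen = sizeof(control);
		if (recvmsg(fd, &msg, MSG_ERRQUEUE) == -1)
			break;	/* EAGAIN on a non-blocking socket: drained */
		cm = CMSG_FIRSTHDR(&msg);
		serr = (struct sock_extended_err *)CMSG_DATA(cm);
		if (serr->ee_errno != 0 ||
		    serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY)
			continue;
		/* one notification covers sends ee_info..ee_data */
		printf("sends %u..%u completed\n",
		       serr->ee_info, serr->ee_data);
	}
}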

> But io_uring
> wouldn't know when it needs to "flush", whereas in the net stack it
> happens naturally when it pushes skbs from the queue. Not to mention
> that socket tracking has its own complications.
> 
> As for uapi, in early versions of io_uring's SEND_ZC, ubuf_info and
> requests weren't entangled; roughly speaking, the user could choose
> that this request should use this ubuf_info (I can elaborate if
> that's interesting). It wasn't too complex, but all feedback was
> pointing out that it's much easier to use how it is now, and honestly
> it does buy simplicity.

I see. I suppose that answers the 1:1 mapping ABI question I
asked above. I should reread that patch.

> I'm not sure what a different model would give. We wouldn't win
> in efficiency compared to this patch; I can go into details of
> how there are no extra atomics/locks/kmalloc/etc. The only bit
> is waking up waiting tasks, but that would still need to happen.
> I can even optimise / amortise ubuf refcounting if that would
> matter.

Slight aside: we know that MSG_ZEROCOPY is quite inefficient for
small sends. A very rough rule of thumb is that you need around 16KB
or larger sends for it to outperform regular copy. Part of that is the
memory pinning. The other part is the notification handling.
MSG_ERRQUEUE is expensive. I hope that io_uring can not just match,
but improve on MSG_ZEROCOPY, especially for smaller packets.

> 
> > MSG_ZEROCOPY uses uarg->len to identify multiple consecutive send
> > operations that can be notified at once.
> 
> -- 
> Pavel Begunkov




* Re: [RFC 6/6] io_uring/notif: implement notification stacking
  2024-04-15 15:15       ` Willem de Bruijn
@ 2024-04-15 18:51         ` Pavel Begunkov
  2024-04-15 19:02           ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-15 18:51 UTC (permalink / raw)
  To: Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

On 4/15/24 16:15, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> On 4/14/24 18:10, Willem de Bruijn wrote:
>>> Pavel Begunkov wrote:
>>>> The network stack allows only one ubuf_info per skb, and unlike
>>>> MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate
>>>> ubuf_info. That means that send requests can't reuse a previously
>>>> allocated skb and need to get one or more new ones. That's fine
>>>> for large sends, but otherwise it would spam the stack with lots of skbs
>>>> carrying just a little data each.
>>>
>>> Can you give a little context why each send request has to be a
>>> separate ubuf_info?
>>>
>>> This patch series aims to make that model more efficient. Would it be
>>> possible to just change the model instead? I assume you tried that and
>>> it proved unworkable, but is it easy to explain what the fundamental
>>> blocker is?
>>
>> The uapi is such that you get a buffer completion (analogous to what you
>> get with recv(MSG_ERRQUEUE)) for each send request. With that, for an skb
>> to serve multiple send requests it'd need to store a list of completions
>> in some way.
> 
> I probably don't know the io_uring implementation well enough yet, so
> take this with a huge grain of salt.
> 
> MSG_ZEROCOPY can generate completions for multiple send calls from a
> single uarg, by virtue of completions being incrementing IDs.
> 
> Is there a fundamental reason why io_uring needs a 1:1 mapping between
> request slots in the API and uarg in the datapath? 

That's an ABI difference. Where MSG_ZEROCOPY returns a range of bytes
for the user to look up which buffers can now be reused, io_uring posts
one completion per send request, and by a request I mean the io_uring
way of doing sendmsg(2). Hence the 1:1 mapping of uargs (which post
that completion) to send zc requests.

IOW, where MSG_ZEROCOPY's uarg tracks the byte range it covers, an
io_uring uarg needs to know all requests associated with it, which
is currently just one request because of the 1:1 mapping.

> Or, put differently, is
> there no trivial way to associate a range of completions with a single
> uarg?

Quite non-trivial without changing the ABI, I'd say. And an ABI change
wouldn't be small or free of pitfalls.

>> One could try to track sockets, have one "active" ubuf_info
>> per socket which all sends would use, and then eventually flush the
>> active ubuf so it can post completions and create a new one.
> 
> This is basically what MSG_ZEROCOPY does for TCP. It signals POLLERR
> as soon as one completion arrives. Then when a process gets around to
>> calling recvmsg(MSG_ERRQUEUE), it returns the range of completions that have
> arrived in the meantime. A process can thus decide to postpone
> completion handling to increase batching.

Yes, there is that on the completion side, but on the submission
side you also need to decide when to let the current uarg go and
allocate a new one. That's not an issue if the uarg is owned by the
TCP stack: you don't have to additionally reference it, and you know
when you empty the queue and all that. It's not that great if
io_uring needs to talk to the socket to understand when the uarg is
better dropped.

>> But io_uring
>> wouldn't know when it needs to "flush", whereas in the net stack it
>> happens naturally when it pushes skbs from the queue. Not to mention
>> that socket tracking has its own complications.
>>
>> As for uapi, in early versions of io_uring's SEND_ZC, ubuf_info and
>> requests weren't entangled; roughly speaking, the user could choose
>> that this request should use this ubuf_info (I can elaborate if
>> that's interesting). It wasn't too complex, but all feedback was
>> pointing out that it's much easier to use how it is now, and honestly
>> it does buy simplicity.
> 
> I see. I suppose that answers the 1:1 mapping the ABI question I
> asked above. I should reread that patch.
> 
>> I'm not sure what a different model would give. We wouldn't win
>> in efficiency compared to this patch; I can go into details of
>> how there are no extra atomics/locks/kmalloc/etc. The only bit
>> is waking up waiting tasks, but that would still need to happen.
>> I can even optimise / amortise ubuf refcounting if that would
>> matter.
> 
> Slight aside: we know that MSG_ZEROCOPY is quite inefficient for
> small sends. Very rough rule of thumb is you need around 16KB or
> larger sends for it to outperform regular copy. Part of that is the
> memory pinning. The other part is the notification handling.
> MSG_ERRQUEUE is expensive. I hope that io_uring cannot just match, but
> improve on MSG_ZEROCOPY, especially for smaller packets.

I have some numbers left over from benchmarking this patchset. They're
not too well suited to answer your question, but they still give an
idea. Just a benchmark, single buffer, 100G Broadcom NIC IIRC. All of
it is io_uring based; -z<bool> switches between copy and zerocopy. Zero
copy uses registered buffers, so there is no page pinning or page table
traversal at runtime. 10s per run is not ideal, but it matched
longer runs.

# 1200 bytes
./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s1200 -z0
packets=15004160 (MB=17170), rps=1470996 (MB/s=1683)
./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s1200 -z1
packets=10440224 (MB=11947), rps=1023551 (MB/s=1171)

# 4000 bytes
./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s4000 -z0
packets=11742688 (MB=44794), rps=1151243 (MB/s=4391)
./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s4000 -z1
packets=14144048 (MB=53955), rps=1386671 (MB/s=5289)

# 8000 bytes
./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s8000 -z0
packets=6868976 (MB=52406), rps=673429 (MB/s=5137)
./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s8000 -z1
packets=10800784 (MB=82403), rps=1058900 (MB/s=8078)


-- 
Pavel Begunkov


* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-15 15:06       ` Willem de Bruijn
@ 2024-04-15 18:55         ` Pavel Begunkov
  2024-04-15 19:01           ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-15 18:55 UTC (permalink / raw)
  To: Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

On 4/15/24 16:06, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> On 4/14/24 18:07, Willem de Bruijn wrote:
>>> Pavel Begunkov wrote:
> >>> We'll need to associate additional callbacks with ubuf_info, so
> >>> introduce a structure holding the ubuf_info callbacks. Apart from the
> >>> smarter io_uring notification management introduced in the next
> >>> patches, it can be used to generalise msg_zerocopy_put_abort() and also
> >>> to store ->sg_from_iter, which is currently passed in struct msghdr.
>>>
>>> This adds an extra indirection for all other ubuf implementations.
>>> Can that be avoided?
>>
>> It could be fitted directly into ubuf_info, but that doesn't feel
>> right. It should be hot, so does it even matter?
> 
> That depends on the workload (working set size)?
> > On the bright side,
> > with the patch I'll also move ->sg_from_iter from msghdr into it, so it
>> doesn't have to be in the generic path.
> 
> I don't follow this: is this suggested future work?

Right, a small change I will add later. Without ops, though,
having 3 callback fields in uargs would be out of hand.

>> I think it's the right approach, but if you have a strong opinion
>> I can fit it as a new field in ubuf_info.
> 
> If there is a significant cost, I suppose we could use
> INDIRECT_CALL or go one step further and demultiplex
> based on the new ops
> 
>      if (uarg->ops == &msg_zerocopy_ubuf_ops)
>          msg_zerocopy_callback(..);

Let me note that the patch doesn't change the number of indirect
calls but only adds one extra deref to get the callback, i.e.
uarg->ops->callback() instead of uarg->callback(). Your snippet
goes an extra mile and removes the indirect call.
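
To make the comparison concrete, a minimal sketch (field and function
names follow the direction of patch 1/6, but treat them as
illustrative):

        struct ubuf_info_ops {
                void (*complete)(struct sk_buff *skb,
                                 struct ubuf_info *uarg, bool success);
        };

        /* before: one load of uarg->callback, then an indirect call */
        uarg->callback(skb, uarg, success);

        /* after: one extra dependent load, same single indirect call */
        uarg->ops->complete(skb, uarg, success);

        /* your demux snippet: a cheap, well-predicted compare replaces
         * the indirect call for the common MSG_ZEROCOPY case */
        if (uarg->ops == &msg_zerocopy_ubuf_ops)
                msg_zerocopy_callback(skb, uarg, success);
        else
                uarg->ops->complete(skb, uarg, success);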

Can I take it that you're fine with the direction of the
patch? Or do you want me to change anything?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-15 18:55         ` Pavel Begunkov
@ 2024-04-15 19:01           ` Willem de Bruijn
  0 siblings, 0 replies; 25+ messages in thread
From: Willem de Bruijn @ 2024-04-15 19:01 UTC (permalink / raw)
  To: Pavel Begunkov, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

Pavel Begunkov wrote:
> On 4/15/24 16:06, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> On 4/14/24 18:07, Willem de Bruijn wrote:
> >>> Pavel Begunkov wrote:
> >>>> We'll need to associate additional callbacks with ubuf_info, introduce
> >>>> a structure holding ubuf_info callbacks. Apart from smarter
> >>>> io_uring notification management introduced in next patches, it can be
> >>>> used to generalise msg_zerocopy_put_abort() and also store
> >>>> ->sg_from_iter, which is currently passed in struct msghdr.
> >>>
> >>> This adds an extra indirection for all other ubuf implementations.
> >>> Can that be avoided?
> >>
> >> It could be fitted directly into ubuf_info, but that doesn't feel
> >> right. It should be hot, so does it even matter?
> > 
> > That depends on the workload (working set size)?
> >> On the bright side,
> >> with the patch I'll also move ->sg_from_iter from msghdr into it, so it
> >> doesn't have to be in the generic path.
> > 
> > I don't follow this: is this suggested future work?
> 
> Right, a small change I will add later. Without ops, though,
> having 3 callback fields in uargs would be out of hand.
> 
> >> I think it's the right approach, but if you have a strong opinion
> >> I can fit it as a new field in ubuf_info.
> > 
> > If there is a significant cost, I suppose we could use
> > INDIRECT_CALL or go one step further and demultiplex
> > based on the new ops
> > 
> >      if (uarg->ops == &msg_zerocopy_ubuf_ops)
> >          msg_zerocopy_callback(..);
> 
> Let me note that the patch doesn't change the number of indirect
> calls but only adds one extra deref to get the callback, i.e.
> uarg->ops->callback() instead of uarg->callback().

Of course. Didn't mean to imply otherwise.

> Your snippet
> goes an extra mile and removes the indirect call.
>
> Can I take it that you're fine with the direction of the
> patch? Or do you want me to change anything?

It's fine. I want to avoid new paths slowing down existing code where
possible. But if this extra deref would prove significant, we have a
workaround.
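
Concretely, the INDIRECT_CALL route would look something like this
(a sketch; callback names as in the demux snippet above, macro from
include/linux/indirect_call_wrapper.h):

        #include <linux/indirect_call_wrapper.h>

        /* expands to: likely(uarg->ops->complete ==
         * msg_zerocopy_callback) ? direct call : indirect call,
         * skipping the retpoline for the dominant implementation */
        INDIRECT_CALL_1(uarg->ops->complete, msg_zerocopy_callback,
                        skb, uarg, success);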


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 6/6] io_uring/notif: implement notification stacking
  2024-04-15 18:51         ` Pavel Begunkov
@ 2024-04-15 19:02           ` Willem de Bruijn
  0 siblings, 0 replies; 25+ messages in thread
From: Willem de Bruijn @ 2024-04-15 19:02 UTC (permalink / raw)
  To: Pavel Begunkov, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet

> > 
> > Slight aside: we know that MSG_ZEROCOPY is quite inefficient for
> > small sends. Very rough rule of thumb is you need around 16KB or
> > larger sends for it to outperform regular copy. Part of that is the
> > memory pinning. The other part is the notification handling.
> > MSG_ERRQUEUE is expensive. I hope that io_uring can not just match, but
> > improve on MSG_ZEROCOPY, especially for smaller packets.
> 
> I have some numbers left over from benchmarking this patchset. Not
> too well suited to answer your question, but they still give an idea.
> Just a benchmark, single buffer, 100G Broadcom NIC IIRC. Everything
> is io_uring based; -z<bool> switches copy vs zerocopy. Zero copy
> uses registered buffers, so no page pinning and no page table
> traversal at runtime. 10s per run is not ideal, but was matching
> longer runs.
> 
> # 1200 bytes
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s1200 -z0
> packets=15004160 (MB=17170), rps=1470996 (MB/s=1683)
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s1200 -z1
> packets=10440224 (MB=11947), rps=1023551 (MB/s=1171)
> 
> # 4000 bytes
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s4000 -z0
> packets=11742688 (MB=44794), rps=1151243 (MB/s=4391)
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s4000 -z1
> packets=14144048 (MB=53955), rps=1386671 (MB/s=5289)
> 
> # 8000 bytes
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s8000 -z0
> packets=6868976 (MB=52406), rps=673429 (MB/s=5137)
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s8000 -z1
> packets=10800784 (MB=82403), rps=1058900 (MB/s=8078)

Parity around 4K. That is very encouraging :)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-15  0:07     ` Pavel Begunkov
  2024-04-15 15:06       ` Willem de Bruijn
@ 2024-04-16 14:50       ` David Ahern
  2024-04-16 15:31         ` Pavel Begunkov
  1 sibling, 1 reply; 25+ messages in thread
From: David Ahern @ 2024-04-16 14:50 UTC (permalink / raw)
  To: Pavel Begunkov, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, Eric Dumazet

On 4/14/24 6:07 PM, Pavel Begunkov wrote:
> On the bright side,
> with the patch I'll also move ->sg_from_iter from msghdr into it, so it
> doesn't have to be in the generic path.

So, what's old is new again? That's where it started:

https://lore.kernel.org/netdev/20220628225204.GA27554@u2004-local/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 1/6] net: extend ubuf_info callback to ops structure
  2024-04-16 14:50       ` David Ahern
@ 2024-04-16 15:31         ` Pavel Begunkov
  0 siblings, 0 replies; 25+ messages in thread
From: Pavel Begunkov @ 2024-04-16 15:31 UTC (permalink / raw)
  To: David Ahern, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, Eric Dumazet

On 4/16/24 15:50, David Ahern wrote:
> On 4/14/24 6:07 PM, Pavel Begunkov wrote:
>> On the bright side,
>> with the patch I'll also move ->sg_from_iter from msghdr into it, so it
>> doesn't have to be in the generic path.
> 
> So, what's old is new again? That's where it started:
> 
> https://lore.kernel.org/netdev/20220628225204.GA27554@u2004-local/

Hah, indeed, your patch had it in uarg. I wonder why I didn't put
them all in a table back then, if the argument was to keep struct
ubuf_info leaner.
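
To spell out that size argument: embedding the callbacks costs one
pointer each per ubuf_info instance, while an ops table is a single
pointer to a shared const structure. A sketch (signatures are
illustrative, loosely based on this series plus the msghdr
->sg_from_iter):

        /* embedded: three pointers in every ubuf_info */
        struct ubuf_info {
                void (*complete)(struct sk_buff *skb,
                                 struct ubuf_info *uarg, bool success);
                int (*link_skb)(struct sk_buff *skb,
                                struct ubuf_info *uarg);
                int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb,
                                    struct iov_iter *from, size_t length);
                refcount_t refcnt;
        };

        /* ops table: one pointer per instance, callbacks deduplicated */
        struct ubuf_info {
                const struct ubuf_info_ops *ops;
                refcount_t refcnt;
        };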

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-04-16 15:31 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-12 12:55 [RFC 0/6] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
2024-04-12 12:55 ` [RFC 1/6] net: extend ubuf_info callback to ops structure Pavel Begunkov
2024-04-13 17:17   ` David Ahern
2024-04-14 17:07   ` Willem de Bruijn
2024-04-15  0:07     ` Pavel Begunkov
2024-04-15 15:06       ` Willem de Bruijn
2024-04-15 18:55         ` Pavel Begunkov
2024-04-15 19:01           ` Willem de Bruijn
2024-04-16 14:50       ` David Ahern
2024-04-16 15:31         ` Pavel Begunkov
2024-04-12 12:55 ` [RFC 2/6] net: add callback for setting a ubuf_info to skb Pavel Begunkov
2024-04-13 17:18   ` David Ahern
2024-04-12 12:55 ` [RFC 3/6] io_uring/notif: refactor io_tx_ubuf_complete() Pavel Begunkov
2024-04-12 12:55 ` [RFC 4/6] io_uring/notif: remove ctx var from io_notif_tw_complete Pavel Begunkov
2024-04-12 12:55 ` [RFC 5/6] io_uring/notif: simplify io_notif_flush() Pavel Begunkov
2024-04-12 12:55 ` [RFC 6/6] io_uring/notif: implement notification stacking Pavel Begunkov
2024-04-14 17:10   ` Willem de Bruijn
2024-04-14 23:55     ` Pavel Begunkov
2024-04-15 15:15       ` Willem de Bruijn
2024-04-15 18:51         ` Pavel Begunkov
2024-04-15 19:02           ` Willem de Bruijn
2024-04-12 13:44 ` [RFC 0/6] implement io_uring notification (ubuf_info) stacking Jens Axboe
2024-04-12 14:52 ` Jens Axboe
2024-04-13 17:17 ` David Ahern
2024-04-15  0:08   ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).