bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next] samples/bpf: optimize l2fwd performance in xdpsock
@ 2020-08-28 12:51 Magnus Karlsson
  2020-08-31 21:02 ` Daniel Borkmann
  0 siblings, 1 reply; 2+ messages in thread
From: Magnus Karlsson @ 2020-08-28 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon; +Cc: bpf

Optimize the throughput performance of the l2fwd sub-app in the
xdpsock sample application by removing a duplicate syscall and
increasing the size of the fill ring.

The latter needs some further explanation. We recommend that you set
the fill ring size >= HW RX ring size + AF_XDP RX ring size. Make sure
you fill up the fill ring with buffers at regular intervals, and you
will with this setting avoid allocation failures in the driver. These
are usually quite expensive since drivers have not been written to
assume that allocation failures are common. For regular sockets,
kernel allocated memory is used that only runs out in OOM situations
that should be rare.

These two performance optimizations together lead to a 6% percent
improvement for the l2fwd app on my machine.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 samples/bpf/xdpsock_user.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 19c6794..9f54df7 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -613,7 +613,16 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config cfg = {
-		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		/* We recommend that you set the fill ring size >= HW RX ring size +
+		 * AF_XDP RX ring size. Make sure you fill up the fill ring
+		 * with buffers at regular intervals, and you will with this setting
+		 * avoid allocation failures in the driver. These are usually quite
+		 * expensive since drivers have not been written to assume that
+		 * allocation failures are common. For regular sockets, kernel
+		 * allocated memory is used that only runs out in OOM situations
+		 * that should be rare.
+		 */
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS * 2,
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
@@ -640,13 +649,13 @@ static void xsk_populate_fill_ring(struct xsk_umem_info *umem)
 	u32 idx;
 
 	ret = xsk_ring_prod__reserve(&umem->fq,
-				     XSK_RING_PROD__DEFAULT_NUM_DESCS, &idx);
-	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
+				     XSK_RING_PROD__DEFAULT_NUM_DESCS * 2, &idx);
+	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS * 2)
 		exit_with_error(-ret);
-	for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
+	for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS * 2; i++)
 		*xsk_ring_prod__fill_addr(&umem->fq, idx++) =
 			i * opt_xsk_frame_size;
-	xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS);
+	xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS * 2);
 }
 
 static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem,
@@ -888,9 +897,6 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
 	if (!xsk->outstanding_tx)
 		return;
 
-	if (!opt_need_wakeup || xsk_ring_prod__needs_wakeup(&xsk->tx))
-		kick_tx(xsk);
-
 	ndescs = (xsk->outstanding_tx > opt_batch_size) ? opt_batch_size :
 		xsk->outstanding_tx;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH bpf-next] samples/bpf: optimize l2fwd performance in xdpsock
  2020-08-28 12:51 [PATCH bpf-next] samples/bpf: optimize l2fwd performance in xdpsock Magnus Karlsson
@ 2020-08-31 21:02 ` Daniel Borkmann
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Borkmann @ 2020-08-31 21:02 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jonathan.lemon; +Cc: bpf

On 8/28/20 2:51 PM, Magnus Karlsson wrote:
> Optimize the throughput performance of the l2fwd sub-app in the
> xdpsock sample application by removing a duplicate syscall and
> increasing the size of the fill ring.
> 
> The latter needs some further explanation. We recommend that you set
> the fill ring size >= HW RX ring size + AF_XDP RX ring size. Make sure
> you fill up the fill ring with buffers at regular intervals, and you
> will with this setting avoid allocation failures in the driver. These
> are usually quite expensive since drivers have not been written to
> assume that allocation failures are common. For regular sockets,
> kernel allocated memory is used that only runs out in OOM situations
> that should be rare.
> 
> These two performance optimizations together lead to a 6% percent
> improvement for the l2fwd app on my machine.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-08-31 21:02 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28 12:51 [PATCH bpf-next] samples/bpf: optimize l2fwd performance in xdpsock Magnus Karlsson
2020-08-31 21:02 ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).