From: Björn Töpel
Date: Thu, 18 Apr 2019 12:10:29 +0200
Subject: Re: [PATCH RFC] xdp: Support zero-copy XDP_TX from AF_XDP sockets.
To: Jonathan Lemon
Cc: Netdev, "Karlsson, Magnus", kernel-team@fb.com, Jesper Dangaard Brouer, maciej.fijalkowski@intel.com
List-ID: netdev@vger.kernel.org

On Wed, 17 Apr 2019 at 21:58, Jonathan Lemon wrote:
>
> When the XDP program attached to a zero-copy AF_XDP socket returns XDP_TX,
> queue the umem frame on the XDP TX ring, and arrange for it to be released
> via the ZCA free routine, which should place it back onto the reuseq.
>

There are a bunch of compiler errors, but I'll leave them out of the comments!
> Signed-off-by: Jonathan Lemon
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 +
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 52 +++++++++++++++++++--
>  include/net/xdp.h                           | 20 ++++++--
>  3 files changed, 65 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index 100e92d2982f..3e7954277737 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -274,6 +274,7 @@ static inline unsigned int i40e_txd_use_count(unsigned int size)
>  #define I40E_TX_FLAGS_TSYN		BIT(8)
>  #define I40E_TX_FLAGS_FD_SB		BIT(9)
>  #define I40E_TX_FLAGS_UDP_TUNNEL	BIT(10)
> +#define I40E_TX_FLAGS_ZC_FRAME		BIT(11)
>  #define I40E_TX_FLAGS_VLAN_MASK		0xffff0000
>  #define I40E_TX_FLAGS_VLAN_PRIO_MASK	0xe0000000
>  #define I40E_TX_FLAGS_VLAN_PRIO_SHIFT	29
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index d2e212d007c3..16a31c57906a 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -188,7 +188,6 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
>  static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  {
>  	int err, result = I40E_XDP_PASS;
> -	struct i40e_ring *xdp_ring;
>  	struct bpf_prog *xdp_prog;
>  	u32 act;
>
> @@ -202,9 +201,8 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  	switch (act) {
>  	case XDP_PASS:
>  		break;
> -	case XDP_TX:
> -		xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
> -		result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
> +	case XDP_TX
> +		result = i40e_xmit_rcvd_zc(rx_ring, xdp);
>  		break;
>  	case XDP_REDIRECT:
>  		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> @@ -623,6 +621,48 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  	return failure ? budget : (int)total_rx_packets;
>  }
>
> +static int i40e_xmit_rcvd_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
> +{
> +	struct i40e_ring *xdp_ring;
> +	struct i40e_tx_desc *tx_desc;
> +	struct i40e_tx_buffer *tx_bi;
> +	struct xdp_frame *xdpf;
> +	dma_addr_t dma;
> +
> +	xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
> +
> +	if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
> +		xdp_ring->tx_stats.tx_busy++;
> +		return I40E_XDP_CONSUMED;
> +	}
> +	xpdf = convert_to_xdp_frame_keep_zc(xdp);
> +	if (unlikely(!xpdf)
> +		return I40E_XDP_CONSUMED;
> +	xpdf->handle = xdp->handle;
> +
> +	dma = xdp_umem_get_dma(rx_ring->xsk_umem, xdp->handle);
> +	tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
> +	tx_bi->bytecount = xpdf->len;
> +	tx_bi->gso_segs = 1;
> +	tx_bi->xdpf = xdpf;
> +	tx_bi->tx_flags = I40E_TX_FLAGS_ZC_FRAME;
> +
> +	tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
> +	tx_desc->buffer_addr = cpu_to_le64(dma);
> +	tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC |
> +						  I40E_TX_DESC_CMD_EOP,
> +						  0, xpdf->len, 0);
> +	smp_wmb();
> +
> +	xdp_ring->next_to_use++;
> +	if (xdp_ring->next_to_use == xdp_ring->count)
> +		xdp_ring->next_to_use = 0;
> +
> +	tx_bi->next_to_watch = tx_desc;
> +
> +	return I40E_XDP_TX;
> +}

What you're basically doing here is an AF_XDP Tx, but triggered from the
Rx path; instead of the buffer being completed to the completion ring
after the packet is sent, it is recycled to the Rx HW ring (via
zca_free). I like the idea, but we need more plumbing first. Let me
expand.

Unfortunately, the current recycle mechanism requires that there is
space in the Rx ring at the point of recycling. In the XDP_TX case
there is no completion ring, and we cannot guarantee that there is
space in the Rx ring (since Rx and Tx are asynchronous). IOW, Rx
recycling can currently *only* be done in an Rx context.
What I would like to do is move i40e-xsk over to Jesper's page_pool
instead of the existing recycle mechanism. Then we could return the
descriptor to the pool if the Rx ring doesn't have space for the
completed/sent buffer.

TL;DR version: passing zc-frames via XDP_TX cannot be done properly
until the Rx recycle mechanism is more robust. :-( (I think Maciej is
looking into using the page_pool in the ice driver.)

But again, I like the idea!

Thanks,
Björn

> +
>  /**
>   * i40e_xmit_zc - Performs zero-copy Tx AF_XDP
>   * @xdp_ring: XDP Tx ring
> @@ -689,6 +729,10 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
>  				     struct i40e_tx_buffer *tx_bi)
>  {
>  	xdp_return_frame(tx_bi->xdpf);
> +	if (tx_bi->tx_flags & I40E_TX_FLAGS_ZC_FRAME) {
> +		tx_bi->tx_flags = 0;
> +		return;
> +	}
>  	dma_unmap_single(tx_ring->dev,
>  			 dma_unmap_addr(tx_bi, dma),
>  			 dma_unmap_len(tx_bi, len), DMA_TO_DEVICE);
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 0f25b3675c5c..191359c5ebdd 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -82,6 +82,7 @@ struct xdp_frame {
>  	 */
>  	struct xdp_mem_info mem;
>  	struct net_device *dev_rx; /* used by cpumap */
> +	unsigned long handle;
>  };
>
>  /* Clear kernel pointers in xdp_frame */
> @@ -95,15 +96,12 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
>
>  /* Convert xdp_buff to xdp_frame */
>  static inline
> -struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
> +struct xdp_frame *__convert_to_xdp_frame(struct xdp_buff *xdp)
>  {
>  	struct xdp_frame *xdp_frame;
>  	int metasize;
>  	int headroom;
>
> -	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
> -		return xdp_convert_zc_to_xdp_frame(xdp);
> -
>  	/* Assure headroom is available for storing info */
>  	headroom = xdp->data - xdp->data_hard_start;
>  	metasize = xdp->data - xdp->data_meta;
> @@ -125,6 +123,20 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
>  	return xdp_frame;
>  }
>
> +static inline
> +struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
> +{
> +	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
> +		return xdp_convert_zc_to_xdp_frame(xdp);
> +	return __convert_to_xdp_frame(xdp);
> +}
> +
> +static inline
> +struct xdp_frame *convert_to_xdp_frame_keep_zc(struct xdp_buff *xdp)
> +{
> +	return __convert_to_xdp_frame(xdp);
> +}
> +
>  void xdp_return_frame(struct xdp_frame *xdpf);
>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
>  void xdp_return_buff(struct xdp_buff *xdp);
> --
> 2.17.1
>