From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Subject: Re: [net-next, PATCH 1/2, v3] net: socionext: different approach on
 DMA
Date: Mon, 1 Oct 2018 14:20:21 +0300
Message-ID: <20181001112021.GA27469@apalos>
References: <1538220482-16129-1-git-send-email-ilias.apalodimas@linaro.org>
 <1538220482-16129-2-git-send-email-ilias.apalodimas@linaro.org>
 <20181001112631.4a1fbb62@redhat.com>
 <20181001094450.GA24329@apalos>
 <20181001095657.GA24568@apalos>
 <20181001130313.318065fd@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@vger.kernel.org, jaswinder.singh@linaro.org,
        ard.biesheuvel@linaro.org, masami.hiramatsu@linaro.org,
        arnd@arndb.de, bjorn.topel@intel.com, magnus.karlsson@intel.com,
        daniel@iogearbox.net, ast@kernel.org,
        jesus.sanchez-palencia@intel.com, vinicius.gomes@intel.com,
        makita.toshiaki@lab.ntt.co.jp, Tariq Toukan <tariqt@mellanox.com>,
        Tariq Toukan <ttoukan.linux@gmail.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-wm1-f67.google.com ([209.85.128.67]:36880 "EHLO
        mail-wm1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1728923AbeJAR5q (ORCPT
        <rfc822;netdev@vger.kernel.org>); Mon, 1 Oct 2018 13:57:46 -0400
Received: by mail-wm1-f67.google.com with SMTP id 185-v6so2849540wmt.2
        for <netdev@vger.kernel.org>; Mon, 01 Oct 2018 04:20:26 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20181001130313.318065fd@redhat.com>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Mon, Oct 01, 2018 at 01:03:13PM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 1 Oct 2018 12:56:58 +0300
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> 
> > > > #2: You have allocations on the XDP fast-path.
> > > > 
> > > > The REAL secret behind the XDP performance is to avoid allocations on
> > > > the fast-path.  While I just told you to use the page-allocator and
> > > > order-0 pages, this will actually kill performance.  Thus, to make this
> > > > fast, you need a driver local recycle scheme that avoids going through
> > > > the page allocator, which makes XDP_DROP and XDP_TX extremely fast.
> > > > For the XDP_REDIRECT action (which you seem to be interested in, as
> > > > this is needed for AF_XDP), there is a xdp_return_frame() API that can
> > > > make this fast.  
> > >
> > > I had an initial implementation that did exactly that (that's why the
> > > dma_sync_single_for_cpu() -> dma_unmap_single_attrs() is there). In the case
> > > of AF_XDP isn't that introducing a 'bottleneck' though? I mean you'll feed fresh
> > > buffers back to the hardware only when your packets have been processed from
> > > your userspace application 
> >
> > Just a clarification here. This is the case if ZC is implemented. In my case
> > the buffers will be 'ok' to be passed back to the hardware once the
> > userspace payload has been copied by xdp_do_redirect()
> 
> Thanks for clarifying.  But no, this is not introducing a 'bottleneck'
> for AF_XDP.
> 
> For (1) the copy-mode-AF_XDP the frame (as you noticed) is "freed" or
> "returned" very quickly after it is copied.  The code is a bit hard to
> follow, but in __xsk_rcv() it calls xdp_return_buff() after the memcpy.
> Thus, the frame can be kept DMA mapped and reused in RX-ring quickly.
Ok, makes sense. I'll send a v4 with page reuse, using your API for page
allocation.
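
To make the recycling idea concrete, here is a rough userspace sketch of a
driver-local buffer cache. It is only an illustration under assumptions: the
names (rx_recycle_cache, rx_buf_get, rx_buf_put) and sizes are hypothetical,
malloc()/free() stand in for the page allocator, and DMA mapping is left out
entirely. It is not the kernel API and not the code planned for v4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_BUF_SIZE   4096  /* stand-in for one order-0 page */
#define RECYCLE_SLOTS 64    /* hypothetical cache depth */

/* Hypothetical per-RX-queue cache; assumes a single context, like one NAPI poll loop. */
struct rx_recycle_cache {
	void *slot[RECYCLE_SLOTS];
	size_t count;        /* buffers currently parked in the cache */
	size_t slow_allocs;  /* times we fell back to the allocator */
};

/* Fast path: reuse a parked buffer. Slow path: go to the allocator. */
static void *rx_buf_get(struct rx_recycle_cache *c)
{
	assert(c != NULL);
	if (c->count > 0)
		return c->slot[--c->count];
	c->slow_allocs++;
	return malloc(RX_BUF_SIZE);
}

/* Called once the frame is done (e.g. dropped, or copied out): park it for reuse. */
static void rx_buf_put(struct rx_recycle_cache *c, void *buf)
{
	assert(c != NULL);
	if (c->count < RECYCLE_SLOTS) {
		c->slot[c->count++] = buf;
		return;
	}
	free(buf); /* cache full, release to the allocator */
}
```

In steady state every rx_buf_get() is served from the cache, so slow_allocs
stops growing once the ring has been filled the first time.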

> 
> For (2) the zero-copy-AF_XDP, then you need to implement a new
> allocator of type MEM_TYPE_ZERO_COPY.  The performance trick here is
> that all DMA-map/unmap and allocations go away, given everything is
> preallocated by userspace.  The 4 rings (SPSC) are used for
> recycling the ZC-umem frames (read Documentation/networking/af_xdp.rst).
Noted, in case we implement ZC support.
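
For anyone following along, a minimal generic single-producer/single-consumer
ring looks roughly like the sketch below. This is an assumption-laden
illustration using C11 atomics; the struct layout and the names (spsc_ring,
ring_push, ring_pop) are made up here and do not match the actual AF_XDP ring
definitions, for which the documentation referenced above is authoritative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical depth, must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct spsc_ring {
	_Atomic uint32_t prod;     /* written only by the producer */
	_Atomic uint32_t cons;     /* written only by the consumer */
	uint64_t desc[RING_SIZE];  /* e.g. frame addresses being handed over */
};

/* Producer side: returns false if the ring is full. */
static bool ring_push(struct spsc_ring *r, uint64_t addr)
{
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return false;
	r->desc[prod & (RING_SIZE - 1)] = addr;
	/* Release: the descriptor must be visible before the new index. */
	atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_pop(struct spsc_ring *r, uint64_t *addr)
{
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (prod == cons)
		return false;
	*addr = r->desc[cons & (RING_SIZE - 1)];
	/* Release: the slot may be overwritten only after we have read it. */
	atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
	return true;
}
```

Because each index has exactly one writer, no locks are needed; the
acquire/release pairs are enough to order the descriptor against the index.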

Thanks
/Ilias