linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mina Almasry <almasrymina@google.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: "Paolo Abeni" <pabeni@redhat.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice
Date: Fri, 10 Nov 2023 01:45:29 -0800	[thread overview]
Message-ID: <CAHS8izP64Q3cP5vJ5ESQH9Y3fkSn8RCS+T84orKhK5=pzdJCLA@mail.gmail.com> (raw)
In-Reply-To: <aec0f586-c3b9-8da8-6a39-f313105267f8@huawei.com>

On Thu, Nov 9, 2023 at 11:38 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/11/10 10:59, Mina Almasry wrote:
> > On Thu, Nov 9, 2023 at 12:30 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >> I'm trying to wrap my head around the whole infra... the above line is
> >> confusing. Why do you increment dma_addr? it will be re-initialized in
> >> the next iteration.
> >>
> >
> > That is just a mistake, sorry. Will remove this increment.
>
> You seems to be combining comments in different thread and replying in
> one thread, I am not sure that is a good practice and I almost missed the
> reply below as I don't seem to be cc'ed.
>

Sorry about that.

> >
> > On Thu, Nov 9, 2023 at 1:29 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:> >>>
> >>>>> gen_pool_destroy BUG_ON() if it's not empty at the time of destroying.
> >>>>> Technically that should never happen, because
> >>>>> __netdev_devmem_binding_free() should only be called when the refcount
> >>>>> hits 0, so all the chunks have been freed back to the gen_pool. But,
> >>>>> just in case, I don't want to crash the server just because I'm
> >>>>> leaking a chunk... this is a bit of defensive programming that is
> >>>>> typically frowned upon, but the behavior of gen_pool is so severe I
> >>>>> think the WARN() + check is warranted here.
> >>>>
> >>>> It seems it is pretty normal for the above to happen nowadays because of
> >>>> retransmits timeouts, NAPI defer schemes mentioned below:
> >>>>
> >>>> https://lkml.kernel.org/netdev/168269854650.2191653.8465259808498269815.stgit@firesoul/
> >>>>
> >>>> And currently page pool core handles that by using a workqueue.
> >>>
> >>> Forgive me but I'm not understanding the concern here.
> >>>
> >>> __netdev_devmem_binding_free() is called when binding->ref hits 0.
> >>>
> >>> binding->ref is incremented when an iov slice of the dma-buf is
> >>> allocated, and decremented when an iov is freed. So,
> >>> __netdev_devmem_binding_free() can't really be called unless all the
> >>> iovs have been freed, and gen_pool_size() == gen_pool_avail(),
> >>> regardless of what's happening on the page_pool side of things, right?
> >>
> >> I seems to misunderstand it. In that case, it seems to be about
> >> defensive programming like other checking.
> >>
> >> By looking at it more closely, it seems napi_frag_unref() call
> >> page_pool_page_put_many() directly, which means devmem seems to
> >> be bypassing the napi_safe optimization.
> >>
> >> Can napi_frag_unref() reuse napi_pp_put_page() in order to reuse
> >> the napi_safe optimization?
> >>
> >
> > I think it already does. page_pool_page_put_many() is only called if
> > !recycle or !napi_pp_put_page(). In that case
> > page_pool_page_put_many() is just a replacement for put_page(),
> > because this 'page' may be an iov.
>
> Is there a reason why not calling napi_pp_put_page() for devmem too
> instead of calling page_pool_page_put_many()? mem provider has a
> 'release_page' ops, calling page_pool_page_put_many() directly here
> seems to be bypassing the 'release_page' ops, which means devmem is
> bypassing most of the main features of page pool.
>

I think we're still calling napi_pp_put_page() as normal:

 /**
@@ -3441,13 +3466,13 @@ bool napi_pp_put_page(struct page *page, bool
napi_safe);
 static inline void
 napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
 {
-       struct page *page = skb_frag_page(frag);
-
 #ifdef CONFIG_PAGE_POOL
-       if (recycle && napi_pp_put_page(page, napi_safe))
+       if (recycle && napi_pp_put_page(frag->bv_page, napi_safe))
                return;
+       page_pool_page_put_many(frag->bv_page, 1);
+#else
+       put_page(skb_frag_page(frag));
 #endif
-       put_page(page);
 }

The code change here is to replace put_page() with
page_pool_page_put_many(), only, because bv_page may be a
page_pool_iov, so we need to use page_pool_page_put_many() which
handles page_pool_iov correcly. I did not change whether or not
napi_pp_put_page() is called. It's still called if recycle==true.

> As far as I can tell, the main features of page pool:
> 1. Allow lockless allocation and freeing in pool->alloc cache by
>    utilizing NAPI non-concurrent context.
> 2. Allow concurrent allocation and freeing in pool->ring cache by
>    utilizing ptr_ring.
> 3. Allow dma map/unmap and cache sync optimization.
> 4. Allow detailed stats logging and tracing.
> 5. Allow some bulk allocation and freeing.
> 6. support both skb packet and xdp frame.
>
> I am wondering what is the main features that devmem is utilizing
> by intergrating into page pool?
>
> It seems the driver can just call netdev_alloc_devmem() and
> napi_frag_unref() can call netdev_free_devmem() directly without
> intergrating into page pool and it should just works too?
>
> Maybe we should consider creating a new thin layer, in order to
> demux to page pool, devmem or other mem type if my suggestion does
> not work out too?
>

I went through this discussion with Jesper on RFC v2 in this thread:

https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4pSOD1Jw@mail.gmail.com/T/#ma9285d53735d82cc315717db67a1796477c89d86

which culminates with that email where he seems on board with the
change from a performance POV, and seems on board with hiding the
memory type implementation from the drivers. That thread fully goes
over the tradeoffs of integrating over the page pool or creating new
ones. Integrating with the page pool abstracts most of the devmem
implementation (and other memory types) from the driver. It reuses
page pool features like page recycling for example.

-- 
Thanks,
Mina

  reply	other threads:[~2023-11-10 18:03 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-06  2:43 [RFC PATCH v3 00/12] Device Memory TCP Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 01/12] net: page_pool: factor out releasing DMA from releasing the page Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 02/12] net: page_pool: create hooks for custom page providers Mina Almasry
2023-11-07  7:44   ` Yunsheng Lin
2023-11-09 11:09   ` Paolo Abeni
2023-11-10 23:19   ` Jakub Kicinski
2023-11-13  3:28     ` Mina Almasry
2023-11-13 22:10       ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 03/12] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2023-11-10 23:16   ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice Mina Almasry
2023-11-07  7:46   ` Yunsheng Lin
2023-11-07 21:59     ` Mina Almasry
2023-11-08  3:40       ` Yunsheng Lin
2023-11-09  2:22         ` Mina Almasry
2023-11-09  9:29           ` Yunsheng Lin
2023-11-08 23:47   ` David Wei
2023-11-09  2:25     ` Mina Almasry
2023-11-09  8:29   ` Paolo Abeni
2023-11-10  2:59     ` Mina Almasry
2023-11-10  7:38       ` Yunsheng Lin
2023-11-10  9:45         ` Mina Almasry [this message]
2023-11-10 23:19   ` Jakub Kicinski
2023-11-11  2:19     ` Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 05/12] netdev: netdevice devmem allocator Mina Almasry
2023-11-06 23:44   ` David Ahern
2023-11-07 22:10     ` Mina Almasry
2023-11-07 22:55       ` David Ahern
2023-11-07 23:03         ` Mina Almasry
2023-11-09  1:15           ` David Wei
2023-11-10 14:26           ` Pavel Begunkov
2023-11-11 17:19             ` David Ahern
2023-11-14 16:09               ` Pavel Begunkov
2023-11-09  1:00         ` David Wei
2023-11-08  3:48       ` Yunsheng Lin
2023-11-09  1:41         ` Mina Almasry
2023-11-07  7:45   ` Yunsheng Lin
2023-11-09  8:44   ` Paolo Abeni
2023-11-06  2:44 ` [RFC PATCH v3 06/12] memory-provider: dmabuf devmem memory provider Mina Almasry
2023-11-06 21:02   ` Stanislav Fomichev
2023-11-06 23:49   ` David Ahern
2023-11-08  0:02     ` Mina Almasry
2023-11-08  0:10       ` David Ahern
2023-11-10 23:16   ` Jakub Kicinski
2023-11-13  4:54     ` Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 07/12] page-pool: device memory support Mina Almasry
2023-11-07  8:00   ` Yunsheng Lin
2023-11-07 21:56     ` Mina Almasry
2023-11-08 10:56       ` Yunsheng Lin
2023-11-09  3:20         ` Mina Almasry
2023-11-09  9:30           ` Yunsheng Lin
2023-11-09 12:20             ` Mina Almasry
2023-11-09 13:23               ` Yunsheng Lin
2023-11-09  9:01   ` Paolo Abeni
2023-11-06  2:44 ` [RFC PATCH v3 08/12] net: support non paged skb frags Mina Almasry
2023-11-07  9:00   ` Yunsheng Lin
2023-11-07 21:19     ` Mina Almasry
2023-11-08 11:25       ` Yunsheng Lin
2023-11-09  9:14   ` Paolo Abeni
2023-11-10  4:06     ` Mina Almasry
2023-11-10 23:19   ` Jakub Kicinski
2023-11-13  6:05     ` Mina Almasry
2023-11-13 22:17       ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 09/12] net: add support for skbs with unreadable frags Mina Almasry
2023-11-06 18:47   ` Stanislav Fomichev
2023-11-06 19:34     ` David Ahern
2023-11-06 20:31       ` Mina Almasry
2023-11-06 21:59         ` Stanislav Fomichev
2023-11-06 22:18           ` Mina Almasry
2023-11-06 22:59             ` Stanislav Fomichev
2023-11-06 23:27               ` Mina Almasry
2023-11-06 23:55                 ` Stanislav Fomichev
2023-11-07  0:07                   ` Willem de Bruijn
2023-11-07  0:14                     ` Stanislav Fomichev
2023-11-07  0:59                       ` Stanislav Fomichev
2023-11-07  2:23                         ` Willem de Bruijn
2023-11-07 17:44                           ` Stanislav Fomichev
2023-11-07 17:57                             ` Willem de Bruijn
2023-11-07 18:14                               ` Stanislav Fomichev
2023-11-07  0:20                     ` Mina Almasry
2023-11-07  1:06                       ` Stanislav Fomichev
2023-11-07 19:53                         ` Mina Almasry
2023-11-07 21:05                           ` Stanislav Fomichev
2023-11-07 21:17                             ` Eric Dumazet
2023-11-07 22:23                               ` Stanislav Fomichev
2023-11-10 23:17                                 ` Jakub Kicinski
2023-11-10 23:19                           ` Jakub Kicinski
2023-11-07  1:09                       ` David Ahern
2023-11-06 23:37             ` David Ahern
2023-11-07  0:03               ` Mina Almasry
2023-11-06 20:56   ` Stanislav Fomichev
2023-11-07  0:16   ` David Ahern
2023-11-07  0:23     ` Mina Almasry
2023-11-08 14:43   ` David Laight
2023-11-06  2:44 ` [RFC PATCH v3 10/12] tcp: RX path for devmem TCP Mina Almasry
2023-11-06 18:44   ` Stanislav Fomichev
2023-11-06 19:29     ` Mina Almasry
2023-11-06 21:14       ` Willem de Bruijn
2023-11-06 22:34         ` Stanislav Fomichev
2023-11-06 22:55           ` Willem de Bruijn
2023-11-06 23:32             ` Stanislav Fomichev
2023-11-06 23:55               ` David Ahern
2023-11-07  0:02                 ` Willem de Bruijn
2023-11-07 23:55                   ` Mina Almasry
2023-11-08  0:01                     ` David Ahern
2023-11-09  2:39                       ` Mina Almasry
2023-11-09 16:07                         ` Edward Cree
2023-12-08 20:12                           ` Pavel Begunkov
2023-11-09 11:05             ` Paolo Abeni
2023-11-10 23:16               ` Jakub Kicinski
2023-12-08 20:28             ` Pavel Begunkov
2023-12-08 20:09           ` Pavel Begunkov
2023-11-06 21:17       ` Stanislav Fomichev
2023-11-08 15:36         ` Edward Cree
2023-11-09 10:52   ` Paolo Abeni
2023-11-10 23:19   ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 11/12] net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 12/12] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2023-11-09 11:03   ` Paolo Abeni
2023-11-10 23:13   ` Jakub Kicinski
2023-11-11  2:27     ` Mina Almasry
2023-11-11  2:35       ` Jakub Kicinski
2023-11-13  4:08         ` Mina Almasry
2023-11-13 22:20           ` Jakub Kicinski
2023-11-10 23:17   ` Jakub Kicinski
2023-11-07 15:18 ` [RFC PATCH v3 00/12] Device Memory TCP David Ahern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHS8izP64Q3cP5vJ5ESQH9Y3fkSn8RCS+T84orKhK5=pzdJCLA@mail.gmail.com' \
    --to=almasrymina@google.com \
    --cc=arnd@arndb.de \
    --cc=christian.koenig@amd.com \
    --cc=davem@davemloft.net \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jeroendb@google.com \
    --cc=kaiyuanz@google.com \
    --cc=kuba@kernel.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).