bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Shailend Chand" <shailend@google.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Yunsheng Lin" <linyunsheng@huawei.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [net-next v1 08/16] memory-provider: dmabuf devmem memory provider
Date: Tue, 19 Dec 2023 23:55:01 +0000	[thread overview]
Message-ID: <040a2f67-cdc2-4e75-84ba-36ec13cbc00b@gmail.com> (raw)
In-Reply-To: <CAHS8izOY9xm=LBEN8sYwEa3aFB4GWDvJVacom3o4mHZPdHzTUg@mail.gmail.com>

On 12/14/23 20:03, Mina Almasry wrote:
> On Mon, Dec 11, 2023 at 12:37 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> ...
>>>> If you remove the branch, let it fall into ->release and rely
>>>> on refcounting there, then the callback could also fix up
>>>> release_cnt or ask pp to do it, like in the patch I linked above
>>>>
>>>
>>> Sadly I don't think this is possible due to the reasons I mention in
>>> the commit message of that patch. Prematurely releasing ppiov and not
>>> having them be candidates for recycling shows me a 4-5x degradation in
>>> performance.
>>
>> I don't think I follow. The concept is to only recycle a buffer (i.e.
>> make it available for allocation) when its refs drop to zero, which is
>> IMHO the only way it can work, and IIUC what this patchset is doing.
>>
>> That's also I suggest to do, but through a slightly different path.
>> Let's say at some moment there are 2 refs (e.g. 1 for an skb and
>> 1 for userspace/xarray).
>>
>> Say it first puts the skb:
>>
>> napi_pp_put_page()
>>     -> page_pool_return_page()
>>       -> mp_ops->release_page()
>>          -> need_to_free = put_buf()
>>             // not last ref, need_to_free==false,
>>             // don't recycle, don't increase release_cnt
>>
>> Then you put the last ref:
>>
>> page_pool_iov_put_many()
>>     -> page_pool_return_page()
>>       -> mp_ops->release_page()
>>          -> need_to_free = put_buf()
>>             // last ref, need_to_free==true,
>>             // recycle and release_cnt++
>>
>> And that last put can even be recycled right into the
>> pp / ptr_ring, in which case it doesn't need to touch
>> release_cnt. Does it make sense? I don't see where
>> 4-5x degradation would come from
>>
>>
> 
> Sorry for the late reply, I have been working on this locally.
> 
> What you're saying makes sense, and I'm no longer sure why I was
> seeing a perf degradation without '[net-next v1 10/16] page_pool:
> don't release iov on elevanted refcount'. However, even though what
> you're saying is technically correct, AFAIU it's actually semantically
> wrong. When a page is released by the page_pool, we should call
> page_pool_clear_pp_info() and completely disconnect the page from the
> pool. If we call release_page() on a page and then the page pool sees
> it again in page_pool_return_page(), I think that is considered a bug.

You're adding a new feature the semantics of which is already
different from what is in there, you can extend it any way as long
as it makes sense and agreed on. IMHO, it does. But well, if
there is a better solution I'm all for it.

> In fact I think what you're proposing is as a result of a bug because
> we don't call a page_pool_clear_pp_info() equivalent on releasing
> ppiov.

I don't get it, what bug? page_pool_clear_pp_info() is not called
for ppiov because it doesn't make sense to call it for ppiov,
there is no reason to clear ppiov->pp, nor there is any pp_magic.


> However, I'm reasonably confident I figured out the right thing to do
> here. The page_pool uses page->pp_frag_count for its refcounting.
> pp_frag_count is a misnomer, it's being renamed to pp_ref_count in
> Liang's series[1]). In this series I used a get_page/put_page
> equivalent for refcounting. Once I transitioned to using
> pp_[frag|ref]_count for refcounting inside the page_pool, the issue
> went away, and I no longer need the patch 'page_pool: don't release
> iov on elevanted refcount'.

Lovely, I'll take a look later! (also assuming it's in v5)


> There is an additional upside, since pages and ppiovs are both being
> refcounted using pp_[frag|ref]_count, we get some unified handling for
> ppiov and we reduce the checks around ppiov. This should be fixed
> properly in the next series.
> 
> I still need to do some work (~1 week) before I upload the next
> version as there is a new requirement from MM that we transition to a
> new type and not re-use page*, but I uploaded my changes github with
> the refcounting issues resolved in case they're useful to you. Sorry
> for the churn:
> 
> https://github.com/mina/linux/commits/tcpdevmem-v1.5/
> 
> [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=809049&state=*
> 

-- 
Pavel Begunkov

  reply	other threads:[~2023-12-20  0:00 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-08  0:52 [net-next v1 00/16] Device Memory TCP Mina Almasry
2023-12-08  0:52 ` [net-next v1 01/16] net: page_pool: factor out releasing DMA from releasing the page Mina Almasry
2023-12-10  3:49   ` Shakeel Butt
2023-12-12  8:11   ` Ilias Apalodimas
2023-12-08  0:52 ` [net-next v1 02/16] net: page_pool: create hooks for custom page providers Mina Almasry
2023-12-12  8:07   ` Ilias Apalodimas
2023-12-12 14:47     ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 03/16] queue_api: define queue api Mina Almasry
2023-12-14  1:15   ` Jakub Kicinski
2023-12-08  0:52 ` [net-next v1 04/16] gve: implement " Mina Almasry
2024-03-05 11:45   ` Arnd Bergmann
2023-12-08  0:52 ` [net-next v1 05/16] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2023-12-14  1:17   ` Jakub Kicinski
2023-12-08  0:52 ` [net-next v1 06/16] netdev: support binding dma-buf to netdevice Mina Almasry
2023-12-08 15:40   ` kernel test robot
2023-12-08 16:02   ` kernel test robot
2023-12-08 17:48   ` David Ahern
2023-12-08 19:22     ` Mina Almasry
2023-12-08 20:32       ` Mina Almasry
2023-12-09 23:29       ` David Ahern
2023-12-11  2:19         ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 07/16] netdev: netdevice devmem allocator Mina Almasry
2023-12-08 17:56   ` David Ahern
2023-12-08 19:27     ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 08/16] memory-provider: dmabuf devmem memory provider Mina Almasry
2023-12-08 22:48   ` Pavel Begunkov
2023-12-08 23:25     ` Mina Almasry
2023-12-10  3:03       ` Pavel Begunkov
2023-12-11  2:30         ` Mina Almasry
2023-12-11 20:35           ` Pavel Begunkov
2023-12-14 20:03             ` Mina Almasry
2023-12-19 23:55               ` Pavel Begunkov [this message]
2023-12-08 23:05   ` Pavel Begunkov
2023-12-12 12:25   ` Jason Gunthorpe
2023-12-12 13:07     ` Christoph Hellwig
2023-12-12 14:26     ` Mina Almasry
2023-12-12 14:39       ` Jason Gunthorpe
2023-12-12 14:58         ` Mina Almasry
2023-12-12 15:08           ` Jason Gunthorpe
2023-12-13  1:09             ` Mina Almasry
2023-12-13  2:19               ` David Ahern
2023-12-13  7:49   ` Yinjun Zhang
2023-12-08  0:52 ` [net-next v1 09/16] page_pool: device memory support Mina Almasry
2023-12-08  9:30   ` Yunsheng Lin
2023-12-08 16:05     ` Mina Almasry
2023-12-11  2:04       ` Yunsheng Lin
2023-12-11  2:26         ` Mina Almasry
2023-12-11  4:04           ` Mina Almasry
2023-12-11 11:51             ` Yunsheng Lin
2023-12-11 18:14               ` Mina Almasry
2023-12-12 11:17                 ` Yunsheng Lin
2023-12-12 14:28                   ` Mina Almasry
2023-12-13 11:48                     ` Yunsheng Lin
2023-12-13  7:52             ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 10/16] page_pool: don't release iov on elevanted refcount Mina Almasry
2023-12-08  0:52 ` [net-next v1 11/16] net: support non paged skb frags Mina Almasry
2023-12-08  0:52 ` [net-next v1 12/16] net: add support for skbs with unreadable frags Mina Almasry
2023-12-08  0:52 ` [net-next v1 13/16] tcp: RX path for devmem TCP Mina Almasry
2023-12-08 15:40   ` kernel test robot
2023-12-08 17:55   ` David Ahern
2023-12-08 19:23     ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 14/16] net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags Mina Almasry
2023-12-12 19:08   ` Simon Horman
2023-12-08  0:52 ` [net-next v1 15/16] net: add devmem TCP documentation Mina Almasry
2023-12-12 19:14   ` Simon Horman
2023-12-08  0:52 ` [net-next v1 16/16] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2023-12-08  1:47 ` [net-next v1 00/16] Device Memory TCP Mina Almasry
2023-12-08 17:57 ` David Ahern
2023-12-08 19:31   ` Mina Almasry
2023-12-10  3:48 ` Shakeel Butt
2023-12-12  5:58 ` Christoph Hellwig
2023-12-14  6:20 ` patchwork-bot+netdevbpf
2023-12-14  6:48   ` Christoph Hellwig
2023-12-14  6:51     ` Mina Almasry
2023-12-14  6:59       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=040a2f67-cdc2-4e75-84ba-36ec13cbc00b@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=almasrymina@google.com \
    --cc=arnd@arndb.de \
    --cc=bpf@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=hramamurthy@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jeroendb@google.com \
    --cc=kaiyuanz@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=shailend@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).