netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mina Almasry <almasrymina@google.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
	"Matt Turner" <mattst88@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Helge Deller" <deller@gmx.de>,
	"Andreas Larsson" <andreas@gaisler.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Pavel Begunkov" <asml.silence@gmail.com>,
	"David Wei" <dw@davidwei.uk>, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Shailend Chand" <shailend@google.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [RFC PATCH net-next v6 05/15] netdev: support binding dma-buf to netdevice
Date: Tue, 5 Mar 2024 13:17:08 -0800	[thread overview]
Message-ID: <CAHS8izM7GbvWHrH=h9q0oG0DMU649EjT1udNEW_8F-hGeC15EQ@mail.gmail.com> (raw)
In-Reply-To: <da42cea9-c169-599e-f087-d38c419e3dab@huawei.com>

On Tue, Mar 5, 2024 at 4:55 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/3/5 10:01, Mina Almasry wrote:
>
> ...
>
> >
> > The netdev_dmabuf_binding struct is refcounted, and releases its
> > resources only when all the refs are released.
> >
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
> > Signed-off-by: Mina Almasry <almasrymina@google.com>
> >
> > ---
> >
> > RFC v6:
> > - Validate rx queue index
> > - Refactor new functions into devmem.c (Pavel)
>
> It seems odd that the functions or stucts in a file called devmem.c
> are named after 'dmabuf' instead of 'devmem'.
>

So my intention with this naming that devmem.c contains all the
functions for all devmem tcp specific support. Currently the only
devmem we support is dmabuf. In the future, other devmem may be
supported and it can fit nicely in devmem.c. For example, if we want
to extend devmem TCP to support NVMe devices, we need to add support
for p2pdma, maybe, and we can add that support under the devmem.c
umbrella rather than add new files.

But I can rename to dmabuf.c if there is strong objection to the current name.

> >
>
> ...
>
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index d8b810245c1d..72e932a1a948 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -8,6 +8,16 @@
> >  #ifndef _NET_NETMEM_H
> >  #define _NET_NETMEM_H
> >
> > +#include <net/devmem.h>
> > +
> > +/* net_iov */
> > +
> > +struct net_iov {
> > +     struct dmabuf_genpool_chunk_owner *owner;
> > +};
> > +
> > +/* netmem */
> > +
> >  /**
> >   * typedef netmem_ref - a nonexistent type marking a reference to generic
> >   * network memory.
> > diff --git a/net/core/Makefile b/net/core/Makefile
> > index 821aec06abf1..592f955c1241 100644
> > --- a/net/core/Makefile
> > +++ b/net/core/Makefile
> > @@ -13,7 +13,7 @@ obj-y                    += dev.o dev_addr_lists.o dst.o netevent.o \
> >                       neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
> >                       sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
> >                       fib_notifier.o xdp.o flow_offload.o gro.o \
> > -                     netdev-genl.o netdev-genl-gen.o gso.o
> > +                     netdev-genl.o netdev-genl-gen.o gso.o devmem.o
> >
> >  obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index fe054cbd41e9..bbea1b252529 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -155,6 +155,9 @@
> >  #include <net/netdev_rx_queue.h>
> >  #include <net/page_pool/types.h>
> >  #include <net/page_pool/helpers.h>
> > +#include <linux/genalloc.h>
> > +#include <linux/dma-buf.h>
> > +#include <net/devmem.h>
> >
> >  #include "dev.h"
> >  #include "net-sysfs.h"
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > new file mode 100644
> > index 000000000000..779ad990971e
> > --- /dev/null
> > +++ b/net/core/devmem.c
> > @@ -0,0 +1,293 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + *      Devmem TCP
> > + *
> > + *      Authors:     Mina Almasry <almasrymina@google.com>
> > + *                   Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > + *                   Kaiyuan Zhang <kaiyuanz@google.com
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/mm.h>
> > +#include <linux/netdevice.h>
> > +#include <trace/events/page_pool.h>
> > +#include <net/netdev_rx_queue.h>
> > +#include <net/page_pool/types.h>
> > +#include <net/page_pool/helpers.h>
> > +#include <linux/genalloc.h>
> > +#include <linux/dma-buf.h>
> > +#include <net/devmem.h>
> > +
> > +/* Device memory support */
> > +
> > +#ifdef CONFIG_DMA_SHARED_BUFFER
>
> I still think it is worth adding its own config for devmem or dma-buf
> for networking, thinking about the embeded system.
>

FWIW Willem did weigh on this previously and said he prefers to have
it unguarded by a CONFIG, but I will submit to whatever the consensus
here. It shouldn't be a huge deal to add a CONFIG technically
speaking.

> > +static void netdev_dmabuf_free_chunk_owner(struct gen_pool *genpool,
> > +                                        struct gen_pool_chunk *chunk,
> > +                                        void *not_used)
>
> It seems odd to still keep the netdev_ prefix as it is not really related
> to netdev, perhaps use 'net_' or something better.
>

Yes, thanks for catching. I can change to net_devmem_ maybe or net_dmabuf_*.

> > +{
> > +     struct dmabuf_genpool_chunk_owner *owner = chunk->owner;
> > +
> > +     kvfree(owner->niovs);
> > +     kfree(owner);
> > +}
> > +
> > +void __netdev_dmabuf_binding_free(struct netdev_dmabuf_binding *binding)
> > +{
> > +     size_t size, avail;
> > +
> > +     gen_pool_for_each_chunk(binding->chunk_pool,
> > +                             netdev_dmabuf_free_chunk_owner, NULL);
> > +
> > +     size = gen_pool_size(binding->chunk_pool);
> > +     avail = gen_pool_avail(binding->chunk_pool);
> > +
> > +     if (!WARN(size != avail, "can't destroy genpool. size=%lu, avail=%lu",
> > +               size, avail))
> > +             gen_pool_destroy(binding->chunk_pool);
> > +
> > +     dma_buf_unmap_attachment(binding->attachment, binding->sgt,
> > +                              DMA_BIDIRECTIONAL);
>
> For now DMA_FROM_DEVICE seems enough as tx is not supported yet.
>

Yes, good catch. I suspect we want to reuse this code for TX path. But
for now, I'll test with DMA_FROM_DEVICE and if I see no issues I'll
apply this change.

> > +     dma_buf_detach(binding->dmabuf, binding->attachment);
> > +     dma_buf_put(binding->dmabuf);
> > +     xa_destroy(&binding->bound_rxq_list);
> > +     kfree(binding);
> > +}
> > +
> > +static int netdev_restart_rx_queue(struct net_device *dev, int rxq_idx)
> > +{
> > +     void *new_mem;
> > +     void *old_mem;
> > +     int err;
> > +
> > +     if (!dev || !dev->netdev_ops)
> > +             return -EINVAL;
> > +
> > +     if (!dev->netdev_ops->ndo_queue_stop ||
> > +         !dev->netdev_ops->ndo_queue_mem_free ||
> > +         !dev->netdev_ops->ndo_queue_mem_alloc ||
> > +         !dev->netdev_ops->ndo_queue_start)
> > +             return -EOPNOTSUPP;
> > +
> > +     new_mem = dev->netdev_ops->ndo_queue_mem_alloc(dev, rxq_idx);
> > +     if (!new_mem)
> > +             return -ENOMEM;
> > +
> > +     err = dev->netdev_ops->ndo_queue_stop(dev, rxq_idx, &old_mem);
> > +     if (err)
> > +             goto err_free_new_mem;
> > +
> > +     err = dev->netdev_ops->ndo_queue_start(dev, rxq_idx, new_mem);
> > +     if (err)
> > +             goto err_start_queue;
> > +
> > +     dev->netdev_ops->ndo_queue_mem_free(dev, old_mem);
> > +
> > +     return 0;
> > +
> > +err_start_queue:
> > +     dev->netdev_ops->ndo_queue_start(dev, rxq_idx, old_mem);
>
> It might worth mentioning why queue start with old_mem will always
> success here as the return value seems to be ignored here.
>

So the old queue, we stopped it, and if we fail to bring up the new
queue, then we want to start the old queue back up to get the queue
back to a workable state.

I don't see what we can do to recover if restarting the old queue
fails. Seems like it should be a requirement that the driver tries as
much as possible to keep the old queue restartable.

I can improve this by at least logging or warning if restarting the
old queue fails.

> > +
> > +err_free_new_mem:
> > +     dev->netdev_ops->ndo_queue_mem_free(dev, new_mem);
> > +
> > +     return err;
> > +}
> > +
> > +/* Protected by rtnl_lock() */
> > +static DEFINE_XARRAY_FLAGS(netdev_dmabuf_bindings, XA_FLAGS_ALLOC1);
> > +
> > +void netdev_unbind_dmabuf(struct netdev_dmabuf_binding *binding)
> > +{
> > +     struct netdev_rx_queue *rxq;
> > +     unsigned long xa_idx;
> > +     unsigned int rxq_idx;
> > +
> > +     if (!binding)
> > +             return;
> > +
> > +     if (binding->list.next)
> > +             list_del(&binding->list);
>
> The above does not seems to be a good pattern to delete a entry, is
> there any reason having a checking before the list_del()? seems like
> defensive programming?
>

I think I needed to apply this condition to handle the case where
netdev_unbind_dmabuf() is called when binding->list is not initialized
or is empty.

netdev_nl_bind_rx_doit() will call unbind to free a partially
allocated binding in error paths, so, netdev_unbind_dmabuf() may be
called with a partially initialized binding. This is why we check for
binding->list is initialized here and check that rxq->binding ==
binding below. The main point is that netdev_unbind_dmabuf() may be
asked to unbind a partially bound dmabuf due to error paths.

Maybe a comment here will test this better. I will double confirm the
check is needed for the error paths in netdev_nl_bind_rx_doit().

> > +
> > +     xa_for_each(&binding->bound_rxq_list, xa_idx, rxq) {
> > +             if (rxq->binding == binding) {
>
> It seems like defensive programming here too?
>
> > +                     /* We hold the rtnl_lock while binding/unbinding
> > +                      * dma-buf, so we can't race with another thread that
> > +                      * is also modifying this value. However, the driver
> > +                      * may read this config while it's creating its
> > +                      * rx-queues. WRITE_ONCE() here to match the
> > +                      * READ_ONCE() in the driver.
> > +                      */
> > +                     WRITE_ONCE(rxq->binding, NULL);
> > +
> > +                     rxq_idx = get_netdev_rx_queue_index(rxq);
> > +
> > +                     netdev_restart_rx_queue(binding->dev, rxq_idx);
> > +             }
> > +     }
> > +
> > +     xa_erase(&netdev_dmabuf_bindings, binding->id);
> > +
> > +     netdev_dmabuf_binding_put(binding);
> > +}
> > +
>


-- 
Thanks,
Mina

  reply	other threads:[~2024-03-05 21:17 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05  2:01 [RFC PATCH net-next v6 00/15] Device Memory TCP Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 01/15] queue_api: define queue api Mina Almasry
2024-03-08  1:30   ` Jakub Kicinski
2024-03-08  2:08     ` Mina Almasry
2024-03-08  3:36       ` Jakub Kicinski
2024-03-08 23:47   ` David Wei
2024-03-09  0:27     ` Mina Almasry
2024-03-11  1:12     ` David Ahern
2024-03-05  2:01 ` [RFC PATCH net-next v6 02/15] net: page_pool: create hooks for custom page providers Mina Almasry
2024-03-05 21:54   ` David Wei
2024-03-05 22:36     ` Mina Almasry
2024-03-06 14:29       ` Pavel Begunkov
2024-03-06 17:04         ` Mina Almasry
2024-03-06 19:12           ` Pavel Begunkov
2024-03-06 21:59             ` Mina Almasry
2024-03-07 14:25               ` Pavel Begunkov
2024-03-08  4:57   ` David Wei
2024-03-08 19:53     ` Mina Almasry
2024-03-18  2:02   ` Christoph Hellwig
2024-03-18  2:49     ` David Wei
2024-03-18 23:22       ` Christoph Hellwig
2024-03-22 17:40         ` Mina Almasry
2024-03-22 23:19           ` Jakub Kicinski
2024-03-24 23:35             ` Christoph Hellwig
2024-03-24 23:35           ` Christoph Hellwig
2024-03-22 17:54     ` Mina Almasry
2024-03-24 23:37       ` Christoph Hellwig
2024-03-26 20:19         ` Mina Almasry
2024-03-28  7:31           ` Christoph Hellwig
2024-04-01 19:22             ` Mina Almasry
2024-04-08 15:34               ` Cong Wang
2024-03-05  2:01 ` [RFC PATCH net-next v6 03/15] net: page_pool: factor out page_pool recycle check Mina Almasry
2024-03-05 12:55   ` Yunsheng Lin
2024-03-05  2:01 ` [RFC PATCH net-next v6 04/15] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 05/15] netdev: support binding dma-buf to netdevice Mina Almasry
2024-03-05  9:04   ` Arnd Bergmann
2024-03-05 20:00     ` Mina Almasry
2024-03-05 21:42       ` Arnd Bergmann
2024-03-05 12:55   ` Yunsheng Lin
2024-03-05 21:17     ` Mina Almasry [this message]
2024-03-06 12:38       ` Yunsheng Lin
2024-03-06 22:10         ` Mina Almasry
2024-03-07 12:15           ` Yunsheng Lin
2024-03-08  3:58   ` Jakub Kicinski
2024-03-05  2:01 ` [RFC PATCH net-next v6 06/15] netdev: netdevice devmem allocator Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 07/15] page_pool: convert to use netmem Mina Almasry
2024-03-05 21:30   ` Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 08/15] page_pool: devmem support Mina Almasry
2024-03-05 21:42   ` Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 09/15] memory-provider: dmabuf devmem memory provider Mina Almasry
2024-03-06  2:28   ` David Wei
2024-03-06  2:42     ` Mina Almasry
2024-03-06  2:46       ` David Wei
2024-03-06  2:54         ` Mina Almasry
2024-03-06 14:58       ` Pavel Begunkov
2024-03-06 16:51         ` Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 10/15] net: support non paged skb frags Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 11/15] net: add support for skbs with unreadable frags Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 12/15] tcp: RX path for devmem TCP Mina Almasry
2024-03-05  8:41   ` Arnd Bergmann
2024-03-05 19:22     ` Mina Almasry
2024-03-05 19:39       ` Arnd Bergmann
2024-03-05  2:01 ` [RFC PATCH net-next v6 13/15] net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags Mina Almasry
2024-03-05  2:01 ` [RFC PATCH net-next v6 14/15] net: add devmem TCP documentation Mina Almasry
2024-03-08  1:52   ` Jakub Kicinski
2024-03-05  2:01 ` [RFC PATCH net-next v6 15/15] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2024-03-05  7:16 ` [RFC PATCH net-next v6 07/15] page_pool: convert to use netmem David Howells
2024-03-05 12:54 ` [RFC PATCH net-next v6 00/15] Device Memory TCP Yunsheng Lin
2024-03-05 19:38   ` Mina Almasry
2024-03-06 12:37     ` Yunsheng Lin
2024-03-26  0:28     ` Mina Almasry
2024-03-26 12:47       ` Yunsheng Lin
2024-03-26 20:14         ` Mina Almasry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHS8izM7GbvWHrH=h9q0oG0DMU649EjT1udNEW_8F-hGeC15EQ@mail.gmail.com' \
    --to=almasrymina@google.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=andreas@gaisler.com \
    --cc=andrii@kernel.org \
    --cc=arnd@arndb.de \
    --cc=asml.silence@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=dw@davidwei.uk \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=hramamurthy@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jeroendb@google.com \
    --cc=jgg@ziepe.ca \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kaiyuanz@google.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=martin.lau@linux.dev \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mattst88@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=richard.henderson@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=sdf@google.com \
    --cc=shailend@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).