From: Alexander Lobakin <alobakin@pm.me>
To: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: "Alexander Lobakin" <alobakin@pm.me>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Dust Li" <dust.li@linux.alibaba.com>,
	virtualization@lists.linux-foundation.org,
	"Network Development" <netdev@vger.kernel.org>,
	"open list" <linux-kernel@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH v4 bpf-next 6/6] xsk: build skb by page (aka generic zerocopy xmit)
Date: Tue, 16 Feb 2021 14:15:32 +0000	[thread overview]
Message-ID: <20210216141507.3263-1-alobakin@pm.me> (raw)
In-Reply-To: <CAJ8uoz0-ge=_jC8EbR371DMKxYSP8USni5OqVf0yk1-4Z=vnOg@mail.gmail.com>

From: Magnus Karlsson <magnus.karlsson@gmail.com>
Date: Tue, 16 Feb 2021 15:08:26 +0100

> On Tue, Feb 16, 2021 at 12:44 PM Alexander Lobakin <alobakin@pm.me> wrote:
> >
> > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > This patch constructs the skb directly from umem pages to save the
> > overhead of copying the data.
> >
> > The fast path relies on IFF_TX_SKB_NO_LINEAR: only when the network
> > device advertises this flag in priv_flags is the skb built directly
> > from pages. If the device does not support the feature, the data is
> > still copied into a linear skb as before.
> >
> > ---------------- Performance Testing ------------
> >
> > The test environment is Aliyun ECS server.
> > Test cmd:
> > ```
> > xdpsock -i eth0 -t  -S -s <msg size>
> > ```
> >
> > Test result data:
> >
> > size    64      512     1024    1500
> > copy    1916747 1775988 1600203 1440054
> > page    1974058 1953655 1945463 1904478
> > percent 3.0%    10.0%   21.58%  32.3%
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > [ alobakin:
> >  - expand subject to make it clearer;
> >  - improve skb->truesize calculation;
> >  - reserve some headroom in skb for drivers;
> >  - tailroom is not needed as skb is non-linear ]
> > Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> 
> Thank you Alexander!
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>

Thanks!

I have one more generic zerocopy approach to offer (inspired by this
series) that wouldn't require IFF_TX_SKB_NO_LINEAR, only the ability to
xmit S/G packets, which almost every NIC has. I'll publish an RFC once
this series and your upcoming changes get merged.

> > ---
> >  net/xdp/xsk.c | 119 ++++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 95 insertions(+), 24 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 143979ea4165..ff7bd06e1241 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -445,6 +445,96 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> >         sock_wfree(skb);
> >  }
> >
> > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > +                                             struct xdp_desc *desc)
> > +{
> > +       struct xsk_buff_pool *pool = xs->pool;
> > +       u32 hr, len, offset, copy, copied;
> > +       struct sk_buff *skb;
> > +       struct page *page;
> > +       void *buffer;
> > +       int err, i;
> > +       u64 addr;
> > +
> > +       hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
> > +
> > +       skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err);
> > +       if (unlikely(!skb))
> > +               return ERR_PTR(err);
> > +
> > +       skb_reserve(skb, hr);
> > +
> > +       addr = desc->addr;
> > +       len = desc->len;
> > +
> > +       buffer = xsk_buff_raw_get_data(pool, addr);
> > +       offset = offset_in_page(buffer);
> > +       addr = buffer - pool->addrs;
> > +
> > +       for (copied = 0, i = 0; copied < len; i++) {
> > +               page = pool->umem->pgs[addr >> PAGE_SHIFT];
> > +               get_page(page);
> > +
> > +               copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > +               skb_fill_page_desc(skb, i, page, offset, copy);
> > +
> > +               copied += copy;
> > +               addr += copy;
> > +               offset = 0;
> > +       }
> > +
> > +       skb->len += len;
> > +       skb->data_len += len;
> > +       skb->truesize += pool->unaligned ? len : pool->chunk_size;
> > +
> > +       refcount_add(skb->truesize, &xs->sk.sk_wmem_alloc);
> > +
> > +       return skb;
> > +}
> > +
> > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > +                                    struct xdp_desc *desc)
> > +{
> > +       struct net_device *dev = xs->dev;
> > +       struct sk_buff *skb;
> > +
> > +       if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> > +               skb = xsk_build_skb_zerocopy(xs, desc);
> > +               if (IS_ERR(skb))
> > +                       return skb;
> > +       } else {
> > +               u32 hr, tr, len;
> > +               void *buffer;
> > +               int err;
> > +
> > +               hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
> > +               tr = dev->needed_tailroom;
> > +               len = desc->len;
> > +
> > +               skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err);
> > +               if (unlikely(!skb))
> > +                       return ERR_PTR(err);
> > +
> > +               skb_reserve(skb, hr);
> > +               skb_put(skb, len);
> > +
> > +               buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> > +               err = skb_store_bits(skb, 0, buffer, len);
> > +               if (unlikely(err)) {
> > +                       kfree_skb(skb);
> > +                       return ERR_PTR(err);
> > +               }
> > +       }
> > +
> > +       skb->dev = dev;
> > +       skb->priority = xs->sk.sk_priority;
> > +       skb->mark = xs->sk.sk_mark;
> > +       skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> > +       skb->destructor = xsk_destruct_skb;
> > +
> > +       return skb;
> > +}
> > +
> >  static int xsk_generic_xmit(struct sock *sk)
> >  {
> >         struct xdp_sock *xs = xdp_sk(sk);
> > @@ -454,56 +544,37 @@ static int xsk_generic_xmit(struct sock *sk)
> >         struct sk_buff *skb;
> >         unsigned long flags;
> >         int err = 0;
> > -       u32 hr, tr;
> >
> >         mutex_lock(&xs->mutex);
> >
> >         if (xs->queue_id >= xs->dev->real_num_tx_queues)
> >                 goto out;
> >
> > -       hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
> > -       tr = xs->dev->needed_tailroom;
> > -
> >         while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > -               char *buffer;
> > -               u64 addr;
> > -               u32 len;
> > -
> >                 if (max_batch-- == 0) {
> >                         err = -EAGAIN;
> >                         goto out;
> >                 }
> >
> > -               len = desc.len;
> > -               skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err);
> > -               if (unlikely(!skb))
> > +               skb = xsk_build_skb(xs, &desc);
> > +               if (IS_ERR(skb)) {
> > +                       err = PTR_ERR(skb);
> >                         goto out;
> > +               }
> >
> > -               skb_reserve(skb, hr);
> > -               skb_put(skb, len);
> > -
> > -               addr = desc.addr;
> > -               buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > -               err = skb_store_bits(skb, 0, buffer, len);
> >                 /* This is the backpressure mechanism for the Tx path.
> >                  * Reserve space in the completion queue and only proceed
> >                  * if there is space in it. This avoids having to implement
> >                  * any buffering in the Tx path.
> >                  */
> >                 spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > -               if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > +               if (xskq_prod_reserve(xs->pool->cq)) {
> >                         spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >                         kfree_skb(skb);
> >                         goto out;
> >                 }
> >                 spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >
> > -               skb->dev = xs->dev;
> > -               skb->priority = sk->sk_priority;
> > -               skb->mark = sk->sk_mark;
> > -               skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> > -               skb->destructor = xsk_destruct_skb;
> > -
> >                 err = __dev_direct_xmit(skb, xs->queue_id);
> >                 if  (err == NETDEV_TX_BUSY) {
> >                         /* Tell user-space to retry the send */
> > --
> > 2.30.1

Al


Thread overview: 11+ messages
2021-02-16 11:38 [PATCH v4 bpf-next 0/6] xsk: build skb by page (aka generic zerocopy xmit) Alexander Lobakin
2021-02-16 11:38 ` [PATCH v4 bpf-next 1/6] netdev_priv_flags: add missing IFF_PHONY_HEADROOM self-definition Alexander Lobakin
2021-02-16 11:38 ` [PATCH v4 bpf-next 2/6] netdevice: check for net_device::priv_flags bitfield overflow Alexander Lobakin
2021-02-16 14:18   ` kernel test robot
2021-02-16 11:38 ` [PATCH v4 bpf-next 3/6] net: add priv_flags for allow tx skb without linear Alexander Lobakin
2021-02-16 11:38 ` [PATCH v4 bpf-next 4/6] virtio-net: support IFF_TX_SKB_NO_LINEAR Alexander Lobakin
2021-02-16 11:39 ` [PATCH v4 bpf-next 5/6] xsk: respect device's headroom and tailroom on generic xmit path Alexander Lobakin
2021-02-16 14:08   ` Magnus Karlsson
2021-02-16 11:39 ` [PATCH v4 bpf-next 6/6] xsk: build skb by page (aka generic zerocopy xmit) Alexander Lobakin
2021-02-16 14:08   ` Magnus Karlsson
2021-02-16 14:15     ` Alexander Lobakin [this message]
