bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: Alexander Lobakin <alobakin@pm.me>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"KP Singh" <kpsingh@kernel.org>,
	virtualization@lists.linux-foundation.org,
	bpf <bpf@vger.kernel.org>,
	"Network Development" <netdev@vger.kernel.org>,
	"open list" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next v3 3/3] xsk: build skb by page
Date: Fri, 22 Jan 2021 13:18:47 +0100	[thread overview]
Message-ID: <CAJ8uoz0ve9iRmz6zkCTaBMMjckFrD0df43-uVreXVf_wM3mZ1A@mail.gmail.com> (raw)
In-Reply-To: <20210122115519.2183-1-alobakin@pm.me>

On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin <alobakin@pm.me> wrote:
>
> From: Alexander Lobakin <alobakin@pm.me>
> Date: Fri, 22 Jan 2021 11:47:45 +0000
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Thu, 21 Jan 2021 16:41:33 +0100
> >
> > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > This patch is used to construct skb based on page to save memory copy
> > > > overhead.
> > > >
> > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> > > > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> > > > directly construct skb. If this feature is not supported, it is still
> > > > necessary to copy data to construct skb.
> > > >
> > > > ---------------- Performance Testing ------------
> > > >
> > > > The test environment is Aliyun ECS server.
> > > > Test cmd:
> > > > ```
> > > > xdpsock -i eth0 -t  -S -s <msg size>
> > > > ```
> > > >
> > > > Test result data:
> > > >
> > > > size    64      512     1024    1500
> > > > copy    1916747 1775988 1600203 1440054
> > > > page    1974058 1953655 1945463 1904478
> > > > percent 3.0%    10.0%   21.58%  32.3%
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > > > ---
> > > >  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
> > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 4a83117..38af7f1 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > > >   sock_wfree(skb);
> > > >  }
> > > >
> > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > +                                       struct xdp_desc *desc)
> > > > +{
> > > > + u32 len, offset, copy, copied;
> > > > + struct sk_buff *skb;
> > > > + struct page *page;
> > > > + void *buffer;
> > > > + int err, i;
> > > > + u64 addr;
> > > > +
> > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > + if (unlikely(!skb))
> > > > +         return ERR_PTR(err);
> > > > +
> > > > + addr = desc->addr;
> > > > + len = desc->len;
> > > > +
> > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > + offset = offset_in_page(buffer);
> > > > + addr = buffer - xs->pool->addrs;
> > > > +
> > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > +         page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > +
> > > > +         get_page(page);
> > > > +
> > > > +         copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > +
> > > > +         skb_fill_page_desc(skb, i, page, offset, copy);
> > > > +
> > > > +         copied += copy;
> > > > +         addr += copy;
> > > > +         offset = 0;
> > > > + }
> > > > +
> > > > + skb->len += len;
> > > > + skb->data_len += len;
> > >
> > > > + skb->truesize += len;
> > >
> > > This is not the truesize, unfortunately.
> > >
> > > We need to account for the number of pages, not number of bytes.
> >
> > The easiest solution is:
> >
> >       skb->truesize += PAGE_SIZE * i;
> >
> > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
>
> Oops, pls ignore this. I forgot that XSK buffers are not
> "one per page".
> We need to count the number of pages manually and then do
>
>         skb->truesize += PAGE_SIZE * npages;
>
> Right.

There are two possible packet buffer (chunks) sizes in a umem, 2K and
4K on a system with a PAGE_SIZE of 4K. If I remember correctly, and
please correct me if wrong, truesize is used for memory accounting.
But in this code, no kernel memory has been allocated (apart from the
skb). The page is just a part of the umem that has been already
allocated beforehand and by user-space in this case. So what should
truesize be in this case? Do we add 0, chunk_size * i, or the
complicated case of counting exactly how many 4K pages that are used
when the chunk_size is 2K, as two chunks could occupy the same page,
or just the upper bound of PAGE_SIZE * i that is likely a good
approximation in most cases? Just note that there might be other uses
of truesize that I am unaware of that could impact this choice.

> > > > +
> > > > + refcount_add(len, &xs->sk.sk_wmem_alloc);
> > > > +
> > > > + return skb;
> > > > +}
> > > > +
> >
> > Al
>
> Thanks,
> Al
>

  parent reply	other threads:[~2021-01-22 12:20 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-21 13:47 [PATCH bpf-next v3 0/3] xsk: build skb by page Xuan Zhuo
2021-01-21 13:47 ` [PATCH bpf-next v3 1/3] net: add priv_flags for allow tx skb without linear Xuan Zhuo
2021-01-21 15:13   ` Alexander Lobakin
2021-01-21 13:47 ` [PATCH bpf-next v3 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR Xuan Zhuo
2021-01-21 13:47 ` [PATCH bpf-next v3 3/3] xsk: build skb by page Xuan Zhuo
2021-01-21 15:17   ` Magnus Karlsson
2021-01-21 15:41   ` Eric Dumazet
2021-01-22 11:47     ` Alexander Lobakin
2021-01-22 11:55       ` Alexander Lobakin
2021-01-22 12:08         ` Alexander Lobakin
2021-01-22 12:18         ` Magnus Karlsson [this message]
2021-01-22 12:39           ` Alexander Lobakin
2021-01-22 12:55             ` Magnus Karlsson
     [not found] <1611329789.3222687-1-xuanzhuo@linux.alibaba.com>
2021-01-22 16:24 ` Alexander Lobakin
     [not found] <1611329955.4913929-2-xuanzhuo@linux.alibaba.com>
2021-01-22 17:26 ` Alexander Lobakin
2021-01-22 18:37   ` Magnus Karlsson
2021-01-22 18:45     ` Alexander Lobakin
     [not found]     ` <1611541335.3012564-1-xuanzhuo@linux.alibaba.com>
2021-01-25  7:44       ` Magnus Karlsson
     [not found]         ` <1611578136.5043845-1-xuanzhuo@linux.alibaba.com>
2021-01-25 13:16           ` Magnus Karlsson
     [not found]             ` <1611587242.9653594-2-xuanzhuo@linux.alibaba.com>
2021-01-26  7:34               ` Magnus Karlsson
     [not found] <1611544243.8363645-1-xuanzhuo@linux.alibaba.com>
2021-01-25 13:25 ` Alexander Lobakin
     [not found] <1611586627.1035807-1-xuanzhuo@linux.alibaba.com>
2021-01-25 15:07 ` Alexander Lobakin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJ8uoz0ve9iRmz6zkCTaBMMjckFrD0df43-uVreXVf_wM3mZ1A@mail.gmail.com \
    --to=magnus.karlsson@gmail.com \
    --cc=alobakin@pm.me \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=hawk@kernel.org \
    --cc=jasowang@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).