From: Jason Wang <jasowang@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Harold Huang <baymaxhuang@gmail.com>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:XDP (eXpress Data Path)" <bpf@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
Date: Mon, 28 Feb 2022 12:20:06 +0800
Message-ID: <CACGkMEtTdvbc1rk6sk=KE7J2L0=R2M-FMxK+DfJDUYMTPbPJGA@mail.gmail.com>
In-Reply-To: <c687e1d8-e36a-8f23-342a-22b2a1efb372@gmail.com>

On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 2/25/22 01:02, Harold Huang wrote:
> > tun already supports NAPI, and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. Using NAPI also
> > enables GRO. iperf shows that single-stream throughput improves from
> > 4.5 Gbps to 9.2 Gbps. 9.2 Gbps nearly reaches the line rate of the
> > physical NIC, and the vhost thread still leaves about 15% of its CPU
> > core idle.
> >
> > Test topology:
> >
> > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> >
> > Iperf stream:
> >
> > Before:
> > ...
> > [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> > [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> > [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> > [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> > [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> > [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
> >
> > After:
> > ...
> > [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> > [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> > [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> > [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> > [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> > [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> > ....
> >
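As a sanity check on the iperf figures above, the reported bitrates can be reproduced from the transfer sizes if byte counts use binary prefixes (1 GByte = 2^30 bytes, 1 MByte = 2^20 bytes) while bitrates use decimal prefixes (1 Gbit/sec = 10^9 bit/s). That convention is an assumption inferred from the numbers themselves; the macro and function names below are illustrative only.

```c
#include <assert.h>

/* Assumed iperf3 convention, inferred from the output above:
 * byte counts use binary prefixes, bitrates use decimal prefixes. */
#define IPERF_MBYTE (1024.0 * 1024.0)          /* bytes per "MByte" */
#define IPERF_GBYTE (1024.0 * 1024.0 * 1024.0) /* bytes per "GByte" */

/* Convert bytes transferred over an interval (in seconds) to Gbit/s,
 * where 1 Gbit = 10^9 bits. */
static double gbits_per_sec(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes * 8.0 / seconds / 1e9;
}
```

For example, gbits_per_sec(5.39 * IPERF_GBYTE, 10.00) gives about 4.63 and gbits_per_sec(10.8 * IPERF_GBYTE, 10.00) about 9.28, matching the sender summaries within the rounding of the displayed transfer sizes.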
> > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> > ---
> > v1 -> v2
> >   - fix commit messages
> >   - add queued flag to avoid unnecessary napi_schedule(), as suggested by Jason
> >
> >   drivers/net/tun.c | 20 ++++++++++++++++----
> >   1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index fed85447701a..c7d8b7c821d8 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> >   }
> >
> >   static int tun_xdp_one(struct tun_struct *tun,
> > -                    struct tun_file *tfile,
> > +                    struct tun_file *tfile, int *queued,
> >                      struct xdp_buff *xdp, int *flush,
> >                      struct tun_page *tpage)
> >   {
> > @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> >       struct virtio_net_hdr *gso = &hdr->gso;
> >       struct bpf_prog *xdp_prog;
> >       struct sk_buff *skb = NULL;
> > +     struct sk_buff_head *queue;
> >       u32 rxhash = 0, act;
> >       int buflen = hdr->buflen;
> >       int err = 0;
> > @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> >           !tfile->detached)
> >               rxhash = __skb_get_hash_symmetric(skb);
> >
> > -     netif_receive_skb(skb);
> > +     if (tfile->napi_enabled) {
> > +             queue = &tfile->sk.sk_write_queue;
> > +             spin_lock(&queue->lock);
> > +             __skb_queue_tail(queue, skb);
> > +             spin_unlock(&queue->lock);
> > +             (*queued)++;
> > +     } else {
> > +             netif_receive_skb(skb);
> > +     }
> >
> >       /* No need to disable preemption here since this function is
> >        * always called with bh disabled
> > @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >       if (ctl && (ctl->type == TUN_MSG_PTR)) {
> >               struct tun_page tpage;
> >               int n = ctl->num;
> > -             int flush = 0;
> > +             int flush = 0, queued = 0;
> >
> >               memset(&tpage, 0, sizeof(tpage));
> >
> > @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >
> >               for (i = 0; i < n; i++) {
> >                       xdp = &((struct xdp_buff *)ctl->ptr)[i];
> > -                     tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> > +                     tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
>
>
> How big can n be?
>
> BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
>
> struct tun_msg_ctl *ctl = m->msg_control;
>
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
>
>      int n = ctl->num;  // can be set to values in [0..65535]
>
>      for (i = 0; i < n; i++) {
>
>          xdp = &((struct xdp_buff *)ctl->ptr)[i];
>
>
> I really do not understand how we prevent malicious user space from
> crashing the kernel.

It looks to me like the only user of this is vhost-net, which limits it
to 64; userspace can't use sendmsg() directly on tap.
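Jason's answer relies on the caller (vhost-net) bounding the batch. For illustration only, here is a hypothetical defensive check of the kind Eric is asking about: confirm that msg_controllen covers the control structure and cap num before looping. This is not what tun_sendmsg() does in the patch; the struct, the message-type value, and the helper names are stand-ins, and the limit of 64 is taken from the reply above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tun_msg_ctl: "num" is an unsigned
 * short, so an unchecked caller could supply any value in [0..65535]. */
struct msg_ctl_sketch {
	unsigned short type;
	unsigned short num;
	void *ptr;              /* array of "num" buffers */
};

#define MSG_PTR_SKETCH   2  /* hypothetical message type */
#define BATCH_MAX_SKETCH 64 /* assumption: the bound vhost-net uses */

/* Validate the control message; return the batch size or -EINVAL. */
static int batch_size(const void *control, size_t controllen)
{
	const struct msg_ctl_sketch *ctl = control;

	if (!ctl || controllen < sizeof(*ctl))
		return -EINVAL;          /* msg_controllen too small */
	if (ctl->type != MSG_PTR_SKETCH)
		return -EINVAL;
	if (ctl->num > BATCH_MAX_SKETCH || (ctl->num && !ctl->ptr))
		return -EINVAL;          /* oversized or missing batch */
	return ctl->num;
}

/* Walk a validated batch; the loop bound cannot exceed BATCH_MAX_SKETCH. */
static long process_batch(const void *control, size_t controllen)
{
	const struct msg_ctl_sketch *ctl = control;
	int n = batch_size(control, controllen);
	const int *bufs;
	long sum = 0;

	if (n < 0)
		return n;
	assert(n <= BATCH_MAX_SKETCH);
	bufs = ctl->ptr;
	for (int i = 0; i < n; i++)
		sum += bufs[i];          /* stand-in for per-buffer work */
	return sum;
}
```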

Thanks

>
>
>
> >               }
> >
> >               if (flush)
> >                       xdp_do_flush();
> >
> > +             if (tfile->napi_enabled && queued > 0)
> > +                     napi_schedule(&tfile->napi);
> > +
> >               rcu_read_unlock();
> >               local_bh_enable();
> >
>
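The v2 change can be modeled in userspace to show why the queued counter matters: with NAPI enabled, each packet is appended to a spinlock-protected queue and counted, and the poller is scheduled at most once per batch, and not at all for an empty batch. This is a sketch under stated assumptions: the structures, the C11 atomic_flag spinlock, and the counters standing in for netif_receive_skb() and napi_schedule() are all hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel APIs. */
struct pkt {
	struct pkt *next;
};

struct pkt_queue {
	atomic_flag lock;           /* models the sk_buff_head spinlock */
	struct pkt *head, *tail;
	int len;
};

struct model {
	bool napi_enabled;
	struct pkt_queue queue;
	int delivered_inline;       /* models netif_receive_skb() calls */
	int napi_schedules;         /* models napi_schedule() calls */
};

static void model_init(struct model *m, bool napi_enabled)
{
	m->napi_enabled = napi_enabled;
	atomic_flag_clear(&m->queue.lock);   /* start unlocked */
	m->queue.head = m->queue.tail = NULL;
	m->queue.len = 0;
	m->delivered_inline = 0;
	m->napi_schedules = 0;
}

/* Append under the lock, like spin_lock + __skb_queue_tail + spin_unlock. */
static void queue_tail_locked(struct pkt_queue *q, struct pkt *p)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;                       /* spin until the lock is free */
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Per-packet step, analogous to tun_xdp_one() with the *queued counter. */
static void xdp_one(struct model *m, struct pkt *p, int *queued)
{
	if (m->napi_enabled) {
		queue_tail_locked(&m->queue, p);
		(*queued)++;
	} else {
		m->delivered_inline++;  /* deliver immediately */
	}
}

/* Batch loop, analogous to the TUN_MSG_PTR branch of tun_sendmsg(). */
static void send_batch(struct model *m, struct pkt *pkts, int n)
{
	int queued = 0;

	for (int i = 0; i < n; i++)
		xdp_one(m, &pkts[i], &queued);

	/* At most one schedule per batch, and only if something was queued. */
	if (m->napi_enabled && queued > 0)
		m->napi_schedules++;
}
```

Scheduling once after the loop, rather than once per packet, is what lets the poller see the whole batch together, which is also where GRO gets a chance to merge segments.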


Thread overview: 17+ messages
2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
2022-02-24 17:22 ` Paolo Abeni
2022-02-25  3:36   ` Harold Huang
2022-02-25  3:46 ` Jason Wang
2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
2022-02-28  2:15   ` Jason Wang
2022-02-28  4:06   ` Eric Dumazet
2022-02-28  4:20     ` Jason Wang [this message]
     [not found]       ` <CANn89iKLhhwGnmEyfZuEKjtt7OwTbVyDYcFUMDYoRpdXjbMwiA@mail.gmail.com>
2022-02-28  5:17         ` Jason Wang
2022-02-28  7:26           ` Harold Huang
2022-02-28  7:56             ` Jason Wang
2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
2022-02-28  7:46   ` Jason Wang
2022-02-28 17:15     ` Stephen Hemminger
2022-03-01  1:47       ` Jason Wang
2022-03-01  1:58       ` Harold Huang
2022-03-02  1:40   ` patchwork-bot+netdevbpf
