* [PATCH] tun: support NAPI to accelerate packet processing
@ 2022-02-24 10:38 Harold Huang
  2022-02-24 17:22 ` Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Harold Huang @ 2022-02-24 10:38 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, Harold Huang, David S. Miller, Jakub Kicinski, open list

In tun, NAPI is supported and we can also use NAPI in the path of
batched XDP buffs to accelerate packet processing. What is more, after
we use NPAI, GRO is also supported. The iperf shows that the throughput
could be improved from 4.5Gbsp to 9.2Gbps per stream.

Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
---
 drivers/net/tun.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed85447701a..4e1cea659b42 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	struct virtio_net_hdr *gso = &hdr->gso;
 	struct bpf_prog *xdp_prog;
 	struct sk_buff *skb = NULL;
+	struct sk_buff_head *queue;
 	u32 rxhash = 0, act;
 	int buflen = hdr->buflen;
 	int err = 0;
@@ -2464,7 +2465,14 @@ static int tun_xdp_one(struct tun_struct *tun,
 	    !tfile->detached)
 		rxhash = __skb_get_hash_symmetric(skb);
 
-	netif_receive_skb(skb);
+	if (tfile->napi_enabled) {
+		queue = &tfile->sk.sk_write_queue;
+		spin_lock(&queue->lock);
+		__skb_queue_tail(queue, skb);
+		spin_unlock(&queue->lock);
+	} else {
+		netif_receive_skb(skb);
+	}
 
 	/* No need to disable preemption here since this function is
 	 * always called with bh disabled
@@ -2507,6 +2515,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 		if (flush)
 			xdp_do_flush();
 
+		if (tfile->napi_enabled)
+			napi_schedule(&tfile->napi);
+
 		rcu_read_unlock();
 		local_bh_enable();
 
-- 
2.27.0
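
The change above replaces per-packet delivery with a queue-then-poll pattern. A minimal user-space sketch of that pattern follows, assuming nothing beyond what the diff shows. `struct pkt`, `struct pkt_queue`, `queue_tail()`, `dequeue_head()` and `poll_batch()` are hypothetical stand-ins for the sk_buff queue and the NAPI poll routine, not kernel APIs; locking and bottom-half context are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types or APIs. */
struct pkt {
	int id;
	struct pkt *next;
};

struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

/* Append at the tail, as __skb_queue_tail() does in the patch. */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

/* Remove and return the oldest packet, or NULL if the queue is empty. */
static struct pkt *dequeue_head(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	q->len--;
	return p;
}

/* One poll pass: drain up to budget packets, return how many were handled. */
static int poll_batch(struct pkt_queue *q, int budget)
{
	int done = 0;

	while (done < budget && dequeue_head(q))
		done++;
	assert(done <= budget);
	return done;
}
```

A producer would call `queue_tail()` once per packet and then trigger a single `poll_batch()` for the whole batch, so the poll pass sees the queued packets together rather than one at a time.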



* Re: [PATCH] tun: support NAPI to accelerate packet processing
  2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
@ 2022-02-24 17:22 ` Paolo Abeni
  2022-02-25  3:36   ` Harold Huang
  2022-02-25  3:46 ` Jason Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2022-02-24 17:22 UTC (permalink / raw)
  To: Harold Huang, netdev; +Cc: jasowang, David S. Miller, Jakub Kicinski, open list

Hello,

On Thu, 2022-02-24 at 18:38 +0800, Harold Huang wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NPAI, GRO is also supported. The iperf shows that the throughput

Very minor nit: typo above NPAI -> NAPI

> could be improved from 4.5Gbsp to 9.2Gbps per stream.
> 
> Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>

Additionally, please specify the target tree explicitly in the patch
subject.

Cheers,

Paolo



* Re: [PATCH] tun: support NAPI to accelerate packet processing
  2022-02-24 17:22 ` Paolo Abeni
@ 2022-02-25  3:36   ` Harold Huang
  0 siblings, 0 replies; 17+ messages in thread
From: Harold Huang @ 2022-02-25  3:36 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jason Wang, David S. Miller, Jakub Kicinski, open list

On Fri, Feb 25, 2022 at 01:22, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Thu, 2022-02-24 at 18:38 +0800, Harold Huang wrote:
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NPAI, GRO is also supported. The iperf shows that the throughput
>
> Very minor nit: typo above NPAI -> NAPI

I will fix it in the next version.

>
> > could be improved from 4.5Gbsp to 9.2Gbps per stream.
> >
> > Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
> > Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
>
> Additionally, please specify the target tree explicitly in the patch
> subject.

I will fix it in the next version.

>
> Cheers,
>
> Paolo
>

Thanks,

Harold


* Re: [PATCH] tun: support NAPI to accelerate packet processing
  2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
  2022-02-24 17:22 ` Paolo Abeni
@ 2022-02-25  3:46 ` Jason Wang
  2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
  2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
  3 siblings, 0 replies; 17+ messages in thread
From: Jason Wang @ 2022-02-25  3:46 UTC (permalink / raw)
  To: Harold Huang; +Cc: netdev, David S. Miller, Jakub Kicinski, open list

On Thu, Feb 24, 2022 at 6:39 PM Harold Huang <baymaxhuang@gmail.com> wrote:
>
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NPAI, GRO is also supported. The iperf shows that the throughput
> could be improved from 4.5Gbsp to 9.2Gbps per stream.

It would be better to describe the test setup.

And we need to tweak the title, as NAPI is already supported in some
paths; something like "support NAPI for packets received from msg_control"?

>
> Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> ---
>  drivers/net/tun.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..4e1cea659b42 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>         struct virtio_net_hdr *gso = &hdr->gso;
>         struct bpf_prog *xdp_prog;
>         struct sk_buff *skb = NULL;
> +       struct sk_buff_head *queue;
>         u32 rxhash = 0, act;
>         int buflen = hdr->buflen;
>         int err = 0;
> @@ -2464,7 +2465,14 @@ static int tun_xdp_one(struct tun_struct *tun,
>             !tfile->detached)
>                 rxhash = __skb_get_hash_symmetric(skb);
>
> -       netif_receive_skb(skb);
> +       if (tfile->napi_enabled) {
> +               queue = &tfile->sk.sk_write_queue;
> +               spin_lock(&queue->lock);
> +               __skb_queue_tail(queue, skb);
> +               spin_unlock(&queue->lock);
> +       } else {
> +               netif_receive_skb(skb);
> +       }
>
>         /* No need to disable preemption here since this function is
>          * always called with bh disabled
> @@ -2507,6 +2515,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>                 if (flush)
>                         xdp_do_flush();
>
> +               if (tfile->napi_enabled)
> +                       napi_schedule(&tfile->napi);

It would be better to check whether we've queued anything, to avoid
unnecessary NAPI scheduling.

Thanks

> +
>                 rcu_read_unlock();
>                 local_bh_enable();
>
> --
> 2.27.0
>



* [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
  2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
  2022-02-24 17:22 ` Paolo Abeni
  2022-02-25  3:46 ` Jason Wang
@ 2022-02-25  9:02 ` Harold Huang
  2022-02-28  2:15   ` Jason Wang
  2022-02-28  4:06   ` Eric Dumazet
  2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
  3 siblings, 2 replies; 17+ messages in thread
From: Harold Huang @ 2022-02-25  9:02 UTC (permalink / raw)
  To: netdev
  Cc: jasowang, pabeni, Harold Huang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, open list, open list:XDP (eXpress Data Path)

In tun, NAPI is supported and we can also use NAPI in the path of
batched XDP buffs to accelerate packet processing. What is more, after
we use NAPI, GRO is also supported. The iperf shows that the throughput of
single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
Gbps nearly reaches the line speed of the phy nic and there is still about
15% idle cpu core remaining on the vhost thread.

Test topology:

[iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]

Iperf stream:

Before:
...
[  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
[  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
[  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
[  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
[  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
[  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver

After:
...
[  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
[  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
[  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
[  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
[  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
[  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
....

Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
---
v1 -> v2
 - fix commit messages
 - add queued flag to avoid unnecessary napi, as suggested by Jason

 drivers/net/tun.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed85447701a..c7d8b7c821d8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
 }
 
 static int tun_xdp_one(struct tun_struct *tun,
-		       struct tun_file *tfile,
+		       struct tun_file *tfile, int *queued,
 		       struct xdp_buff *xdp, int *flush,
 		       struct tun_page *tpage)
 {
@@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	struct virtio_net_hdr *gso = &hdr->gso;
 	struct bpf_prog *xdp_prog;
 	struct sk_buff *skb = NULL;
+	struct sk_buff_head *queue;
 	u32 rxhash = 0, act;
 	int buflen = hdr->buflen;
 	int err = 0;
@@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
 	    !tfile->detached)
 		rxhash = __skb_get_hash_symmetric(skb);
 
-	netif_receive_skb(skb);
+	if (tfile->napi_enabled) {
+		queue = &tfile->sk.sk_write_queue;
+		spin_lock(&queue->lock);
+		__skb_queue_tail(queue, skb);
+		spin_unlock(&queue->lock);
+		(*queued)++;
+	} else {
+		netif_receive_skb(skb);
+	}
 
 	/* No need to disable preemption here since this function is
 	 * always called with bh disabled
@@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	if (ctl && (ctl->type == TUN_MSG_PTR)) {
 		struct tun_page tpage;
 		int n = ctl->num;
-		int flush = 0;
+		int flush = 0, queued = 0;
 
 		memset(&tpage, 0, sizeof(tpage));
 
@@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 
 		for (i = 0; i < n; i++) {
 			xdp = &((struct xdp_buff *)ctl->ptr)[i];
-			tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+			tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
 		}
 
 		if (flush)
 			xdp_do_flush();
 
+		if (tfile->napi_enabled && queued > 0)
+			napi_schedule(&tfile->napi);
+
 		rcu_read_unlock();
 		local_bh_enable();
 
-- 
2.27.0



* Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
  2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
@ 2022-02-28  2:15   ` Jason Wang
  2022-02-28  4:06   ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Jason Wang @ 2022-02-28  2:15 UTC (permalink / raw)
  To: Harold Huang
  Cc: netdev, Paolo Abeni, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, open list, open list:XDP (eXpress Data Path)

On Fri, Feb 25, 2022 at 5:03 PM Harold Huang <baymaxhuang@gmail.com> wrote:
>
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
>
> After:
> ...
> [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> ---
> v1 -> v2
>  - fix commit messages
>  - add queued flag to avoid unnecessary napi, as suggested by Jason
>
>  drivers/net/tun.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
>  }
>
>  static int tun_xdp_one(struct tun_struct *tun,
> -                      struct tun_file *tfile,
> +                      struct tun_file *tfile, int *queued,
>                        struct xdp_buff *xdp, int *flush,
>                        struct tun_page *tpage)

Nit: how about simply returning the number of packets queued here?

Thanks

>  {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>         struct virtio_net_hdr *gso = &hdr->gso;
>         struct bpf_prog *xdp_prog;
>         struct sk_buff *skb = NULL;
> +       struct sk_buff_head *queue;
>         u32 rxhash = 0, act;
>         int buflen = hdr->buflen;
>         int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
>             !tfile->detached)
>                 rxhash = __skb_get_hash_symmetric(skb);
>
> -       netif_receive_skb(skb);
> +       if (tfile->napi_enabled) {
> +               queue = &tfile->sk.sk_write_queue;
> +               spin_lock(&queue->lock);
> +               __skb_queue_tail(queue, skb);
> +               spin_unlock(&queue->lock);
> +               (*queued)++;
> +       } else {
> +               netif_receive_skb(skb);
> +       }
>
>         /* No need to disable preemption here since this function is
>          * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>         if (ctl && (ctl->type == TUN_MSG_PTR)) {
>                 struct tun_page tpage;
>                 int n = ctl->num;
> -               int flush = 0;
> +               int flush = 0, queued = 0;
>
>                 memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
>                 for (i = 0; i < n; i++) {
>                         xdp = &((struct xdp_buff *)ctl->ptr)[i];
> -                       tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> +                       tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
>                 }
>
>                 if (flush)
>                         xdp_do_flush();
>
> +               if (tfile->napi_enabled && queued > 0)
> +                       napi_schedule(&tfile->napi);
> +
>                 rcu_read_unlock();
>                 local_bh_enable();
>
> --
> 2.27.0
>
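
The nit above can be sketched in user space as follows. This only models the return convention (negative errno on error, 0 when nothing was queued, 1 when one packet was queued) and the "schedule only if something was queued" check; `process_one()`, `process_batch()` and `enum outcome` are hypothetical names for illustration, not tun code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-packet outcome, chosen by the caller for illustration. */
enum outcome { OUT_ERROR, OUT_DIRECT, OUT_QUEUED };

/*
 * Models the suggested return convention:
 *   < 0  error (negative errno)
 *     0  packet handled without being queued
 *     1  packet queued for deferred processing
 */
static int process_one(enum outcome o)
{
	switch (o) {
	case OUT_ERROR:
		return -EINVAL;
	case OUT_QUEUED:
		return 1;
	case OUT_DIRECT:
	default:
		return 0;
	}
}

/* Sum the positive returns; errors and direct deliveries add nothing. */
static int process_batch(const enum outcome *batch, int n, bool *scheduled)
{
	int queued = 0;

	for (int i = 0; i < n; i++) {
		int ret = process_one(batch[i]);

		if (ret > 0)
			queued += ret;
	}

	/* Schedule the deferred pass only if something was actually queued. */
	*scheduled = queued > 0;
	assert(queued <= n);
	return queued;
}
```

With this shape the caller needs no extra out-parameter: the per-packet return value carries both the error case and the queued count.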



* [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs
  2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
                   ` (2 preceding siblings ...)
  2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
@ 2022-02-28  3:38 ` Harold Huang
  2022-02-28  7:46   ` Jason Wang
  2022-03-02  1:40   ` patchwork-bot+netdevbpf
  3 siblings, 2 replies; 17+ messages in thread
From: Harold Huang @ 2022-02-28  3:38 UTC (permalink / raw)
  To: netdev
  Cc: jasowang, pabeni, Harold Huang, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, open list, open list:XDP (eXpress Data Path)

In tun, NAPI is supported and we can also use NAPI in the path of
batched XDP buffs to accelerate packet processing. What is more, after
we use NAPI, GRO is also supported. The iperf shows that the throughput of
single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
Gbps nearly reaches the line speed of the phy nic and there is still about
15% idle cpu core remaining on the vhost thread.

Test topology:
[iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]

Iperf stream:
iperf3 -c 10.0.0.2  -i 1 -t 10

Before:
...
[  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
[  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
[  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
[  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
[  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
[  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver

After:
...
[  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
[  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
[  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
[  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
[  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
[  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver

Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
---
v2 -> v3
 - return the queued NAPI packet from tun_xdp_one

 drivers/net/tun.c | 43 ++++++++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed85447701a..969ea69fd29d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2388,9 +2388,10 @@ static int tun_xdp_one(struct tun_struct *tun,
 	struct virtio_net_hdr *gso = &hdr->gso;
 	struct bpf_prog *xdp_prog;
 	struct sk_buff *skb = NULL;
+	struct sk_buff_head *queue;
 	u32 rxhash = 0, act;
 	int buflen = hdr->buflen;
-	int err = 0;
+	int ret = 0;
 	bool skb_xdp = false;
 	struct page *page;
 
@@ -2405,13 +2406,13 @@ static int tun_xdp_one(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(xdp);
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
-		err = tun_xdp_act(tun, xdp_prog, xdp, act);
-		if (err < 0) {
+		ret = tun_xdp_act(tun, xdp_prog, xdp, act);
+		if (ret < 0) {
 			put_page(virt_to_head_page(xdp->data));
-			return err;
+			return ret;
 		}
 
-		switch (err) {
+		switch (ret) {
 		case XDP_REDIRECT:
 			*flush = true;
 			fallthrough;
@@ -2435,7 +2436,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 build:
 	skb = build_skb(xdp->data_hard_start, buflen);
 	if (!skb) {
-		err = -ENOMEM;
+		ret = -ENOMEM;
 		goto out;
 	}
 
@@ -2445,7 +2446,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
 		atomic_long_inc(&tun->rx_frame_errors);
 		kfree_skb(skb);
-		err = -EINVAL;
+		ret = -EINVAL;
 		goto out;
 	}
 
@@ -2455,16 +2456,27 @@ static int tun_xdp_one(struct tun_struct *tun,
 	skb_record_rx_queue(skb, tfile->queue_index);
 
 	if (skb_xdp) {
-		err = do_xdp_generic(xdp_prog, skb);
-		if (err != XDP_PASS)
+		ret = do_xdp_generic(xdp_prog, skb);
+		if (ret != XDP_PASS) {
+			ret = 0;
 			goto out;
+		}
 	}
 
 	if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
 	    !tfile->detached)
 		rxhash = __skb_get_hash_symmetric(skb);
 
-	netif_receive_skb(skb);
+	if (tfile->napi_enabled) {
+		queue = &tfile->sk.sk_write_queue;
+		spin_lock(&queue->lock);
+		__skb_queue_tail(queue, skb);
+		spin_unlock(&queue->lock);
+		ret = 1;
+	} else {
+		netif_receive_skb(skb);
+		ret = 0;
+	}
 
 	/* No need to disable preemption here since this function is
 	 * always called with bh disabled
@@ -2475,7 +2487,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		tun_flow_update(tun, rxhash, tfile);
 
 out:
-	return err;
+	return ret;
 }
 
 static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
@@ -2492,7 +2504,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	if (ctl && (ctl->type == TUN_MSG_PTR)) {
 		struct tun_page tpage;
 		int n = ctl->num;
-		int flush = 0;
+		int flush = 0, queued = 0;
 
 		memset(&tpage, 0, sizeof(tpage));
 
@@ -2501,12 +2513,17 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 
 		for (i = 0; i < n; i++) {
 			xdp = &((struct xdp_buff *)ctl->ptr)[i];
-			tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+			ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+			if (ret > 0)
+				queued += ret;
 		}
 
 		if (flush)
 			xdp_do_flush();
 
+		if (tfile->napi_enabled && queued > 0)
+			napi_schedule(&tfile->napi);
+
 		rcu_read_unlock();
 		local_bh_enable();
 
-- 
2.27.0



* Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
  2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
  2022-02-28  2:15   ` Jason Wang
@ 2022-02-28  4:06   ` Eric Dumazet
  2022-02-28  4:20     ` Jason Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2022-02-28  4:06 UTC (permalink / raw)
  To: Harold Huang, netdev
  Cc: jasowang, pabeni, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, open list, open list:XDP (eXpress Data Path),
	edumazet


On 2/25/22 01:02, Harold Huang wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
>
> After:
> ...
> [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> ---
> v1 -> v2
>   - fix commit messages
>   - add queued flag to avoid unnecessary napi, as suggested by Jason
>
>   drivers/net/tun.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
>   }
>   
>   static int tun_xdp_one(struct tun_struct *tun,
> -		       struct tun_file *tfile,
> +		       struct tun_file *tfile, int *queued,
>   		       struct xdp_buff *xdp, int *flush,
>   		       struct tun_page *tpage)
>   {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>   	struct virtio_net_hdr *gso = &hdr->gso;
>   	struct bpf_prog *xdp_prog;
>   	struct sk_buff *skb = NULL;
> +	struct sk_buff_head *queue;
>   	u32 rxhash = 0, act;
>   	int buflen = hdr->buflen;
>   	int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
>   	    !tfile->detached)
>   		rxhash = __skb_get_hash_symmetric(skb);
>   
> -	netif_receive_skb(skb);
> +	if (tfile->napi_enabled) {
> +		queue = &tfile->sk.sk_write_queue;
> +		spin_lock(&queue->lock);
> +		__skb_queue_tail(queue, skb);
> +		spin_unlock(&queue->lock);
> +		(*queued)++;
> +	} else {
> +		netif_receive_skb(skb);
> +	}
>   
>   	/* No need to disable preemption here since this function is
>   	 * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>   	if (ctl && (ctl->type == TUN_MSG_PTR)) {
>   		struct tun_page tpage;
>   		int n = ctl->num;
> -		int flush = 0;
> +		int flush = 0, queued = 0;
>   
>   		memset(&tpage, 0, sizeof(tpage));
>   
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>   
>   		for (i = 0; i < n; i++) {
>   			xdp = &((struct xdp_buff *)ctl->ptr)[i];
> -			tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> +			tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);


How big can n be?

BTW I could not find where m->msg_controllen was checked in tun_sendmsg().

struct tun_msg_ctl *ctl = m->msg_control;

if (ctl && (ctl->type == TUN_MSG_PTR)) {
     int n = ctl->num;  // can be set to values in [0..65535]

     for (i = 0; i < n; i++) {
         xdp = &((struct xdp_buff *)ctl->ptr)[i];


I really do not understand how we prevent malicious user space from 
crashing the kernel.



>   		}
>   
>   		if (flush)
>   			xdp_do_flush();
>   
> +		if (tfile->napi_enabled && queued > 0)
> +			napi_schedule(&tfile->napi);
> +
>   		rcu_read_unlock();
>   		local_bh_enable();
>   
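
The concern above is the generic one of trusting a count carried inside a caller-supplied buffer. Below is a minimal user-space sketch of the kind of validation being asked about. `struct batch_ctl`, `batch_ctl_check()` and the `max_entries` cap are hypothetical; whether tun needs such a check depends on who can reach this path, which the messages shown here do not settle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical control descriptor, loosely shaped like the excerpt above. */
struct batch_ctl {
	unsigned short type;
	unsigned short num;	/* number of entries the sender claims */
	void *ptr;		/* start of the entry array */
};

/*
 * Validate a control descriptor before iterating over it.
 * ctl_len is the length the caller supplied for the descriptor, and
 * max_entries is a caller-chosen cap; both are assumptions of this sketch.
 * Returns the entry count to walk, or a negative errno.
 */
static int batch_ctl_check(const struct batch_ctl *ctl, size_t ctl_len,
			   unsigned int max_entries)
{
	/* A zero cap would reject every non-empty batch. */
	assert(max_entries > 0);

	if (!ctl || ctl_len != sizeof(*ctl))
		return -EINVAL;
	if (ctl->num > 0 && !ctl->ptr)
		return -EINVAL;
	if (ctl->num > max_entries)
		return -E2BIG;
	return ctl->num;
}
```

A loop like the one quoted would then iterate over the returned count rather than over `num` read straight from the descriptor.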


* Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28  4:06   ` Eric Dumazet
@ 2022-02-28  4:20     ` Jason Wang
       [not found]       ` <CANn89iKLhhwGnmEyfZuEKjtt7OwTbVyDYcFUMDYoRpdXjbMwiA@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2022-02-28  4:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Harold Huang, netdev, Paolo Abeni, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, open list,
	open list:XDP (eXpress Data Path),
	Eric Dumazet

On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 2/25/22 01:02, Harold Huang wrote:
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > Gbps nearly reaches the line speed of the phy nic and there is still about
> > 15% idle cpu core remaining on the vhost thread.
> >
> > Test topology:
> >
> > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> >
> > Iperf stream:
> >
> > Before:
> > ...
> > [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> > [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> > [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> > [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> > [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> > [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
> >
> > After:
> > ...
> > [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> > [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> > [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> > [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> > [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> > [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> > ....
> >
> > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> > ---
> > v1 -> v2
> >   - fix commit messages
> >   - add queued flag to avoid void unnecessary napi suggested by Jason
> >
> >   drivers/net/tun.c | 20 ++++++++++++++++----
> >   1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index fed85447701a..c7d8b7c821d8 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> >   }
> >
> >   static int tun_xdp_one(struct tun_struct *tun,
> > -                    struct tun_file *tfile,
> > +                    struct tun_file *tfile, int *queued,
> >                      struct xdp_buff *xdp, int *flush,
> >                      struct tun_page *tpage)
> >   {
> > @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> >       struct virtio_net_hdr *gso = &hdr->gso;
> >       struct bpf_prog *xdp_prog;
> >       struct sk_buff *skb = NULL;
> > +     struct sk_buff_head *queue;
> >       u32 rxhash = 0, act;
> >       int buflen = hdr->buflen;
> >       int err = 0;
> > @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> >           !tfile->detached)
> >               rxhash = __skb_get_hash_symmetric(skb);
> >
> > -     netif_receive_skb(skb);
> > +     if (tfile->napi_enabled) {
> > +             queue = &tfile->sk.sk_write_queue;
> > +             spin_lock(&queue->lock);
> > +             __skb_queue_tail(queue, skb);
> > +             spin_unlock(&queue->lock);
> > +             (*queued)++;
> > +     } else {
> > +             netif_receive_skb(skb);
> > +     }
> >
> >       /* No need to disable preemption here since this function is
> >        * always called with bh disabled
> > @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >       if (ctl && (ctl->type == TUN_MSG_PTR)) {
> >               struct tun_page tpage;
> >               int n = ctl->num;
> > -             int flush = 0;
> > +             int flush = 0, queued = 0;
> >
> >               memset(&tpage, 0, sizeof(tpage));
> >
> > @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >
> >               for (i = 0; i < n; i++) {
> >                       xdp = &((struct xdp_buff *)ctl->ptr)[i];
> > -                     tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> > +                     tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
>
>
> How big n can be ?
>
> BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
>
> struct tun_msg_ctl *ctl = m->msg_control;
>
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
>
>      int n = ctl->num;  // can be set to values in [0..65535]
>
>      for (i = 0; i < n; i++) {
>
>          xdp = &((struct xdp_buff *)ctl->ptr)[i];
>
>
> I really do not understand how we prevent malicious user space from
> crashing the kernel.

It looks to me like the only user of this is vhost-net, which limits it
to 64; userspace can't use sendmsg() directly on tap.

Thanks

>
>
>
> >               }
> >
> >               if (flush)
> >                       xdp_do_flush();
> >
> > +             if (tfile->napi_enabled && queued > 0)
> > +                     napi_schedule(&tfile->napi);
> > +
> >               rcu_read_unlock();
> >               local_bh_enable();
> >
>



* Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
       [not found]       ` <CANn89iKLhhwGnmEyfZuEKjtt7OwTbVyDYcFUMDYoRpdXjbMwiA@mail.gmail.com>
@ 2022-02-28  5:17         ` Jason Wang
  2022-02-28  7:26           ` Harold Huang
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2022-02-28  5:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, Harold Huang, netdev, Paolo Abeni, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, open list,
	open list:XDP (eXpress Data Path)

On Mon, Feb 28, 2022 at 12:59 PM Eric Dumazet <edumazet@google.com> wrote:
>
>
>
> On Sun, Feb 27, 2022 at 8:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > How big n can be ?
>> >
>> > BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
>> >
>> > struct tun_msg_ctl *ctl = m->msg_control;
>> >
>> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
>> >
>> >      int n = ctl->num;  // can be set to values in [0..65535]
>> >
>> >      for (i = 0; i < n; i++) {
>> >
>> >          xdp = &((struct xdp_buff *)ctl->ptr)[i];
>> >
>> >
>> > I really do not understand how we prevent malicious user space from
>> > crashing the kernel.
>>
>> It looks to me the only user for this is vhost-net which limits it to
>> 64, userspace can't use sendmsg() directly on tap.
>>
>
> Ah right, thanks for the clarification.
>
> (IMO, either remove the "msg.msg_controllen = sizeof(ctl);" from handle_tx_zerocopy(), or add sanity checks in tun_sendmsg())
>
>

Right, Harold, want to do that?

Thanks



* Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28  5:17         ` Jason Wang
@ 2022-02-28  7:26           ` Harold Huang
  2022-02-28  7:56             ` Jason Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Harold Huang @ 2022-02-28  7:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eric Dumazet, Eric Dumazet, netdev, Paolo Abeni, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, open list,
	open list:XDP (eXpress Data Path)

Thanks for the suggestions.

On Mon, Feb 28, 2022 at 1:17 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Feb 28, 2022 at 12:59 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> >
> >
> > On Sun, Feb 27, 2022 at 8:20 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >> > How big n can be ?
> >> >
> >> > BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
> >> >
> >> > struct tun_msg_ctl *ctl = m->msg_control;
> >> >
> >> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
> >> >
> >> >      int n = ctl->num;  // can be set to values in [0..65535]
> >> >
> >> >      for (i = 0; i < n; i++) {
> >> >
> >> >          xdp = &((struct xdp_buff *)ctl->ptr)[i];
> >> >
> >> >
> >> > I really do not understand how we prevent malicious user space from
> >> > crashing the kernel.
> >>
> >> It looks to me the only user for this is vhost-net which limits it to
> >> 64, userspace can't use sendmsg() directly on tap.
> >>
> >
> > Ah right, thanks for the clarification.
> >
> > (IMO, either remove the "msg.msg_controllen = sizeof(ctl);" from handle_tx_zerocopy(), or add sanity checks in tun_sendmsg())
> >
> >
>
> Right, Harold, want to do that?

I am happy to do that, but I am not quite sure about this.

If we remove the "msg.msg_controllen = sizeof(ctl);" from
handle_tx_zerocopy(), it seems msg.msg_controllen is always 0. What
does it stand for?

I see tap_sendmsg in drivers/net/tap.c also uses msg_control to
send batched xdp buffers. Do we need to add the same sanity checks to
tap_sendmsg as to tun_sendmsg?


* Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
@ 2022-02-28  7:46   ` Jason Wang
  2022-02-28 17:15     ` Stephen Hemminger
  2022-03-02  1:40   ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Wang @ 2022-02-28  7:46 UTC (permalink / raw)
  To: Harold Huang
  Cc: netdev, Paolo Abeni, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, open list, open list:XDP (eXpress Data Path)

On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <baymaxhuang@gmail.com> wrote:
>
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reachs the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
> iperf3 -c 10.0.0.2  -i 1 -t 10
>
> Before:
> ...
> [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
>
> After:
> ...
> [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <baymaxhuang@gmail.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
> v2 -> v3
>  - return the queued NAPI packet from tun_xdp_one
>
>  drivers/net/tun.c | 43 ++++++++++++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..969ea69fd29d 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2388,9 +2388,10 @@ static int tun_xdp_one(struct tun_struct *tun,
>         struct virtio_net_hdr *gso = &hdr->gso;
>         struct bpf_prog *xdp_prog;
>         struct sk_buff *skb = NULL;
> +       struct sk_buff_head *queue;
>         u32 rxhash = 0, act;
>         int buflen = hdr->buflen;
> -       int err = 0;
> +       int ret = 0;
>         bool skb_xdp = false;
>         struct page *page;
>
> @@ -2405,13 +2406,13 @@ static int tun_xdp_one(struct tun_struct *tun,
>                 xdp_set_data_meta_invalid(xdp);
>
>                 act = bpf_prog_run_xdp(xdp_prog, xdp);
> -               err = tun_xdp_act(tun, xdp_prog, xdp, act);
> -               if (err < 0) {
> +               ret = tun_xdp_act(tun, xdp_prog, xdp, act);
> +               if (ret < 0) {
>                         put_page(virt_to_head_page(xdp->data));
> -                       return err;
> +                       return ret;
>                 }
>
> -               switch (err) {
> +               switch (ret) {
>                 case XDP_REDIRECT:
>                         *flush = true;
>                         fallthrough;
> @@ -2435,7 +2436,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>  build:
>         skb = build_skb(xdp->data_hard_start, buflen);
>         if (!skb) {
> -               err = -ENOMEM;
> +               ret = -ENOMEM;
>                 goto out;
>         }
>
> @@ -2445,7 +2446,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>         if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
>                 atomic_long_inc(&tun->rx_frame_errors);
>                 kfree_skb(skb);
> -               err = -EINVAL;
> +               ret = -EINVAL;
>                 goto out;
>         }
>
> @@ -2455,16 +2456,27 @@ static int tun_xdp_one(struct tun_struct *tun,
>         skb_record_rx_queue(skb, tfile->queue_index);
>
>         if (skb_xdp) {
> -               err = do_xdp_generic(xdp_prog, skb);
> -               if (err != XDP_PASS)
> +               ret = do_xdp_generic(xdp_prog, skb);
> +               if (ret != XDP_PASS) {
> +                       ret = 0;
>                         goto out;
> +               }
>         }
>
>         if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
>             !tfile->detached)
>                 rxhash = __skb_get_hash_symmetric(skb);
>
> -       netif_receive_skb(skb);
> +       if (tfile->napi_enabled) {
> +               queue = &tfile->sk.sk_write_queue;
> +               spin_lock(&queue->lock);
> +               __skb_queue_tail(queue, skb);
> +               spin_unlock(&queue->lock);
> +               ret = 1;
> +       } else {
> +               netif_receive_skb(skb);
> +               ret = 0;
> +       }
>
>         /* No need to disable preemption here since this function is
>          * always called with bh disabled
> @@ -2475,7 +2487,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>                 tun_flow_update(tun, rxhash, tfile);
>
>  out:
> -       return err;
> +       return ret;
>  }
>
>  static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> @@ -2492,7 +2504,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>         if (ctl && (ctl->type == TUN_MSG_PTR)) {
>                 struct tun_page tpage;
>                 int n = ctl->num;
> -               int flush = 0;
> +               int flush = 0, queued = 0;
>
>                 memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2513,17 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
>                 for (i = 0; i < n; i++) {
>                         xdp = &((struct xdp_buff *)ctl->ptr)[i];
> -                       tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> +                       ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> +                       if (ret > 0)
> +                               queued += ret;
>                 }
>
>                 if (flush)
>                         xdp_do_flush();
>
> +               if (tfile->napi_enabled && queued > 0)
> +                       napi_schedule(&tfile->napi);
> +
>                 rcu_read_unlock();
>                 local_bh_enable();
>
> --
> 2.27.0
>
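
The return convention that v3 gives tun_xdp_one() can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_queue`, `toy_rx_one()` and `toy_rx_batch()` are hypothetical names, not kernel code, and a counter stands in for napi_schedule(). It shows the pattern from the diff above: each packet reports 1 when it was queued for NAPI, and the caller schedules the poller once per batch, only if something was queued.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-queue state; NOT the kernel's struct tun_file. */
struct toy_queue {
	bool napi_enabled;
	size_t backlog;     /* packets waiting for the poll routine */
	size_t delivered;   /* packets handed straight to the stack */
	unsigned int kicks; /* how many times the poller was scheduled */
};

/* Models the v3 convention: 1 = queued for NAPI, 0 = handled inline,
 * negative = error (here: any non-positive length is treated as bad).
 */
static int toy_rx_one(struct toy_queue *q, int pkt_len)
{
	if (pkt_len <= 0)
		return -1;
	if (q->napi_enabled) {
		q->backlog++;
		return 1;
	}
	q->delivered++;
	return 0;
}

/* Processes a batch and "schedules" the poller at most once. */
static int toy_rx_batch(struct toy_queue *q, const int *lens, size_t n)
{
	int queued = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = toy_rx_one(q, lens[i]);

		if (ret > 0)
			queued += ret;
	}

	if (q->napi_enabled && queued > 0)
		q->kicks++; /* stands in for napi_schedule() */

	return queued;
}
```

With NAPI enabled, a batch containing one bad packet still results in a single kick; a batch in which nothing was queued results in none.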



* Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28  7:26           ` Harold Huang
@ 2022-02-28  7:56             ` Jason Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Wang @ 2022-02-28  7:56 UTC (permalink / raw)
  To: Harold Huang
  Cc: Eric Dumazet, Eric Dumazet, netdev, Paolo Abeni, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, open list,
	open list:XDP (eXpress Data Path)

On Mon, Feb 28, 2022 at 3:27 PM Harold Huang <baymaxhuang@gmail.com> wrote:
>
> Thanks for the suggestions.
>
> On Mon, Feb 28, 2022 at 1:17 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Mon, Feb 28, 2022 at 12:59 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > >
> > >
> > > On Sun, Feb 27, 2022 at 8:20 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >>
> > >> > How big n can be ?
> > >> >
> > >> > BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
> > >> >
> > >> > struct tun_msg_ctl *ctl = m->msg_control;
> > >> >
> > >> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
> > >> >
> > >> >      int n = ctl->num;  // can be set to values in [0..65535]
> > >> >
> > >> >      for (i = 0; i < n; i++) {
> > >> >
> > >> >          xdp = &((struct xdp_buff *)ctl->ptr)[i];
> > >> >
> > >> >
> > >> > I really do not understand how we prevent malicious user space from
> > >> > crashing the kernel.
> > >>
> > >> It looks to me the only user for this is vhost-net which limits it to
> > >> 64, userspace can't use sendmsg() directly on tap.
> > >>
> > >
> > > Ah right, thanks for the clarification.
> > >
> > > (IMO, either remove the "msg.msg_controllen = sizeof(ctl);" from handle_tx_zerocopy(), or add sanity checks in tun_sendmsg())
> > >
> > >
> >
> > Right, Harold, want to do that?
>
> I am greatly willing to do that. But  I am not quite sure about this.
>
> If we remove the "msg.msg_controllen = sizeof(ctl);" from
> handle_tx_zerocopy(), it seems msg.msg_controllen is always 0. What
> does it stands for?

It means msg_controllen is not used. But see below (adding a sanity
check seems to be better).

>
> I see tap_sendmsg in drivers/net/tap.c also uses msg_controller to
> send batched xdp buffers. Do we need to add similar sanity checks to
> tap_sendmsg  as tun_sendmsg?
>

I think the point is to make sure the caller doesn't send us a
msg_control that is too short. E.g. msg_controllen should be
sizeof(struct tun_msg_ctl).

So we probably need to check in both places. (And initialize
msg_controllen in vhost_tx_batch())

Thanks
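
A minimal userspace sketch of the kind of check discussed above, not the code that was eventually merged. The struct layout and the TUN_MSG_* values mirror include/linux/if_tun.h; the helper name `tun_msg_ctl_check()` and its `max_batch` parameter are hypothetical, and the bound on `num` is an extra assumption based on the statement that vhost-net limits batches to 64.

```c
#include <errno.h>
#include <stddef.h>

/* Mirrors the definitions in include/linux/if_tun.h. */
#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2

struct tun_msg_ctl {
	unsigned short type;
	unsigned short num;
	void *ptr;
};

/* Hypothetical helper: returns 0 if the control buffer looks usable,
 * -EINVAL otherwise. A NULL control buffer means "no control data".
 */
static int tun_msg_ctl_check(const void *control, size_t controllen,
			     unsigned int max_batch)
{
	const struct tun_msg_ctl *ctl = control;

	if (!ctl)
		return 0;

	/* The caller must hand over exactly one struct tun_msg_ctl. */
	if (controllen != sizeof(*ctl))
		return -EINVAL;

	/* Assumption: also bound the batch size for TUN_MSG_PTR. */
	if (ctl->type == TUN_MSG_PTR && ctl->num > max_batch)
		return -EINVAL;

	return 0;
}
```

The same helper could be called from both sendmsg paths, which matches the suggestion to check in both places.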



* Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28  7:46   ` Jason Wang
@ 2022-02-28 17:15     ` Stephen Hemminger
  2022-03-01  1:47       ` Jason Wang
  2022-03-01  1:58       ` Harold Huang
  0 siblings, 2 replies; 17+ messages in thread
From: Stephen Hemminger @ 2022-02-28 17:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Harold Huang, netdev, Paolo Abeni, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, open list,
	open list:XDP (eXpress Data Path)

On Mon, 28 Feb 2022 15:46:56 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <baymaxhuang@gmail.com> wrote:
> >
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > Gbps nearly reachs the line speed of the phy nic and there is still about
> > 15% idle cpu core remaining on the vhost thread.
> >
> > Test topology:
> > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> >
> > Iperf stream:
> > iperf3 -c 10.0.0.2  -i 1 -t 10
> >
> > Before:
> > ...
> > [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> > [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> > [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> > [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> > [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> > [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
> >
> > After:
> > ...
> > [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> > [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> > [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> > [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> > [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> > [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> >
> > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > Signed-off-by: Harold Huang <baymaxhuang@gmail.com>  
> 
> Acked-by: Jason Wang <jasowang@redhat.com>

Would this help when using sendmmsg and recvmmsg on the TAP device?
I'm asking because I'm interested in speeding up another use of the TAP
device and wondering whether this would help.


* Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28 17:15     ` Stephen Hemminger
@ 2022-03-01  1:47       ` Jason Wang
  2022-03-01  1:58       ` Harold Huang
  1 sibling, 0 replies; 17+ messages in thread
From: Jason Wang @ 2022-03-01  1:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Harold Huang, netdev, Paolo Abeni, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, open list,
	open list:XDP (eXpress Data Path)

On Tue, Mar 1, 2022 at 1:15 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 28 Feb 2022 15:46:56 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <baymaxhuang@gmail.com> wrote:
> > >
> > > In tun, NAPI is supported and we can also use NAPI in the path of
> > > batched XDP buffs to accelerate packet processing. What is more, after
> > > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > > Gbps nearly reachs the line speed of the phy nic and there is still about
> > > 15% idle cpu core remaining on the vhost thread.
> > >
> > > Test topology:
> > > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> > >
> > > Iperf stream:
> > > iperf3 -c 10.0.0.2  -i 1 -t 10
> > >
> > > Before:
> > > ...
> > > [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> > > [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> > > [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> > > [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> > > [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> > > [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
> > >
> > > After:
> > > ...
> > > [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> > > [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> > > [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> > > [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> > > [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> > > [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> > >
> > > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > > Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> Would this help when using sendmmsg and recvmmsg on the TAP device?

We haven't exported the socket object of tuntap to userspace, so we
can't use sendmmsg()/recvmmsg() now.

> Asking because interested in speeding up another use of TAP device, and wondering
> if this would help.
>

Yes, it would be interesting. We need someone to work on that.

Thanks



* Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28 17:15     ` Stephen Hemminger
  2022-03-01  1:47       ` Jason Wang
@ 2022-03-01  1:58       ` Harold Huang
  1 sibling, 0 replies; 17+ messages in thread
From: Harold Huang @ 2022-03-01  1:58 UTC (permalink / raw)
  To: stephen
  Cc: Jason Wang, netdev, Paolo Abeni, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, open list, open list:XDP (eXpress Data Path)

On Tue, Mar 1, 2022 at 1:15 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 28 Feb 2022 15:46:56 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <baymaxhuang@gmail.com> wrote:
> > >
> > > In tun, NAPI is supported and we can also use NAPI in the path of
> > > batched XDP buffs to accelerate packet processing. What is more, after
> > > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > > Gbps nearly reachs the line speed of the phy nic and there is still about
> > > 15% idle cpu core remaining on the vhost thread.
> > >
> > > Test topology:
> > > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> > >
> > > Iperf stream:
> > > iperf3 -c 10.0.0.2  -i 1 -t 10
> > >
> > > Before:
> > > ...
> > > [  5]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec    0   1.50 MBytes
> > > [  5]   6.00-7.00   sec   556 MBytes  4.67 Gbits/sec    1   1.35 MBytes
> > > [  5]   7.00-8.00   sec   556 MBytes  4.67 Gbits/sec    2   1.18 MBytes
> > > [  5]   8.00-9.00   sec   559 MBytes  4.69 Gbits/sec    0   1.48 MBytes
> > > [  5]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec    1   1.33 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  5.39 GBytes  4.63 Gbits/sec   72          sender
> > > [  5]   0.00-10.04  sec  5.39 GBytes  4.61 Gbits/sec               receiver
> > >
> > > After:
> > > ...
> > > [  5]   5.00-6.00   sec  1.07 GBytes  9.19 Gbits/sec    0   1.55 MBytes
> > > [  5]   6.00-7.00   sec  1.08 GBytes  9.30 Gbits/sec    0   1.63 MBytes
> > > [  5]   7.00-8.00   sec  1.08 GBytes  9.25 Gbits/sec    0   1.72 MBytes
> > > [  5]   8.00-9.00   sec  1.08 GBytes  9.25 Gbits/sec   77   1.31 MBytes
> > > [  5]   9.00-10.00  sec  1.08 GBytes  9.24 Gbits/sec    0   1.48 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec  166          sender
> > > [  5]   0.00-10.04  sec  10.8 GBytes  9.24 Gbits/sec               receiver
> > >
> > > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > > Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> Would this help when using sendmmsg and recvmmsg on the TAP device?
> Asking because interested in speeding up another use of TAP device, and wondering
> if this would help.

As Jason said, sendmmsg()/recvmmsg() cannot be used on tuntap. But I
think another option is to use writev/readv directly on the tuntap fd.
Writes go through tun_get_user, where NAPI is already supported.
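
A rough userspace sketch of that writev path, assuming a kernel and headers new enough to define IFF_NAPI in <linux/if_tun.h>. The helper names (`tap_napi_ifreq()`, `tap_napi_open()`, `tap_send_frame()`) are hypothetical, and opening the device needs CAP_NET_ADMIN. It requests a TAP queue with IFF_NAPI set and sends one frame gathered from two buffers.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Fill an ifreq asking for a TAP device with NAPI on the write path. */
static void tap_napi_ifreq(struct ifreq *ifr, const char *name)
{
	memset(ifr, 0, sizeof(*ifr));
	ifr->ifr_flags = IFF_TAP | IFF_NO_PI | IFF_NAPI;
	strncpy(ifr->ifr_name, name, IFNAMSIZ - 1);
}

/* Open /dev/net/tun and attach to (or create) the named TAP device.
 * Returns the fd, or -1 on failure.
 */
static int tap_napi_open(const char *name)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	tap_napi_ifreq(&ifr, name);
	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send one Ethernet frame held in two buffers (header + payload).
 * Each writev() call on a tuntap fd carries exactly one packet.
 */
static ssize_t tap_send_frame(int fd, void *hdr, size_t hdr_len,
			      void *payload, size_t payload_len)
{
	struct iovec iov[2] = {
		{ .iov_base = hdr, .iov_len = hdr_len },
		{ .iov_base = payload, .iov_len = payload_len },
	};

	return writev(fd, iov, 2);
}
```

Because each write carries a single packet, this path does not batch the way the vhost-net XDP path does; it only shares the NAPI/GRO handling.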


* Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs
  2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
  2022-02-28  7:46   ` Jason Wang
@ 2022-03-02  1:40   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-03-02  1:40 UTC (permalink / raw)
  To: Harold Huang
  Cc: netdev, jasowang, pabeni, davem, kuba, ast, daniel, hawk,
	john.fastabend, linux-kernel, bpf

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 28 Feb 2022 11:38:05 +0800 you wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reachs the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
> 
> [...]

Here is the summary with links:
  - [net-next,v3] tun: support NAPI for packets received from batched XDP buffs
    https://git.kernel.org/netdev/net-next/c/fb3f903769e8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Thread overview: 17+ messages
2022-02-24 10:38 [PATCH] tun: support NAPI to accelerate packet processing Harold Huang
2022-02-24 17:22 ` Paolo Abeni
2022-02-25  3:36   ` Harold Huang
2022-02-25  3:46 ` Jason Wang
2022-02-25  9:02 ` [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs Harold Huang
2022-02-28  2:15   ` Jason Wang
2022-02-28  4:06   ` Eric Dumazet
2022-02-28  4:20     ` Jason Wang
     [not found]       ` <CANn89iKLhhwGnmEyfZuEKjtt7OwTbVyDYcFUMDYoRpdXjbMwiA@mail.gmail.com>
2022-02-28  5:17         ` Jason Wang
2022-02-28  7:26           ` Harold Huang
2022-02-28  7:56             ` Jason Wang
2022-02-28  3:38 ` [PATCH net-next v3] " Harold Huang
2022-02-28  7:46   ` Jason Wang
2022-02-28 17:15     ` Stephen Hemminger
2022-03-01  1:47       ` Jason Wang
2022-03-01  1:58       ` Harold Huang
2022-03-02  1:40   ` patchwork-bot+netdevbpf
