linux-kernel.vger.kernel.org archive mirror
* [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush"
@ 2018-02-24  3:32 Jason Wang
  2018-02-24  3:32 ` [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing Jason Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-24  3:32 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: mst, sergei.shtylyov, christoffer.dall, brouer, Jason Wang

This reverts commit 762c330d670e3d4b795cf7a8d761866fdd1eef49. The
reason is that we try to batch packets for devmap, which causes
xdp_do_flush() to be called in process context. Simply disabling
preemption may not work since the process may move among processors,
which leads xdp_do_flush() to miss flushes on some processors.

So simply revert the patch; a follow-up patch will add the XDP flush
correctly.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b52258c..2823a4a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,7 +181,6 @@ struct tun_file {
 	struct tun_struct *detached;
 	struct ptr_ring tx_ring;
 	struct xdp_rxq_info xdp_rxq;
-	int xdp_pending_pkts;
 };
 
 struct tun_flow_entry {
@@ -1662,7 +1661,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		case XDP_REDIRECT:
 			get_page(alloc_frag->page);
 			alloc_frag->offset += buflen;
-			++tfile->xdp_pending_pkts;
 			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
 			if (err)
 				goto err_redirect;
@@ -1984,11 +1982,6 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	result = tun_get_user(tun, tfile, NULL, from,
 			      file->f_flags & O_NONBLOCK, false);
 
-	if (tfile->xdp_pending_pkts) {
-		tfile->xdp_pending_pkts = 0;
-		xdp_do_flush_map();
-	}
-
 	tun_put(tun);
 	return result;
 }
@@ -2325,13 +2318,6 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
-
-	if (tfile->xdp_pending_pkts >= NAPI_POLL_WEIGHT ||
-	    !(m->msg_flags & MSG_MORE)) {
-		tfile->xdp_pending_pkts = 0;
-		xdp_do_flush_map();
-	}
-
 	tun_put(tun);
 	return ret;
 }
@@ -3163,7 +3149,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
 	memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-	tfile->xdp_pending_pkts = 0;
 
 	return 0;
 }
-- 
2.7.4


* [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
  2018-02-24  3:32 [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush" Jason Wang
@ 2018-02-24  3:32 ` Jason Wang
  2018-02-26 11:02   ` Jesper Dangaard Brouer
  2018-02-26 18:50   ` David Miller
  2018-02-24  3:32 ` [PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush Jason Wang
  2018-02-26 18:50 ` [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush" David Miller
  2 siblings, 2 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-24  3:32 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: mst, sergei.shtylyov, christoffer.dall, brouer, Jason Wang

Except for tuntap, all other drivers implement XDP in the NAPI poll()
routine in a bh. This guarantees that all XDP operations are done on
the same CPU, which is required by e.g. BFP_MAP_TYPE_PERCPU_ARRAY. But
for tuntap, we do it in process context and try to protect XDP
processing with the RCU reader lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU reader critical section, which
breaks the assumption that all XDP is processed on the same CPU.

Fix this by simply disabling preemption during XDP processing.

Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2823a4a..63d39fe6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1642,6 +1642,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	else
 		*skb_xdp = 0;
 
+	preempt_disable();
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
 	if (xdp_prog && !*skb_xdp) {
@@ -1665,6 +1666,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 			if (err)
 				goto err_redirect;
 			rcu_read_unlock();
+			preempt_enable();
 			return NULL;
 		case XDP_TX:
 			xdp_xmit = true;
@@ -1686,6 +1688,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	skb = build_skb(buf, buflen);
 	if (!skb) {
 		rcu_read_unlock();
+		preempt_enable();
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -1698,10 +1701,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		skb->dev = tun->dev;
 		generic_xdp_tx(skb, xdp_prog);
 		rcu_read_unlock();
+		preempt_enable();
 		return NULL;
 	}
 
 	rcu_read_unlock();
+	preempt_enable();
 
 	return skb;
 
@@ -1709,6 +1714,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	put_page(alloc_frag->page);
 err_xdp:
 	rcu_read_unlock();
+	preempt_enable();
 	this_cpu_inc(tun->pcpu_stats->rx_dropped);
 	return NULL;
 }
-- 
2.7.4


* [PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush
  2018-02-24  3:32 [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush" Jason Wang
  2018-02-24  3:32 ` [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing Jason Wang
@ 2018-02-24  3:32 ` Jason Wang
  2018-02-26 18:50   ` David Miller
  2018-02-26 18:50 ` [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush" David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2018-02-24  3:32 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: mst, sergei.shtylyov, christoffer.dall, brouer, Jason Wang

We don't flush batched XDP packets through xdp_do_flush_map(), which
causes packets to stall in the TX queue. Since we don't do XDP in NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().

Note that this in fact means we won't batch packets through devmap; we
could address that in the future.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 63d39fe6..7433bb2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1663,6 +1663,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 			get_page(alloc_frag->page);
 			alloc_frag->offset += buflen;
 			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
+			xdp_do_flush_map();
 			if (err)
 				goto err_redirect;
 			rcu_read_unlock();
-- 
2.7.4


* Re: [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
  2018-02-24  3:32 ` [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing Jason Wang
@ 2018-02-26 11:02   ` Jesper Dangaard Brouer
  2018-02-26 13:43     ` Jason Wang
  2018-02-26 18:50   ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2018-02-26 11:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, linux-kernel, mst, sergei.shtylyov, christoffer.dall, brouer

On Sat, 24 Feb 2018 11:32:25 +0800
Jason Wang <jasowang@redhat.com> wrote:

> Except for tuntap, all other drivers implement XDP in the NAPI poll()
> routine in a bh. This guarantees that all XDP operations are done on
> the same CPU, which is required by e.g. BFP_MAP_TYPE_PERCPU_ARRAY. But

There is a typo in the defined name "BFP_MAP_TYPE_PERCPU_ARRAY".
Besides, it is NOT a requirement that comes from the map type
BPF_MAP_TYPE_PERCPU_ARRAY.

The requirement comes from the bpf_redirect_map helper (and only partly
from the devmap + cpumap types), as the BPF helper/program stores
information in the per-cpu redirect_info struct (see filter.c), which is
used by xdp_do_redirect() and xdp_do_flush_map().

 struct redirect_info {
	u32 ifindex;
	u32 flags;
	struct bpf_map *map;
	struct bpf_map *map_to_flush;
	unsigned long   map_owner;
 };
 static DEFINE_PER_CPU(struct redirect_info, redirect_info);

 [...]
 void xdp_do_flush_map(void)
 { 
	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
	struct bpf_map *map = ri->map_to_flush;
 [...]

Notice the same redirect_info is used by the TC clsbpf system...
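
To make the hazard concrete, here is a simplified sketch (not actual
kernel code) of how a preempted and then migrated process misses the
flush:

 /* on CPU 0: the XDP prog calls bpf_redirect_map(), state is per-CPU */
 struct redirect_info *ri = this_cpu_ptr(&redirect_info);
 ri->map_to_flush = map;                 /* packets queued on CPU 0 */

 /* process context: preemption + migration to CPU 1 happens here */

 /* on CPU 1: xdp_do_flush_map() sees a different redirect_info */
 ri = this_cpu_ptr(&redirect_info);      /* ri->map_to_flush == NULL */
 /* the packets queued on CPU 0 are never flushed */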


> for tuntap, we do it in process context and try to protect XDP
> processing with the RCU reader lock. This is insufficient since
> CONFIG_PREEMPT_RCU can preempt the RCU reader critical section, which
> breaks the assumption that all XDP is processed on the same CPU.
> 
> Fix this by simply disabling preemption during XDP processing.

I guess this could paper over the problem...

But I generally find it problematic that tuntap does not invoke XDP
from the NAPI poll() routine in BH context, as that context provides
some protection that allows certain kinds of optimizations (like this
flush API). I hope it will not limit us in the future that the tuntap
driver violates the XDP call context.

> Fixes: 761876c857cb ("tap: XDP support")

$ git describe --contains 761876c857cb
v4.14-rc1~130^2~270^2
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
  2018-02-26 11:02   ` Jesper Dangaard Brouer
@ 2018-02-26 13:43     ` Jason Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-26 13:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, linux-kernel, mst, sergei.shtylyov, christoffer.dall



On 2018-02-26 19:02, Jesper Dangaard Brouer wrote:
> On Sat, 24 Feb 2018 11:32:25 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> Except for tuntap, all other drivers implement XDP in the NAPI poll()
>> routine in a bh. This guarantees that all XDP operations are done on
>> the same CPU, which is required by e.g. BFP_MAP_TYPE_PERCPU_ARRAY. But
> There is a typo in the defined name "BFP_MAP_TYPE_PERCPU_ARRAY".
> Besides, it is NOT a requirement that comes from the map type
> BPF_MAP_TYPE_PERCPU_ARRAY.

But it looks to me that bpf_array uses a percpu structure, e.g. in 
percpu_array_map_lookup_elem() it accesses the value through this_cpu_ptr().
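
For reference, that lookup is roughly the following (quoted from
kernel/bpf/arraymap.c from memory, so details may differ between
kernel versions):

 static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	u32 index = *(u32 *)key;

 	if (unlikely(index >= array->map.max_entries))
 		return NULL;

 	/* per-CPU value: only meaningful if lookup and use stay on one CPU */
 	return this_cpu_ptr(array->pptrs[index & array->index_mask]);
 }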

>
> The requirement comes from the bpf_redirect_map helper (and only partly
> from the devmap + cpumap types), as the BPF helper/program stores
> information in the per-cpu redirect_info struct (see filter.c), which is
> used by xdp_do_redirect() and xdp_do_flush_map().
>
>   struct redirect_info {
> 	u32 ifindex;
> 	u32 flags;
> 	struct bpf_map *map;
> 	struct bpf_map *map_to_flush;
> 	unsigned long   map_owner;
>   };
>   static DEFINE_PER_CPU(struct redirect_info, redirect_info);
>
>   [...]
>   void xdp_do_flush_map(void)
>   {
> 	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> 	struct bpf_map *map = ri->map_to_flush;
>   [...]
>
> Notice the same redirect_info is used by the TC clsbpf system...
>
>
>> for tuntap, we do it in process context and try to protect XDP
>> processing with the RCU reader lock. This is insufficient since
>> CONFIG_PREEMPT_RCU can preempt the RCU reader critical section, which
>> breaks the assumption that all XDP is processed on the same CPU.
>>
>> Fix this by simply disabling preemption during XDP processing.
> I guess this could paper over the problem...
>
> But I generally find it problematic that tuntap does not invoke XDP
> from the NAPI poll() routine in BH context, as that context provides
> some protection that allows certain kinds of optimizations (like this
> flush API). I hope it will not limit us in the future that the tuntap
> driver violates the XDP call context.

Good to see tuntap is on the radar :) It is easily forgotten. I would 
be glad to test anything new in XDP for tuntap. But I do not see 
anything that prevents us from having an environment similar to the one 
NAPI poll() provides. I admit the flush is inefficient now, but that 
does not mean we can't solve it in the future. E.g. for the flush, I 
plan to introduce a batching API which can accept an array of skb/XDP 
pointers in its sendmsg(). Then we can do a more efficient flush.
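
Roughly the direction I have in mind (a purely hypothetical sketch,
nothing like this exists yet; the name and signature are made up):

 /* hypothetical: consume a batch of buffers, then do a single flush */
 static int tun_sendmsg_batch(struct socket *sock, struct msghdr *m)
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
 	int i, ret = 0;

 	if (!tun)
 		return -EBADFD;

 	/* consume up to NAPI_POLL_WEIGHT buffers before flushing once */
 	for (i = 0; i < NAPI_POLL_WEIGHT && msg_data_left(m); i++)
 		ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
 				   m->msg_flags & MSG_DONTWAIT, true);

 	/* NB: the same-CPU requirement discussed above still applies;
 	 * the batch would need to run with preemption disabled. */
 	xdp_do_flush_map();
 	tun_put(tun);
 	return ret;
 }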

Note that tuntap supports NAPI (IFF_NAPI), but the main use case is 
hardening the kernel rx path. Unless rx batching is enabled, it would be 
slower than non-NAPI mode. Technically it can support XDP, but that 
needs more work (e.g. working at the level of the XDP buffer instead of 
the skb).

I tend to do fixes or optimizations on top of the current code unless we 
find a real blocker.

>
>> Fixes: 761876c857cb ("tap: XDP support")
> $ git describe --contains 761876c857cb
> v4.14-rc1~130^2~270^2

So please let me know if you're OK with the fix.

Thanks


* Re: [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush"
  2018-02-24  3:32 [PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush" Jason Wang
  2018-02-24  3:32 ` [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing Jason Wang
  2018-02-24  3:32 ` [PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush Jason Wang
@ 2018-02-26 18:50 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2018-02-26 18:50 UTC (permalink / raw)
  To: jasowang
  Cc: netdev, linux-kernel, mst, sergei.shtylyov, christoffer.dall, brouer

From: Jason Wang <jasowang@redhat.com>
Date: Sat, 24 Feb 2018 11:32:24 +0800

> This reverts commit 762c330d670e3d4b795cf7a8d761866fdd1eef49. The
> reason is that we try to batch packets for devmap, which causes
> xdp_do_flush() to be called in process context. Simply disabling
> preemption may not work since the process may move among processors,
> which leads xdp_do_flush() to miss flushes on some processors.
> 
> So simply revert the patch; a follow-up patch will add the XDP flush
> correctly.
> 
> Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
> Fixes: 762c330d670e ("tuntap: add missing xdp flush")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied.


* Re: [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing
  2018-02-24  3:32 ` [PATCH V4 net 2/3] tuntap: disable preemption during XDP processing Jason Wang
  2018-02-26 11:02   ` Jesper Dangaard Brouer
@ 2018-02-26 18:50   ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2018-02-26 18:50 UTC (permalink / raw)
  To: jasowang
  Cc: netdev, linux-kernel, mst, sergei.shtylyov, christoffer.dall, brouer

From: Jason Wang <jasowang@redhat.com>
Date: Sat, 24 Feb 2018 11:32:25 +0800

> Except for tuntap, all other drivers implement XDP in the NAPI poll()
> routine in a bh. This guarantees that all XDP operations are done on
> the same CPU, which is required by e.g. BFP_MAP_TYPE_PERCPU_ARRAY. But
> for tuntap, we do it in process context and try to protect XDP
> processing with the RCU reader lock. This is insufficient since
> CONFIG_PREEMPT_RCU can preempt the RCU reader critical section, which
> breaks the assumption that all XDP is processed on the same CPU.
> 
> Fix this by simply disabling preemption during XDP processing.
> 
> Fixes: 761876c857cb ("tap: XDP support")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied.


* Re: [PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush
  2018-02-24  3:32 ` [PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush Jason Wang
@ 2018-02-26 18:50   ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2018-02-26 18:50 UTC (permalink / raw)
  To: jasowang
  Cc: netdev, linux-kernel, mst, sergei.shtylyov, christoffer.dall, brouer

From: Jason Wang <jasowang@redhat.com>
Date: Sat, 24 Feb 2018 11:32:26 +0800

> We don't flush batched XDP packets through xdp_do_flush_map(), which
> causes packets to stall in the TX queue. Since we don't do XDP in NAPI
> poll(), the only possible fix is to call xdp_do_flush_map()
> immediately after xdp_do_redirect().
> 
> Note that this in fact means we won't batch packets through devmap; we
> could address that in the future.
> 
> Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
> Fixes: 761876c857cb ("tap: XDP support")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied, and all 3 patches queued up for -stable.

Thanks.

