netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* virtio_net: suspicious RCU usage with xdp
@ 2019-04-24 17:13 David Ahern
  2019-04-24 17:37 ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2019-04-24 17:13 UTC (permalink / raw)
  To: makita.toshiaki, Michael S. Tsirkin, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev

seeing an RCU warning testing xdp with virtio net. net-next as of commit
b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
this rings a bell with someone else.


[  121.990304] =============================
[  121.991488] WARNING: suspicious RCU usage
[  121.992392] 5.1.0-rc5+ #60 Not tainted
[  121.993220] -----------------------------
[  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
suspicious rcu_dereference_check() usage!
[  121.996284]
               other info that might help us debug this:

[  121.997988]
               rcu_scheduler_active = 2, debug_locks = 1
[  121.999321] no locks held by swapper/1/0.
[  122.000328]
               stack backtrace:
[  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
[  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.1-1 04/01/2014
[  122.004141] Call Trace:
[  122.004651]  <IRQ>
[  122.005082]  dump_stack+0x7e/0xbb
[  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
[  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
[  122.007447]  ? kasan_check_read+0x11/0x13
[  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
[  122.009299]  ? __asan_loadN+0xf/0x11
[  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
[  122.010975]  bq_xmit_all+0xdc/0x358
[  122.011699]  __dev_map_flush+0xc2/0xef
[  122.012472]  xdp_do_flush_map+0x5b/0x74
[  122.013238]  virtnet_poll+0x58f/0x679

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-24 17:13 virtio_net: suspicious RCU usage with xdp David Ahern
@ 2019-04-24 17:37 ` Michael S. Tsirkin
  2019-04-24 17:40   ` David Ahern
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2019-04-24 17:37 UTC (permalink / raw)
  To: David Ahern
  Cc: makita.toshiaki, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev, Jason Wang

On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:
> seeing an RCU warning testing xdp with virtio net. net-next as of commit
> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
> this rings a bell with someone else.
> 
> 
> [  121.990304] =============================
> [  121.991488] WARNING: suspicious RCU usage
> [  121.992392] 5.1.0-rc5+ #60 Not tainted
> [  121.993220] -----------------------------
> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
> suspicious rcu_dereference_check() usage!
> [  121.996284]
>                other info that might help us debug this:
> 
> [  121.997988]
>                rcu_scheduler_active = 2, debug_locks = 1
> [  121.999321] no locks held by swapper/1/0.
> [  122.000328]
>                stack backtrace:
> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.11.1-1 04/01/2014
> [  122.004141] Call Trace:
> [  122.004651]  <IRQ>
> [  122.005082]  dump_stack+0x7e/0xbb
> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
> [  122.007447]  ? kasan_check_read+0x11/0x13
> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
> [  122.009299]  ? __asan_loadN+0xf/0x11
> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
> [  122.010975]  bq_xmit_all+0xdc/0x358
> [  122.011699]  __dev_map_flush+0xc2/0xef
> [  122.012472]  xdp_do_flush_map+0x5b/0x74
> [  122.013238]  virtnet_poll+0x58f/0x679

Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
and that isn't in an RCU read-side critical section.

Looks like we just need to add RCU read lock/unlock.
Like the below perhaps?

This issue was introduced by 8dcc5b0ab0 however I find it
inelegant that we need to do checks in each driver,
and add RCU locks just for a startup initialization issue.
Can't XDP core make sure the callback isn't invoked
at an inappropriate time instead?

--->

virtio_net: call virtnet_xdp_xmit with RCU lock

This functions uses rcu_dereference so it needs to be
called in an RCU read-side critical section.

Fixes: 8dcc5b0ab0 ("virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

--

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9010938e2d71..ccc1bdd1bb1f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -495,8 +495,8 @@ static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
 	return &vi->sq[qp];
 }
 
-static int virtnet_xdp_xmit(struct net_device *dev,
-			    int n, struct xdp_frame **frames, u32 flags)
+static int __virtnet_xdp_xmit(struct net_device *dev,
+			      int n, struct xdp_frame **frames, u32 flags)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct receive_queue *rq = vi->rq;
@@ -569,6 +569,17 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	return ret;
 }
 
+static int virtnet_xdp_xmit(struct net_device *dev,
+			    int n, struct xdp_frame **frames, u32 flags)
+{
+	int r;
+
+	rcu_read_lock_bh();
+	r = __virtnet_xdp_xmit(dev, n, frames, flags);
+	rcu_read_unlock_bh();
+	return r;
+}
+
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 {
 	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
@@ -714,7 +725,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
+			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
 			if (unlikely(err < 0)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				goto err_xdp;
@@ -887,7 +898,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
+			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
 			if (unlikely(err < 0)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				if (unlikely(xdp_page != page))


-- 
MST

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-24 17:37 ` Michael S. Tsirkin
@ 2019-04-24 17:40   ` David Ahern
  2019-04-25  2:56     ` Jason Wang
  2019-04-25  2:55   ` Jason Wang
  2019-04-25  4:58   ` Toshiaki Makita
  2 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2019-04-24 17:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: makita.toshiaki, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev, Jason Wang

On 4/24/19 11:37 AM, Michael S. Tsirkin wrote:
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9010938e2d71..ccc1bdd1bb1f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -495,8 +495,8 @@ static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
>  	return &vi->sq[qp];
>  }
>  
> -static int virtnet_xdp_xmit(struct net_device *dev,
> -			    int n, struct xdp_frame **frames, u32 flags)
> +static int __virtnet_xdp_xmit(struct net_device *dev,
> +			      int n, struct xdp_frame **frames, u32 flags)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct receive_queue *rq = vi->rq;
> @@ -569,6 +569,17 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>  	return ret;
>  }
>  
> +static int virtnet_xdp_xmit(struct net_device *dev,
> +			    int n, struct xdp_frame **frames, u32 flags)
> +{
> +	int r;
> +
> +	rcu_read_lock_bh();
> +	r = __virtnet_xdp_xmit(dev, n, frames, flags);
> +	rcu_read_unlock_bh();
> +	return r;
> +}
> +
>  static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>  {
>  	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
> @@ -714,7 +725,7 @@ static struct sk_buff *receive_small(struct net_device *dev,

receive_small takes the rcu lock at the beginning.


>  			xdpf = convert_to_xdp_frame(&xdp);
>  			if (unlikely(!xdpf))
>  				goto err_xdp;
> -			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> +			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>  			if (unlikely(err < 0)) {
>  				trace_xdp_exception(vi->dev, xdp_prog, act);
>  				goto err_xdp;
> @@ -887,7 +898,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,

same here.
>  			xdpf = convert_to_xdp_frame(&xdp);
>  			if (unlikely(!xdpf))
>  				goto err_xdp;
> -			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> +			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>  			if (unlikely(err < 0)) {
>  				trace_xdp_exception(vi->dev, xdp_prog, act);
>  				if (unlikely(xdp_page != page))
> 
> 

bq_xmit_all is invoking the xdp_xmit callback without taking the rcu
lock, but from the git history does not appear to have any recent changes.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-24 17:37 ` Michael S. Tsirkin
  2019-04-24 17:40   ` David Ahern
@ 2019-04-25  2:55   ` Jason Wang
  2019-04-25  4:58   ` Toshiaki Makita
  2 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2019-04-25  2:55 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Ahern
  Cc: makita.toshiaki, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev


On 2019/4/25 上午1:37, Michael S. Tsirkin wrote:
> On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:
>> seeing an RCU warning testing xdp with virtio net. net-next as of commit
>> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
>> this rings a bell with someone else.
>>
>>
>> [  121.990304] =============================
>> [  121.991488] WARNING: suspicious RCU usage
>> [  121.992392] 5.1.0-rc5+ #60 Not tainted
>> [  121.993220] -----------------------------
>> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
>> suspicious rcu_dereference_check() usage!
>> [  121.996284]
>>                 other info that might help us debug this:
>>
>> [  121.997988]
>>                 rcu_scheduler_active = 2, debug_locks = 1
>> [  121.999321] no locks held by swapper/1/0.
>> [  122.000328]
>>                 stack backtrace:
>> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
>> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.11.1-1 04/01/2014
>> [  122.004141] Call Trace:
>> [  122.004651]  <IRQ>
>> [  122.005082]  dump_stack+0x7e/0xbb
>> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
>> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
>> [  122.007447]  ? kasan_check_read+0x11/0x13
>> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
>> [  122.009299]  ? __asan_loadN+0xf/0x11
>> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
>> [  122.010975]  bq_xmit_all+0xdc/0x358
>> [  122.011699]  __dev_map_flush+0xc2/0xef
>> [  122.012472]  xdp_do_flush_map+0x5b/0x74
>> [  122.013238]  virtnet_poll+0x58f/0x679
> Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
> and that isn't in an RCU read-side critical section.
>
> Looks like we just need to add RCU read lock/unlock.
> Like the below perhaps?
>
> This issue was introduced by 8dcc5b0ab0 however I find it
> inelegant that we need to do checks in each driver,
> and add RCU locks just for a startup initialization issue.
> Can't XDP core make sure the callback isn't invoked
> at an inappropriate time instead?
>
> --->
>
> virtio_net: call virtnet_xdp_xmit with RCU lock
>
> This functions uses rcu_dereference so it needs to be
> called in an RCU read-side critical section.
>
> Fixes: 8dcc5b0ab0 ("virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP")
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> --
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9010938e2d71..ccc1bdd1bb1f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -495,8 +495,8 @@ static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
>   	return &vi->sq[qp];
>   }
>   
> -static int virtnet_xdp_xmit(struct net_device *dev,
> -			    int n, struct xdp_frame **frames, u32 flags)
> +static int __virtnet_xdp_xmit(struct net_device *dev,
> +			      int n, struct xdp_frame **frames, u32 flags)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
>   	struct receive_queue *rq = vi->rq;
> @@ -569,6 +569,17 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   	return ret;
>   }
>   
> +static int virtnet_xdp_xmit(struct net_device *dev,
> +			    int n, struct xdp_frame **frames, u32 flags)
> +{
> +	int r;
> +
> +	rcu_read_lock_bh();
> +	r = __virtnet_xdp_xmit(dev, n, frames, flags);
> +	rcu_read_unlock_bh();
> +	return r;
> +}
> +
>   static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>   {
>   	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
> @@ -714,7 +725,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   			xdpf = convert_to_xdp_frame(&xdp);
>   			if (unlikely(!xdpf))
>   				goto err_xdp;
> -			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> +			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>   			if (unlikely(err < 0)) {
>   				trace_xdp_exception(vi->dev, xdp_prog, act);
>   				goto err_xdp;
> @@ -887,7 +898,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   			xdpf = convert_to_xdp_frame(&xdp);
>   			if (unlikely(!xdpf))
>   				goto err_xdp;
> -			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> +			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>   			if (unlikely(err < 0)) {
>   				trace_xdp_exception(vi->dev, xdp_prog, act);
>   				if (unlikely(xdp_page != page))
>
>

Acked-by: Jason Wang <jasowang@redhat.com>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-24 17:40   ` David Ahern
@ 2019-04-25  2:56     ` Jason Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2019-04-25  2:56 UTC (permalink / raw)
  To: David Ahern, Michael S. Tsirkin
  Cc: makita.toshiaki, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev


On 2019/4/25 上午1:40, David Ahern wrote:
> On 4/24/19 11:37 AM, Michael S. Tsirkin wrote:
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9010938e2d71..ccc1bdd1bb1f 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -495,8 +495,8 @@ static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
>>   	return &vi->sq[qp];
>>   }
>>   
>> -static int virtnet_xdp_xmit(struct net_device *dev,
>> -			    int n, struct xdp_frame **frames, u32 flags)
>> +static int __virtnet_xdp_xmit(struct net_device *dev,
>> +			      int n, struct xdp_frame **frames, u32 flags)
>>   {
>>   	struct virtnet_info *vi = netdev_priv(dev);
>>   	struct receive_queue *rq = vi->rq;
>> @@ -569,6 +569,17 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>>   	return ret;
>>   }
>>   
>> +static int virtnet_xdp_xmit(struct net_device *dev,
>> +			    int n, struct xdp_frame **frames, u32 flags)
>> +{
>> +	int r;
>> +
>> +	rcu_read_lock_bh();
>> +	r = __virtnet_xdp_xmit(dev, n, frames, flags);
>> +	rcu_read_unlock_bh();
>> +	return r;
>> +}
>> +
>>   static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>>   {
>>   	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
>> @@ -714,7 +725,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
> receive_small takes the rcu lock at the beginning.


Yes, so the code call __virtnet_xdp_xmit() that won't try to take rcu lock.

Thanks


>
>
>>   			xdpf = convert_to_xdp_frame(&xdp);
>>   			if (unlikely(!xdpf))
>>   				goto err_xdp;
>> -			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>> +			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>>   			if (unlikely(err < 0)) {
>>   				trace_xdp_exception(vi->dev, xdp_prog, act);
>>   				goto err_xdp;
>> @@ -887,7 +898,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> same here.
>>   			xdpf = convert_to_xdp_frame(&xdp);
>>   			if (unlikely(!xdpf))
>>   				goto err_xdp;
>> -			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>> +			err = __virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>>   			if (unlikely(err < 0)) {
>>   				trace_xdp_exception(vi->dev, xdp_prog, act);
>>   				if (unlikely(xdp_page != page))
>>
>>
> bq_xmit_all is invoking the xdp_xmit callback without taking the rcu
> lock, but from the git history does not appear to have any recent changes.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-24 17:37 ` Michael S. Tsirkin
  2019-04-24 17:40   ` David Ahern
  2019-04-25  2:55   ` Jason Wang
@ 2019-04-25  4:58   ` Toshiaki Makita
  2019-04-25 17:03     ` Michael S. Tsirkin
  2 siblings, 1 reply; 16+ messages in thread
From: Toshiaki Makita @ 2019-04-25  4:58 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Ahern
  Cc: Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev,
	Jason Wang

On 2019/04/25 2:37, Michael S. Tsirkin wrote:
> On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:
>> seeing an RCU warning testing xdp with virtio net. net-next as of commit
>> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
>> this rings a bell with someone else.
>>
>>
>> [  121.990304] =============================
>> [  121.991488] WARNING: suspicious RCU usage
>> [  121.992392] 5.1.0-rc5+ #60 Not tainted
>> [  121.993220] -----------------------------
>> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
>> suspicious rcu_dereference_check() usage!
>> [  121.996284]
>>                other info that might help us debug this:
>>
>> [  121.997988]
>>                rcu_scheduler_active = 2, debug_locks = 1
>> [  121.999321] no locks held by swapper/1/0.
>> [  122.000328]
>>                stack backtrace:
>> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
>> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.11.1-1 04/01/2014
>> [  122.004141] Call Trace:
>> [  122.004651]  <IRQ>
>> [  122.005082]  dump_stack+0x7e/0xbb
>> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
>> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
>> [  122.007447]  ? kasan_check_read+0x11/0x13
>> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
>> [  122.009299]  ? __asan_loadN+0xf/0x11
>> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
>> [  122.010975]  bq_xmit_all+0xdc/0x358
>> [  122.011699]  __dev_map_flush+0xc2/0xef
>> [  122.012472]  xdp_do_flush_map+0x5b/0x74
>> [  122.013238]  virtnet_poll+0x58f/0x679
> 
> Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
> and that isn't in an RCU read-side critical section.
> 
> Looks like we just need to add RCU read lock/unlock.
> Like the below perhaps?
> 
> This issue was introduced by 8dcc5b0ab0 however I find it

Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
frames for bulking").

> inelegant that we need to do checks in each driver,
> and add RCU locks just for a startup initialization issue.
> Can't XDP core make sure the callback isn't invoked
> at an inappropriate time instead?

Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
called under RCU. After the commit, xdp_do_flush_map() also can trigger
.ndo_xdp_xmit() but we forgot to add RCU read lock there?
I guess veth has the same problem and I feel like it should be fixed in
__dev_map_flush(). dev_map_flush_old() needs to be cared too.

-- 
Toshiaki Makita


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25  4:58   ` Toshiaki Makita
@ 2019-04-25 17:03     ` Michael S. Tsirkin
  2019-04-25 17:41       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2019-04-25 17:03 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: David Ahern, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev, Jason Wang

On Thu, Apr 25, 2019 at 01:58:48PM +0900, Toshiaki Makita wrote:
> On 2019/04/25 2:37, Michael S. Tsirkin wrote:
> > On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:
> >> seeing an RCU warning testing xdp with virtio net. net-next as of commit
> >> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
> >> this rings a bell with someone else.
> >>
> >>
> >> [  121.990304] =============================
> >> [  121.991488] WARNING: suspicious RCU usage
> >> [  121.992392] 5.1.0-rc5+ #60 Not tainted
> >> [  121.993220] -----------------------------
> >> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
> >> suspicious rcu_dereference_check() usage!
> >> [  121.996284]
> >>                other info that might help us debug this:
> >>
> >> [  121.997988]
> >>                rcu_scheduler_active = 2, debug_locks = 1
> >> [  121.999321] no locks held by swapper/1/0.
> >> [  122.000328]
> >>                stack backtrace:
> >> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
> >> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.11.1-1 04/01/2014
> >> [  122.004141] Call Trace:
> >> [  122.004651]  <IRQ>
> >> [  122.005082]  dump_stack+0x7e/0xbb
> >> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
> >> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
> >> [  122.007447]  ? kasan_check_read+0x11/0x13
> >> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
> >> [  122.009299]  ? __asan_loadN+0xf/0x11
> >> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
> >> [  122.010975]  bq_xmit_all+0xdc/0x358
> >> [  122.011699]  __dev_map_flush+0xc2/0xef
> >> [  122.012472]  xdp_do_flush_map+0x5b/0x74
> >> [  122.013238]  virtnet_poll+0x58f/0x679
> > 
> > Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
> > and that isn't in an RCU read-side critical section.
> > 
> > Looks like we just need to add RCU read lock/unlock.
> > Like the below perhaps?
> > 
> > This issue was introduced by 8dcc5b0ab0 however I find it
> 
> Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
> frames for bulking").
> 
> > inelegant that we need to do checks in each driver,
> > and add RCU locks just for a startup initialization issue.
> > Can't XDP core make sure the callback isn't invoked
> > at an inappropriate time instead?
> 
> Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
> called under RCU. After the commit, xdp_do_flush_map() also can trigger
> .ndo_xdp_xmit() but we forgot to add RCU read lock there?
> I guess veth has the same problem and I feel like it should be fixed in
> __dev_map_flush(). dev_map_flush_old() needs to be cared too.

I don't have a problem either way.  Jesper, what do you think?

> -- 
> Toshiaki Makita

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 17:03     ` Michael S. Tsirkin
@ 2019-04-25 17:41       ` Jesper Dangaard Brouer
  2019-04-25 17:44         ` David Ahern
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2019-04-25 17:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Toshiaki Makita, David Ahern, Toke Høiland-Jørgensen,
	netdev, Jason Wang, brouer

On Thu, 25 Apr 2019 13:03:39 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Apr 25, 2019 at 01:58:48PM +0900, Toshiaki Makita wrote:
> > On 2019/04/25 2:37, Michael S. Tsirkin wrote:  
> > > On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:  
> > >> seeing an RCU warning testing xdp with virtio net. net-next as of commit
> > >> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
> > >> this rings a bell with someone else.
> > >>
> > >>
> > >> [  121.990304] =============================
> > >> [  121.991488] WARNING: suspicious RCU usage
> > >> [  121.992392] 5.1.0-rc5+ #60 Not tainted
> > >> [  121.993220] -----------------------------
> > >> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
> > >> suspicious rcu_dereference_check() usage!
> > >> [  121.996284]
> > >>                other info that might help us debug this:
> > >>
> > >> [  121.997988]
> > >>                rcu_scheduler_active = 2, debug_locks = 1
> > >> [  121.999321] no locks held by swapper/1/0.
> > >> [  122.000328]
> > >>                stack backtrace:
> > >> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
> > >> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > >> BIOS 1.11.1-1 04/01/2014
> > >> [  122.004141] Call Trace:
> > >> [  122.004651]  <IRQ>
> > >> [  122.005082]  dump_stack+0x7e/0xbb
> > >> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
> > >> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
> > >> [  122.007447]  ? kasan_check_read+0x11/0x13
> > >> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
> > >> [  122.009299]  ? __asan_loadN+0xf/0x11
> > >> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
> > >> [  122.010975]  bq_xmit_all+0xdc/0x358
> > >> [  122.011699]  __dev_map_flush+0xc2/0xef
> > >> [  122.012472]  xdp_do_flush_map+0x5b/0x74
> > >> [  122.013238]  virtnet_poll+0x58f/0x679  
> > > 
> > > Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
> > > and that isn't in an RCU read-side critical section.
> > > 
> > > Looks like we just need to add RCU read lock/unlock.
> > > Like the below perhaps?
> > > 
> > > This issue was introduced by 8dcc5b0ab0 however I find it  
> > 
> > Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
> > frames for bulking").
> >   
> > > inelegant that we need to do checks in each driver,
> > > and add RCU locks just for a startup initialization issue.
> > > Can't XDP core make sure the callback isn't invoked
> > > at an inappropriate time instead?  
> > 
> > Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
> > called under RCU. After the commit, xdp_do_flush_map() also can trigger
> > .ndo_xdp_xmit() but we forgot to add RCU read lock there?
> > I guess veth has the same problem and I feel like it should be fixed in
> > __dev_map_flush(). dev_map_flush_old() needs to be cared too.  
> 
> I don't have a problem either way.  Jesper, what do you think?

It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
frames for bulking") introduced this issue.  I guess we can add the RCU
section to xdp_do_flush_map(), and then also verify that the devmap
(and cpumap) take-down code also have appropriate RCU sections (which
they should have).

Another requirement for calling .ndo_xdp_xmit is running under NAPI
protection, is that still satisifed for veth?
(even when invoked via xdp_do_flush_map()).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 17:41       ` Jesper Dangaard Brouer
@ 2019-04-25 17:44         ` David Ahern
  2019-04-25 18:54           ` Maciej Fijalkowski
  2019-04-26  7:42         ` Toshiaki Makita
  2019-04-26  8:00         ` Jason Wang
  2 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2019-04-25 17:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Michael S. Tsirkin
  Cc: Toshiaki Makita, Toke Høiland-Jørgensen, netdev, Jason Wang

On 4/25/19 11:41 AM, Jesper Dangaard Brouer wrote:
> On Thu, 25 Apr 2019 13:03:39 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>> On Thu, Apr 25, 2019 at 01:58:48PM +0900, Toshiaki Makita wrote:
>>> On 2019/04/25 2:37, Michael S. Tsirkin wrote:  
>>>> On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:  
>>>>> seeing an RCU warning testing xdp with virtio net. net-next as of commit
>>>>> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
>>>>> this rings a bell with someone else.
>>>>>
>>>>>
>>>>> [  121.990304] =============================
>>>>> [  121.991488] WARNING: suspicious RCU usage
>>>>> [  121.992392] 5.1.0-rc5+ #60 Not tainted
>>>>> [  121.993220] -----------------------------
>>>>> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
>>>>> suspicious rcu_dereference_check() usage!
>>>>> [  121.996284]
>>>>>                other info that might help us debug this:
>>>>>
>>>>> [  121.997988]
>>>>>                rcu_scheduler_active = 2, debug_locks = 1
>>>>> [  121.999321] no locks held by swapper/1/0.
>>>>> [  122.000328]
>>>>>                stack backtrace:
>>>>> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
>>>>> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>> BIOS 1.11.1-1 04/01/2014
>>>>> [  122.004141] Call Trace:
>>>>> [  122.004651]  <IRQ>
>>>>> [  122.005082]  dump_stack+0x7e/0xbb
>>>>> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
>>>>> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
>>>>> [  122.007447]  ? kasan_check_read+0x11/0x13
>>>>> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
>>>>> [  122.009299]  ? __asan_loadN+0xf/0x11
>>>>> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
>>>>> [  122.010975]  bq_xmit_all+0xdc/0x358
>>>>> [  122.011699]  __dev_map_flush+0xc2/0xef
>>>>> [  122.012472]  xdp_do_flush_map+0x5b/0x74
>>>>> [  122.013238]  virtnet_poll+0x58f/0x679  
>>>>
>>>> Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
>>>> and that isn't in an RCU read-side critical section.
>>>>
>>>> Looks like we just need to add RCU read lock/unlock.
>>>> Like the below perhaps?
>>>>
>>>> This issue was introduced by 8dcc5b0ab0 however I find it  
>>>
>>> Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
>>> frames for bulking").
>>>   
>>>> inelegant that we need to do checks in each driver,
>>>> and add RCU locks just for a startup initialization issue.
>>>> Can't XDP core make sure the callback isn't invoked
>>>> at an inappropriate time instead?  
>>>
>>> Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
>>> called under RCU. After the commit, xdp_do_flush_map() also can trigger
>>> .ndo_xdp_xmit() but we forgot to add RCU read lock there?
>>> I guess veth has the same problem and I feel like it should be fixed in
>>> __dev_map_flush(). dev_map_flush_old() needs to be cared too.  
>>
>> I don't have a problem either way.  Jesper, what do you think?
> 
> It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
> frames for bulking") introduced this issue.  I guess we can add the RCU
> section to xdp_do_flush_map(), and then also verify that the devmap
> (and cpumap) take-down code also have appropriate RCU sections (which
> they should have).
> 
> Another requirement for calling .ndo_xdp_xmit is running under NAPI
> protection, is that still satisifed for veth?
> (even when invoked via xdp_do_flush_map()).
> 

virtio_net hits this because of:
   xdp_prog = rcu_dereference(rq->xdp_prog);

in its ndo_xdp_xmit. Scanning ndo_xdp_xmit for other nics does not show
this same check. Is it really required? If so, why don't other drivers
do it?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 17:44         ` David Ahern
@ 2019-04-25 18:54           ` Maciej Fijalkowski
  2019-04-25 20:04             ` David Ahern
  2019-04-25 20:12             ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 16+ messages in thread
From: Maciej Fijalkowski @ 2019-04-25 18:54 UTC (permalink / raw)
  To: David Ahern
  Cc: Jesper Dangaard Brouer, Michael S. Tsirkin, Toshiaki Makita,
	Toke Høiland-Jørgensen, netdev, Jason Wang

On Thu, 25 Apr 2019 11:44:27 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 4/25/19 11:41 AM, Jesper Dangaard Brouer wrote:
> > On Thu, 25 Apr 2019 13:03:39 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> >> On Thu, Apr 25, 2019 at 01:58:48PM +0900, Toshiaki Makita wrote:  
> >>> On 2019/04/25 2:37, Michael S. Tsirkin wrote:    
> >>>> On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:    
> >>>>> seeing an RCU warning testing xdp with virtio net. net-next as of commit
> >>>>> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
> >>>>> this rings a bell with someone else.
> >>>>>
> >>>>>
> >>>>> [  121.990304] =============================
> >>>>> [  121.991488] WARNING: suspicious RCU usage
> >>>>> [  121.992392] 5.1.0-rc5+ #60 Not tainted
> >>>>> [  121.993220] -----------------------------
> >>>>> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
> >>>>> suspicious rcu_dereference_check() usage!
> >>>>> [  121.996284]
> >>>>>                other info that might help us debug this:
> >>>>>
> >>>>> [  121.997988]
> >>>>>                rcu_scheduler_active = 2, debug_locks = 1
> >>>>> [  121.999321] no locks held by swapper/1/0.
> >>>>> [  122.000328]
> >>>>>                stack backtrace:
> >>>>> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
> >>>>> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >>>>> BIOS 1.11.1-1 04/01/2014
> >>>>> [  122.004141] Call Trace:
> >>>>> [  122.004651]  <IRQ>
> >>>>> [  122.005082]  dump_stack+0x7e/0xbb
> >>>>> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
> >>>>> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
> >>>>> [  122.007447]  ? kasan_check_read+0x11/0x13
> >>>>> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
> >>>>> [  122.009299]  ? __asan_loadN+0xf/0x11
> >>>>> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
> >>>>> [  122.010975]  bq_xmit_all+0xdc/0x358
> >>>>> [  122.011699]  __dev_map_flush+0xc2/0xef
> >>>>> [  122.012472]  xdp_do_flush_map+0x5b/0x74
> >>>>> [  122.013238]  virtnet_poll+0x58f/0x679    
> >>>>
> >>>> Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
> >>>> and that isn't in an RCU read-side critical section.
> >>>>
> >>>> Looks like we just need to add RCU read lock/unlock.
> >>>> Like the below perhaps?
> >>>>
> >>>> This issue was introduced by 8dcc5b0ab0 however I find it    
> >>>
> >>> Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
> >>> frames for bulking").
> >>>     
> >>>> inelegant that we need to do checks in each driver,
> >>>> and add RCU locks just for a startup initialization issue.
> >>>> Can't XDP core make sure the callback isn't invoked
> >>>> at an inappropriate time instead?    
> >>>
> >>> Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
> >>> called under RCU. After the commit, xdp_do_flush_map() also can trigger
> >>> .ndo_xdp_xmit() but we forgot to add RCU read lock there?
> >>> I guess veth has the same problem and I feel like it should be fixed in
> >>> __dev_map_flush(). dev_map_flush_old() needs to be cared too.    
> >>
> >> I don't have a problem either way.  Jesper, what do you think?  
> > 
> > It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
> > frames for bulking") introduced this issue.  I guess we can add the RCU
> > section to xdp_do_flush_map(), and then also verify that the devmap
> > (and cpumap) take-down code also have appropriate RCU sections (which
> > they should have).
> > 
> > Another requirement for calling .ndo_xdp_xmit is running under NAPI
> > protection, is that still satisifed for veth?
> > (even when invoked via xdp_do_flush_map()).
> >   
> 
> virtio_net hits this because of:
>    xdp_prog = rcu_dereference(rq->xdp_prog);
> 
> in its ndo_xdp_xmit. Scanning ndo_xdp_xmit for other nics does not show
> this same check. Is it really required? If so, why don't other drivers
> do it?

That was my initial thought as well. Can't we just check that
vi->xdp_queue_pairs was set in virtnet_xdp_xmit? It is being done just before
assigning the XDP prog pointers for rqs. Haven't looked at veth though.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 18:54           ` Maciej Fijalkowski
@ 2019-04-25 20:04             ` David Ahern
  2019-04-25 20:12             ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 16+ messages in thread
From: David Ahern @ 2019-04-25 20:04 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Jesper Dangaard Brouer, Michael S. Tsirkin, Toshiaki Makita,
	Toke Høiland-Jørgensen, netdev, Jason Wang

On 4/25/19 12:54 PM, Maciej Fijalkowski wrote:
>>
>> virtio_net hits this because of:
>>    xdp_prog = rcu_dereference(rq->xdp_prog);
>>
>> in its ndo_xdp_xmit. Scanning ndo_xdp_xmit for other nics does not show
>> this same check. Is it really required? If so, why don't other drivers
>> do it?
> 
> That was my initial thought as well. Can't we just check that
> vi->xdp_queue_pairs was set in virtnet_xdp_xmit? It is being done just before
> assigning the XDP prog pointers for rqs. Haven't looked at veth though.
> 

or change the dereference to just rcu_access_pointer. The xdp_prog is
not needed; it only checks if set.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 18:54           ` Maciej Fijalkowski
  2019-04-25 20:04             ` David Ahern
@ 2019-04-25 20:12             ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2019-04-25 20:12 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: David Ahern, Michael S. Tsirkin, Toshiaki Makita,
	Toke Høiland-Jørgensen, netdev, Jason Wang, brouer,
	Saeed Mahameed, John Fastabend

On Thu, 25 Apr 2019 20:54:11 +0200
Maciej Fijalkowski <maciejromanfijalkowski@gmail.com> wrote:

> On Thu, 25 Apr 2019 11:44:27 -0600
> David Ahern <dsahern@gmail.com> wrote:
> 
> > On 4/25/19 11:41 AM, Jesper Dangaard Brouer wrote:  
> > > On Thu, 25 Apr 2019 13:03:39 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >     
> > >> On Thu, Apr 25, 2019 at 01:58:48PM +0900, Toshiaki Makita wrote:    
> > >>> On 2019/04/25 2:37, Michael S. Tsirkin wrote:      
> > >>>> On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:      
> > >>>>> seeing an RCU warning testing xdp with virtio net. net-next as of commit
> > >>>>> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
> > >>>>> this rings a bell with someone else.
> > >>>>>
> > >>>>>
> > >>>>> [  121.990304] =============================
> > >>>>> [  121.991488] WARNING: suspicious RCU usage
> > >>>>> [  121.992392] 5.1.0-rc5+ #60 Not tainted
> > >>>>> [  121.993220] -----------------------------
> > >>>>> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
> > >>>>> suspicious rcu_dereference_check() usage!
> > >>>>> [  121.996284]
> > >>>>>                other info that might help us debug this:
> > >>>>>
> > >>>>> [  121.997988]
> > >>>>>                rcu_scheduler_active = 2, debug_locks = 1
> > >>>>> [  121.999321] no locks held by swapper/1/0.
> > >>>>> [  122.000328]
> > >>>>>                stack backtrace:
> > >>>>> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
> > >>>>> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > >>>>> BIOS 1.11.1-1 04/01/2014
> > >>>>> [  122.004141] Call Trace:
> > >>>>> [  122.004651]  <IRQ>
> > >>>>> [  122.005082]  dump_stack+0x7e/0xbb
> > >>>>> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
> > >>>>> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
> > >>>>> [  122.007447]  ? kasan_check_read+0x11/0x13
> > >>>>> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
> > >>>>> [  122.009299]  ? __asan_loadN+0xf/0x11
> > >>>>> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
> > >>>>> [  122.010975]  bq_xmit_all+0xdc/0x358
> > >>>>> [  122.011699]  __dev_map_flush+0xc2/0xef
> > >>>>> [  122.012472]  xdp_do_flush_map+0x5b/0x74
> > >>>>> [  122.013238]  virtnet_poll+0x58f/0x679      
> > >>>>
> > >>>> Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
> > >>>> and that isn't in an RCU read-side critical section.
> > >>>>
> > >>>> Looks like we just need to add RCU read lock/unlock.
> > >>>> Like the below perhaps?
> > >>>>
> > >>>> This issue was introduced by 8dcc5b0ab0 however I find it      
> > >>>
> > >>> Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
> > >>> frames for bulking").
> > >>>       
> > >>>> inelegant that we need to do checks in each driver,
> > >>>> and add RCU locks just for a startup initialization issue.
> > >>>> Can't XDP core make sure the callback isn't invoked
> > >>>> at an inappropriate time instead?      
> > >>>
> > >>> Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
> > >>> called under RCU. After the commit, xdp_do_flush_map() also can trigger
> > >>> .ndo_xdp_xmit() but we forgot to add RCU read lock there?
> > >>> I guess veth has the same problem and I feel like it should be fixed in
> > >>> __dev_map_flush(). dev_map_flush_old() needs to be cared too.      
> > >>
> > >> I don't have a problem either way.  Jesper, what do you think?    
> > > 
> > > It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
> > > frames for bulking") introduced this issue.  I guess we can add the RCU
> > > section to xdp_do_flush_map(), and then also verify that the devmap
> > > (and cpumap) take-down code also have appropriate RCU sections (which
> > > they should have).
> > > 
> > > Another requirement for calling .ndo_xdp_xmit is running under NAPI
> > > protection, is that still satisifed for veth?
> > > (even when invoked via xdp_do_flush_map()).
> > >     
> > 
> > virtio_net hits this because of:
> >    xdp_prog = rcu_dereference(rq->xdp_prog);
> > 
> > in its ndo_xdp_xmit. Scanning ndo_xdp_xmit for other nics does not show
> > this same check. Is it really required? If so, why don't other drivers
> > do it?  
> 
> That was my initial thought as well. Can't we just check that
> vi->xdp_queue_pairs was set in virtnet_xdp_xmit? It is being done
> just before assigning the XDP prog pointers for rqs. Haven't looked
> at veth though.

Ah, I see, the reason virtio_net is doing this is documented in a
comment:

	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
	 * indicate XDP resources have been successfully allocated.
	 */

As we have discussed before, this is a bad assumption. We even have a
TODO[1] in the XDP-project named: Better ndo_xdp_xmit resource management.
This area need to be improved. 

I can see other drivers also end-up checking for if xdp_prog pointer
exist, but they don't need to dereference the pointer.  Even if we find
some other resource management check, then that check also need some
protection that likely want to leverage RCU as well(?).

The mlx5 driver also had issues in this area, and is doing some RCU
sync/barrier tricks, which also indicate a RCU dependency.


[1] https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#better-ndo_xdp_xmit-resource-management

IHMO this tying XDP ndo_xdp_xmit resources to whether there is a XDP RX
prog loaded is wrong... and cause the very ugly trick of loading dummy
XDP program on TX devices.
Fixing this is a larger effort... a work-around for now would be to do
a RCU read-section in xdp_do_flush_map().
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 17:41       ` Jesper Dangaard Brouer
  2019-04-25 17:44         ` David Ahern
@ 2019-04-26  7:42         ` Toshiaki Makita
  2019-04-26  8:00         ` Jason Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Toshiaki Makita @ 2019-04-26  7:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Michael S. Tsirkin
  Cc: David Ahern, Toke Høiland-Jørgensen, netdev, Jason Wang

On 2019/04/26 2:41, Jesper Dangaard Brouer wrote:
> On Thu, 25 Apr 2019 13:03:39 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>> On Thu, Apr 25, 2019 at 01:58:48PM +0900, Toshiaki Makita wrote:
>>> On 2019/04/25 2:37, Michael S. Tsirkin wrote:  
>>>> On Wed, Apr 24, 2019 at 11:13:42AM -0600, David Ahern wrote:  
>>>>> seeing an RCU warning testing xdp with virtio net. net-next as of commit
>>>>> b2f97f7de2f6a4df8e431330cf467576486651c5. No obvious changes so hoping
>>>>> this rings a bell with someone else.
>>>>>
>>>>>
>>>>> [  121.990304] =============================
>>>>> [  121.991488] WARNING: suspicious RCU usage
>>>>> [  121.992392] 5.1.0-rc5+ #60 Not tainted
>>>>> [  121.993220] -----------------------------
>>>>> [  121.994158] /home/dsa/kernel-3.git/drivers/net/virtio_net.c:516
>>>>> suspicious rcu_dereference_check() usage!
>>>>> [  121.996284]
>>>>>                other info that might help us debug this:
>>>>>
>>>>> [  121.997988]
>>>>>                rcu_scheduler_active = 2, debug_locks = 1
>>>>> [  121.999321] no locks held by swapper/1/0.
>>>>> [  122.000328]
>>>>>                stack backtrace:
>>>>> [  122.001253] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #60
>>>>> [  122.002474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>> BIOS 1.11.1-1 04/01/2014
>>>>> [  122.004141] Call Trace:
>>>>> [  122.004651]  <IRQ>
>>>>> [  122.005082]  dump_stack+0x7e/0xbb
>>>>> [  122.005757]  lockdep_rcu_suspicious+0x102/0x10b
>>>>> [  122.006654]  virtnet_xdp_xmit+0x104/0x4fe
>>>>> [  122.007447]  ? kasan_check_read+0x11/0x13
>>>>> [  122.008267]  ? mergeable_rx_buffer_size_show+0x163/0x163
>>>>> [  122.009299]  ? __asan_loadN+0xf/0x11
>>>>> [  122.010010]  ? pvclock_clocksource_read+0xfa/0x189
>>>>> [  122.010975]  bq_xmit_all+0xdc/0x358
>>>>> [  122.011699]  __dev_map_flush+0xc2/0xef
>>>>> [  122.012472]  xdp_do_flush_map+0x5b/0x74
>>>>> [  122.013238]  virtnet_poll+0x58f/0x679  
>>>>
>>>> Well virtnet_xdp_xmit seems to be called from .ndo_xdp_xmit
>>>> and that isn't in an RCU read-side critical section.
>>>>
>>>> Looks like we just need to add RCU read lock/unlock.
>>>> Like the below perhaps?
>>>>
>>>> This issue was introduced by 8dcc5b0ab0 however I find it  
>>>
>>> Probably not 8dcc5b0ab0, but 5d053f9da431 ("bpf: devmap prepare xdp
>>> frames for bulking").
>>>   
>>>> inelegant that we need to do checks in each driver,
>>>> and add RCU locks just for a startup initialization issue.
>>>> Can't XDP core make sure the callback isn't invoked
>>>> at an inappropriate time instead?  
>>>
>>> Before commit 5d053f9da431, .ndo_xdp_xmit() should have always been
>>> called under RCU. After the commit, xdp_do_flush_map() also can trigger
>>> .ndo_xdp_xmit() but we forgot to add RCU read lock there?
>>> I guess veth has the same problem and I feel like it should be fixed in
>>> __dev_map_flush(). dev_map_flush_old() needs to be cared too.  
>>
>> I don't have a problem either way.  Jesper, what do you think?
> 
> It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
> frames for bulking") introduced this issue.  I guess we can add the RCU
> section to xdp_do_flush_map(), and then also verify that the devmap
> (and cpumap) take-down code also have appropriate RCU sections (which
> they should have).
> 
> Another requirement for calling .ndo_xdp_xmit is running under NAPI
> protection, is that still satisifed for veth?
> (even when invoked via xdp_do_flush_map()).

veth_xdp_xmit() assumes it's called from NAPI handler, and veth calls
xdp_do_flush_map() only from within NAPI context. Not sure what happens
when .ndo_xdp_xmit is called from dev_map_flush_old().

-- 
Toshiaki Makita


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-25 17:41       ` Jesper Dangaard Brouer
  2019-04-25 17:44         ` David Ahern
  2019-04-26  7:42         ` Toshiaki Makita
@ 2019-04-26  8:00         ` Jason Wang
  2019-04-26 11:05           ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2019-04-26  8:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Michael S. Tsirkin
  Cc: Toshiaki Makita, David Ahern, Toke Høiland-Jørgensen, netdev


On 2019/4/26 上午1:41, Jesper Dangaard Brouer wrote:
> It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
> frames for bulking") introduced this issue.  I guess we can add the RCU
> section to xdp_do_flush_map(), and then also verify that the devmap
> (and cpumap) take-down code also have appropriate RCU sections (which
> they should have).
>
> Another requirement for calling .ndo_xdp_xmit is running under NAPI
> protection,


May I know the reason for this? I'm asking since if the packet was 
redirected from tuntap, ndo_xdp_xmit()  won't be called under the 
protection of NAPI (but bh is disabled).

Thanks


> is that still satisifed for veth?
> (even when invoked via xdp_do_flush_map()).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-26  8:00         ` Jason Wang
@ 2019-04-26 11:05           ` Jesper Dangaard Brouer
  2019-04-28  3:06             ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2019-04-26 11:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Toshiaki Makita, David Ahern,
	Toke Høiland-Jørgensen, netdev, brouer, John Fastabend

On Fri, 26 Apr 2019 16:00:28 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2019/4/26 上午1:41, Jesper Dangaard Brouer wrote:
> > It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
> > frames for bulking") introduced this issue.  I guess we can add the RCU
> > section to xdp_do_flush_map(), and then also verify that the devmap
> > (and cpumap) take-down code also have appropriate RCU sections (which
> > they should have).
> >
> > Another requirement for calling .ndo_xdp_xmit is running under NAPI
> > protection,  
> 
> 
> May I know the reason for this? I'm asking since if the packet was 
> redirected from tuntap, ndo_xdp_xmit()  won't be called under the 
> protection of NAPI (but bh is disabled).

There are a number of things that rely on this NAPI/softirq protection.

One is preempt-free access per-cpu struct bpf_redirect_info. Which is
at the core of the XDP and TC redirect feature.

  DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
  EXPORT_PER_CPU_SYMBOL_GPL(bpf_redirect_info);
  struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);

And devmap and cpumap also have per-cpu variables, that we don't use
preempt-disable around.

Another is xdp_return_frame_rx_napi() that when page_pool is active,
can store frames to be recycled directly into an array, in function
__page_pool_recycle_direct() (but as I don't trust every driver getting
this correct I've added a safe-guard in page-pool via
in_serving_softirq().

I guess, disable_bh is sufficient protection, as we are mostly
optimizing away a preempt-disable when accessing per-cpu variables.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: virtio_net: suspicious RCU usage with xdp
  2019-04-26 11:05           ` Jesper Dangaard Brouer
@ 2019-04-28  3:06             ` Jason Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2019-04-28  3:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Michael S. Tsirkin, Toshiaki Makita, David Ahern,
	Toke Høiland-Jørgensen, netdev, John Fastabend


On 2019/4/26 下午7:05, Jesper Dangaard Brouer wrote:
> On Fri, 26 Apr 2019 16:00:28 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2019/4/26 上午1:41, Jesper Dangaard Brouer wrote:
>>> It does sound like my commit 5d053f9da431 ("bpf: devmap prepare xdp
>>> frames for bulking") introduced this issue.  I guess we can add the RCU
>>> section to xdp_do_flush_map(), and then also verify that the devmap
>>> (and cpumap) take-down code also have appropriate RCU sections (which
>>> they should have).
>>>
>>> Another requirement for calling .ndo_xdp_xmit is running under NAPI
>>> protection,
>>
>> May I know the reason for this? I'm asking since if the packet was
>> redirected from tuntap, ndo_xdp_xmit()  won't be called under the
>> protection of NAPI (but bh is disabled).
> There are a number of things that rely on this NAPI/softirq protection.
>
> One is preempt-free access per-cpu struct bpf_redirect_info. Which is
> at the core of the XDP and TC redirect feature.
>
>    DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
>    EXPORT_PER_CPU_SYMBOL_GPL(bpf_redirect_info);
>    struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>
> And devmap and cpumap also have per-cpu variables, that we don't use
> preempt-disable around.
>
> Another is xdp_return_frame_rx_napi() that when page_pool is active,
> can store frames to be recycled directly into an array, in function
> __page_pool_recycle_direct() (but as I don't trust every driver getting
> this correct I've added a safe-guard in page-pool via
> in_serving_softirq().


I see, if I want to use page pool for tap for VM2VM traffic, this 
probably means I can only recycle through ptr_ring.


>
> I guess, disable_bh is sufficient protection, as we are mostly
> optimizing away a preempt-disable when accessing per-cpu variables.
>

Yes, that's what I want for confirm.

Thanks


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-04-28  3:07 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-24 17:13 virtio_net: suspicious RCU usage with xdp David Ahern
2019-04-24 17:37 ` Michael S. Tsirkin
2019-04-24 17:40   ` David Ahern
2019-04-25  2:56     ` Jason Wang
2019-04-25  2:55   ` Jason Wang
2019-04-25  4:58   ` Toshiaki Makita
2019-04-25 17:03     ` Michael S. Tsirkin
2019-04-25 17:41       ` Jesper Dangaard Brouer
2019-04-25 17:44         ` David Ahern
2019-04-25 18:54           ` Maciej Fijalkowski
2019-04-25 20:04             ` David Ahern
2019-04-25 20:12             ` Jesper Dangaard Brouer
2019-04-26  7:42         ` Toshiaki Makita
2019-04-26  8:00         ` Jason Wang
2019-04-26 11:05           ` Jesper Dangaard Brouer
2019-04-28  3:06             ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).