[3/3] vhost-net: use lock_sock_fast() in peek_head_len()

Message ID 20110117081117.18900.48672.stgit@dhcp-91-7.nay.redhat.com.englab.nay.redhat.com
State New, archived
Series
  • [1/3] vhost-net: check the support of mergeable buffer outside the receive loop

Commit Message

Jason Wang Jan. 17, 2011, 8:11 a.m. UTC
We can use lock_sock_fast() instead of lock_sock() to get a
speedup in peek_head_len().

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
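
For reference, lock_sock_fast() (net/core/sock.c) skips the heavyweight lock_sock() path when no user context owns the socket: on the fast path it simply keeps the socket spinlock held with BHs disabled, and it returns true only when it had to fall back to the slow path, so unlock_sock_fast() knows how to release. Trimmed roughly (lockdep annotations omitted), the upstream helpers look like this:

bool lock_sock_fast(struct sock *sk)
{
        might_sleep();
        spin_lock_bh(&sk->sk_lock.slock);

        if (!sk->sk_lock.owned)
                /* fast path: keep the spinlock, BHs stay disabled */
                return false;

        /* slow path: behave like lock_sock() from here on */
        __lock_sock(sk);
        sk->sk_lock.owned = 1;
        spin_unlock(&sk->sk_lock.slock);
        local_bh_enable();
        return true;
}

static inline void unlock_sock_fast(struct sock *sk, bool slow)
{
        if (slow)
                release_sock(sk);       /* slow path was taken above */
        else
                spin_unlock_bh(&sk->sk_lock.slock);
}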



Comments

Eric Dumazet Jan. 17, 2011, 9:33 a.m. UTC | #1
On Monday, January 17, 2011 at 16:11 +0800, Jason Wang wrote:
> We can use lock_sock_fast() instead of lock_sock() to get a
> speedup in peek_head_len().
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c32a2e4..50b622a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -211,12 +211,12 @@ static int peek_head_len(struct sock *sk)
>  {
>  	struct sk_buff *head;
>  	int len = 0;
> +	bool slow = lock_sock_fast(sk);
>  
> -	lock_sock(sk);
>  	head = skb_peek(&sk->sk_receive_queue);
>  	if (head)
>  		len = head->len;
> -	release_sock(sk);
> +	unlock_sock_fast(sk, slow);
>  	return len;
>  }
>  
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


Michael S. Tsirkin Jan. 17, 2011, 9:57 a.m. UTC | #2
On Mon, Jan 17, 2011 at 04:11:17PM +0800, Jason Wang wrote:
> We can use lock_sock_fast() instead of lock_sock() to get a
> speedup in peek_head_len().
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Queued for 2.6.39, thanks everyone.

> ---
>  drivers/vhost/net.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c32a2e4..50b622a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -211,12 +211,12 @@ static int peek_head_len(struct sock *sk)
>  {
>  	struct sk_buff *head;
>  	int len = 0;
> +	bool slow = lock_sock_fast(sk);
>  
> -	lock_sock(sk);
>  	head = skb_peek(&sk->sk_receive_queue);
>  	if (head)
>  		len = head->len;
> -	release_sock(sk);
> +	unlock_sock_fast(sk, slow);
>  	return len;
>  }
>  
Michael S. Tsirkin March 13, 2011, 3:06 p.m. UTC | #3
On Mon, Jan 17, 2011 at 04:11:17PM +0800, Jason Wang wrote:
> We can use lock_sock_fast() instead of lock_sock() to get a
> speedup in peek_head_len().
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c32a2e4..50b622a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -211,12 +211,12 @@ static int peek_head_len(struct sock *sk)
>  {
>  	struct sk_buff *head;
>  	int len = 0;
> +	bool slow = lock_sock_fast(sk);
>  
> -	lock_sock(sk);
>  	head = skb_peek(&sk->sk_receive_queue);
>  	if (head)
>  		len = head->len;
> -	release_sock(sk);
> +	unlock_sock_fast(sk, slow);
>  	return len;
>  }
>  

Wanted to apply this, but looking at the code I think the lock_sock here
is wrong. What we really need is to handle the case where the skb is
pulled from the receive queue after skb_peek. However, this is not the
right lock to use for that; sk_receive_queue.lock is.
So I expect the following is the right way to handle this.
Comments?

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0329c41..5720301 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -213,12 +213,13 @@ static int peek_head_len(struct sock *sk)
 {
 	struct sk_buff *head;
 	int len = 0;
+	unsigned long flags;
 
-	lock_sock(sk);
+	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
-	if (head)
+	if (likely(head))
 		len = head->len;
-	release_sock(sk);
+	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
 	return len;
 }
 
Eric Dumazet March 13, 2011, 3:52 p.m. UTC | #4
On Sunday, March 13, 2011 at 17:06 +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 17, 2011 at 04:11:17PM +0800, Jason Wang wrote:
> > We can use lock_sock_fast() instead of lock_sock() to get a
> > speedup in peek_head_len().
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/vhost/net.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index c32a2e4..50b622a 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -211,12 +211,12 @@ static int peek_head_len(struct sock *sk)
> >  {
> >  	struct sk_buff *head;
> >  	int len = 0;
> > +	bool slow = lock_sock_fast(sk);
> >  
> > -	lock_sock(sk);
> >  	head = skb_peek(&sk->sk_receive_queue);
> >  	if (head)
> >  		len = head->len;
> > -	release_sock(sk);
> > +	unlock_sock_fast(sk, slow);
> >  	return len;
> >  }
> >  
> 
> Wanted to apply this, but looking at the code I think the lock_sock here
> is wrong. What we really need is to handle the case where the skb is
> pulled from the receive queue after skb_peek. However, this is not the
> right lock to use for that; sk_receive_queue.lock is.
> So I expect the following is the right way to handle this.
> Comments?
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 0329c41..5720301 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -213,12 +213,13 @@ static int peek_head_len(struct sock *sk)
>  {
>  	struct sk_buff *head;
>  	int len = 0;
> +	unsigned long flags;
>  
> -	lock_sock(sk);
> +	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
>  	head = skb_peek(&sk->sk_receive_queue);
> -	if (head)
> +	if (likely(head))
>  		len = head->len;
> -	release_sock(sk);
> +	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
>  	return len;
>  }
>  

You may be right; the only way to be sure is to check the other side.

If it uses skb_queue_tail(), then yes, your patch is fine.

If the other side did not lock the socket, then your patch is a bug fix.
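
For reference, skb_queue_tail() takes the queue's own lock internally, which is what would make peeking under sk_receive_queue.lock sufficient to synchronize with the enqueue side. The upstream implementation (net/core/skbuff.c) is essentially:

void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)
{
        unsigned long flags;

        /* the queue's own lock, not the socket lock */
        spin_lock_irqsave(&list->lock, flags);
        __skb_queue_tail(list, newsk);
        spin_unlock_irqrestore(&list->lock, flags);
}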



Michael S. Tsirkin March 13, 2011, 4:19 p.m. UTC | #5
On Sun, Mar 13, 2011 at 04:52:50PM +0100, Eric Dumazet wrote:
> On Sunday, March 13, 2011 at 17:06 +0200, Michael S. Tsirkin wrote:
> > On Mon, Jan 17, 2011 at 04:11:17PM +0800, Jason Wang wrote:
> > > We can use lock_sock_fast() instead of lock_sock() to get a
> > > speedup in peek_head_len().
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/vhost/net.c |    4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index c32a2e4..50b622a 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -211,12 +211,12 @@ static int peek_head_len(struct sock *sk)
> > >  {
> > >  	struct sk_buff *head;
> > >  	int len = 0;
> > > +	bool slow = lock_sock_fast(sk);
> > >  
> > > -	lock_sock(sk);
> > >  	head = skb_peek(&sk->sk_receive_queue);
> > >  	if (head)
> > >  		len = head->len;
> > > -	release_sock(sk);
> > > +	unlock_sock_fast(sk, slow);
> > >  	return len;
> > >  }
> > >  
> > 
> > Wanted to apply this, but looking at the code I think the lock_sock here
> > is wrong. What we really need is to handle the case where the skb is
> > pulled from the receive queue after skb_peek. However, this is not the
> > right lock to use for that; sk_receive_queue.lock is.
> > So I expect the following is the right way to handle this.
> > Comments?
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 0329c41..5720301 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -213,12 +213,13 @@ static int peek_head_len(struct sock *sk)
> >  {
> >  	struct sk_buff *head;
> >  	int len = 0;
> > +	unsigned long flags;
> >  
> > -	lock_sock(sk);
> > +	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
> >  	head = skb_peek(&sk->sk_receive_queue);
> > -	if (head)
> > +	if (likely(head))
> >  		len = head->len;
> > -	release_sock(sk);
> > +	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
> >  	return len;
> >  }
> >  
> 
> You may be right; the only way to be sure is to check the other side.
> 
> If it uses skb_queue_tail(), then yes, your patch is fine.
> 
> If the other side did not lock the socket, then your patch is a bug fix.
> 
> 

The other side is in drivers/net/tun.c and net/packet/af_packet.c.
At least for tun it seems clear the socket is not locked.
Besides queueing, dequeueing seems to be done without the socket locked.
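
For context, the enqueue in tun_net_xmit() at the time looked roughly like this (heavily trimmed); nothing in the path takes the socket lock:

static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct tun_struct *tun = netdev_priv(dev);

        /* ... drop and filter checks trimmed ... */

        /* Enqueue packet: the only lock taken is sk_receive_queue.lock,
         * inside skb_queue_tail(); the socket lock is never held here */
        skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);

        /* ... reader wake-up trimmed ... */
        return NETDEV_TX_OK;
}
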
Eric Dumazet March 13, 2011, 4:32 p.m. UTC | #6
On Sunday, March 13, 2011 at 18:19 +0200, Michael S. Tsirkin wrote:

> The other side is in drivers/net/tun.c and net/packet/af_packet.c.
> At least for tun it seems clear the socket is not locked.

Yes (assuming you refer to tun_net_xmit())

> Besides queueing, dequeueing seems to be done without the socket locked.
> 

It seems this code (assuming you mean drivers/vhost/net.c?) has
some races indeed.



Michael S. Tsirkin March 13, 2011, 4:43 p.m. UTC | #7
On Sun, Mar 13, 2011 at 05:32:07PM +0100, Eric Dumazet wrote:
> On Sunday, March 13, 2011 at 18:19 +0200, Michael S. Tsirkin wrote:
> 
> > The other side is in drivers/net/tun.c and net/packet/af_packet.c.
> > At least for tun it seems clear the socket is not locked.
> 
> Yes (assuming you refer to tun_net_xmit())
> 
> > Besides queueing, dequeueing seems to be done without the socket locked.
> > 
> 
> It seems this code (assuming you mean drivers/vhost/net.c?) has
> some races indeed.
> 

Hmm. Any more besides the one fixed here?
Eric Dumazet March 13, 2011, 5:41 p.m. UTC | #8
On Sunday, March 13, 2011 at 18:43 +0200, Michael S. Tsirkin wrote:
> On Sun, Mar 13, 2011 at 05:32:07PM +0100, Eric Dumazet wrote:
> > On Sunday, March 13, 2011 at 18:19 +0200, Michael S. Tsirkin wrote:
> > 
> > > The other side is in drivers/net/tun.c and net/packet/af_packet.c.
> > > At least for tun it seems clear the socket is not locked.
> > 
> > Yes (assuming you refer to tun_net_xmit())
> > 
> > > Besides queueing, dequeueing seems to be done without the socket locked.
> > > 
> > 
> > It seems this code (assuming you mean drivers/vhost/net.c?) has
> > some races indeed.
> > 
> 
> Hmm. Any more besides the one fixed here?
> 

If writers and readers don't share a common lock, how can they reliably
synchronize state?

For example, the check at line 420 seems unsafe or useless.

skb_queue_empty(&sock->sk->sk_receive_queue)
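
For reference, skb_queue_empty() (include/linux/skbuff.h) is just a lockless pointer comparison, so without holding sk_receive_queue.lock its result can be stale by the time it is used:

static inline int skb_queue_empty(const struct sk_buff_head *list)
{
        /* no locking, no barriers: purely an advisory snapshot */
        return list->next == (struct sk_buff *)list;
}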



Michael S. Tsirkin March 13, 2011, 9:11 p.m. UTC | #9
On Sun, Mar 13, 2011 at 06:41:32PM +0100, Eric Dumazet wrote:
> On Sunday, March 13, 2011 at 18:43 +0200, Michael S. Tsirkin wrote:
> > On Sun, Mar 13, 2011 at 05:32:07PM +0100, Eric Dumazet wrote:
> > > On Sunday, March 13, 2011 at 18:19 +0200, Michael S. Tsirkin wrote:
> > > 
> > > > The other side is in drivers/net/tun.c and net/packet/af_packet.c.
> > > > At least for tun it seems clear the socket is not locked.
> > > 
> > > Yes (assuming you refer to tun_net_xmit())
> > > 
> > > > Besides queueing, dequeueing seems to be done without the socket locked.
> > > > 
> > > 
> > > It seems this code (assuming you mean drivers/vhost/net.c?) has
> > > some races indeed.
> > > 
> > 
> > Hmm. Any more besides the one fixed here?
> > 
> 
> If writers and readers don't share a common lock, how can they reliably
> synchronize state?

They are all supposed to use sk_receive_queue.lock, I think.

> For example, the check at line 420 seems unsafe or useless.
> 
> skb_queue_empty(&sock->sk->sk_receive_queue)
> 

It's mostly useless: the code called after this does skb_peek and
checks the result under the spinlock. This was supposed to be an
optimization: quickly check that the queue is not empty before we
bother disabling notifications etc., but I don't remember at this
point whether it actually gives any gain. Thanks for pointing this
out; I'll take it out, I think (below).
Note: this call appears in two places upstream, handle_rx_big and
handle_rx_mergeable, but they are merged into a single handle_rx by a
patch from Jason Wang. The patch below is on top of that.
If you'd like to look at the latest code, it's at
master.kernel.org:/home/mst/pub/vhost.git; the vhost-net-next branch
has it all.

Eric, thanks very much for pointing these out.
Is there anything else that you see in this driver?


Thanks!


    vhost-net: remove unlocked use of receive_queue
    
    Use of skb_queue_empty(&sock->sk->sk_receive_queue)
    without taking the sk_receive_queue.lock is unsafe
    or useless. Take it out.
    
    Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5720301..2f7c76a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -311,7 +311,7 @@ static void handle_rx(struct vhost_net *net)
 	/* TODO: check that we are running from vhost_worker? */
 	struct socket *sock = rcu_dereference_check(vq->private_data, 1);
 
-	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
+	if (!sock)
 		return;
 
 	mutex_lock(&vq->mutex);

Patch

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c32a2e4..50b622a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -211,12 +211,12 @@ static int peek_head_len(struct sock *sk)
 {
 	struct sk_buff *head;
 	int len = 0;
+	bool slow = lock_sock_fast(sk);
 
-	lock_sock(sk);
 	head = skb_peek(&sk->sk_receive_queue);
 	if (head)
 		len = head->len;
-	release_sock(sk);
+	unlock_sock_fast(sk, slow);
 	return len;
 }