* [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
@ 2013-07-18  2:55 Jason Wang
  2013-07-18  2:55 ` [PATCH net V2 2/2] macvtap: " Jason Wang
  2013-07-19  5:03 ` [PATCH net V2 1/2] tuntap: " Michael S. Tsirkin
  0 siblings, 2 replies; 8+ messages in thread
From: Jason Wang @ 2013-07-18  2:55 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang

We try to linearize part of the skb when the number of iov segments is greater
than MAX_SKB_FRAGS. This is not enough, since each single vector may occupy more
than one page, so zerocopy_sg_from_iovec() may still fail and break the guest
network.

Solve this by counting the pages the iov needs before attempting zerocopy, and
falling back to copying when more than MAX_SKB_FRAGS pages would be required.

This is done by introducing a new helper that counts the pages for an iov, and
by calling uarg->callback() manually when switching from zerocopy to copy so
that vhost is notified.

We can do further optimization on top.
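
For illustration, here is a minimal userspace sketch of the page-counting
arithmetic the new helper relies on. It assumes a 4 KiB page size; the PAGE_*
macros, the standalone main() and the sample buffer are purely illustrative
and not part of the patch itself:

#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Same logic as the iov_pages() helper introduced by this patch: count how
 * many pages the iovec touches, starting `offset` bytes into it. */
static unsigned long iov_pages(const struct iovec *iv, int offset,
			       unsigned long nr_segs)
{
	unsigned long seg, base;
	int pages = 0, len, size;

	/* Skip whole segments consumed by the initial offset. */
	while (nr_segs && (offset >= iv->iov_len)) {
		offset -= iv->iov_len;
		++iv;
		--nr_segs;
	}

	for (seg = 0; seg < nr_segs; seg++) {
		base = (unsigned long)iv[seg].iov_base + offset;
		len = iv[seg].iov_len - offset;
		/* offset-within-page plus length, rounded up to whole pages */
		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
		pages += size;
		offset = 0;
	}

	return pages;
}

int main(void)
{
	/* A single 5000-byte vector starting 100 bytes into a page spans
	 * two pages even though it is only one segment. */
	char *buf = aligned_alloc(PAGE_SIZE, 2 * PAGE_SIZE);
	struct iovec iv;

	if (!buf)
		return 1;
	iv.iov_base = buf + 100;
	iv.iov_len = 5000;

	printf("pages needed: %lu\n", iov_pages(&iv, 0, 1));
	free(buf);
	return 0;
}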

The bug was introduced by commit 0690899b4d4501b3505be069b9a687e68ccbe15b
(tun: experimental zero copy tx support).

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

---
The patch is needed for -stable.
---
 drivers/net/tun.c |   62 ++++++++++++++++++++++++++++++++--------------------
 1 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 5cdcf92..db690a3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1035,6 +1035,29 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
 	return 0;
 }
 
+static unsigned long iov_pages(const struct iovec *iv, int offset,
+			       unsigned long nr_segs)
+{
+	unsigned long seg, base;
+	int pages = 0, len, size;
+
+	while (nr_segs && (offset >= iv->iov_len)) {
+		offset -= iv->iov_len;
+		++iv;
+		--nr_segs;
+	}
+
+	for (seg = 0; seg < nr_segs; seg++) {
+		base = (unsigned long)iv[seg].iov_base + offset;
+		len = iv[seg].iov_len - offset;
+		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+		pages += size;
+		offset = 0;
+	}
+
+	return pages;
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			    void *msg_control, const struct iovec *iv,
@@ -1082,32 +1105,18 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			return -EINVAL;
 	}
 
-	if (msg_control)
-		zerocopy = true;
-
-	if (zerocopy) {
-		/* Userspace may produce vectors with count greater than
-		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
-		 * to let the rest of data to be fit in the frags.
-		 */
-		if (count > MAX_SKB_FRAGS) {
-			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
-			if (copylen < offset)
-				copylen = 0;
-			else
-				copylen -= offset;
-		} else
-				copylen = 0;
-		/* There are 256 bytes to be copied in skb, so there is enough
-		 * room for skb expand head in case it is used.
+	if (msg_control) {
+		/* There are 256 bytes to be copied in skb, so there is
+		 * enough room for skb expand head in case it is used.
 		 * The rest of the buffer is mapped from userspace.
 		 */
-		if (copylen < gso.hdr_len)
-			copylen = gso.hdr_len;
-		if (!copylen)
-			copylen = GOODCOPY_LEN;
+		copylen = gso.hdr_len ? gso.hdr_len : GOODCOPY_LEN;
 		linear = copylen;
-	} else {
+		if (iov_pages(iv, offset + copylen, count) <= MAX_SKB_FRAGS)
+			zerocopy = true;
+	}
+
+	if (!zerocopy) {
 		copylen = len;
 		linear = gso.hdr_len;
 	}
@@ -1121,8 +1130,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (zerocopy)
 		err = zerocopy_sg_from_iovec(skb, iv, offset, count);
-	else
+	else {
 		err = skb_copy_datagram_from_iovec(skb, 0, iv, offset, len);
+		if (!err && msg_control) {
+			struct ubuf_info *uarg = msg_control;
+			uarg->callback(uarg, false);
+		}
+	}
 
 	if (err) {
 		tun->dev->stats.rx_dropped++;
-- 
1.7.1



* [PATCH net V2 2/2] macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-18  2:55 [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS Jason Wang
@ 2013-07-18  2:55 ` Jason Wang
  2013-07-19  5:03   ` Michael S. Tsirkin
  2013-07-19  5:03 ` [PATCH net V2 1/2] tuntap: " Michael S. Tsirkin
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Wang @ 2013-07-18  2:55 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang

We try to linearize part of the skb when the number of iov segments is greater
than MAX_SKB_FRAGS. This is not enough, since each single vector may occupy more
than one page, so zerocopy_sg_from_iovec() may still fail and break the guest
network.

Solve this by counting the pages the iov needs before attempting zerocopy, and
falling back to copying when more than MAX_SKB_FRAGS pages would be required.

This is done by introducing a new helper that counts the pages for an iov, and
by calling uarg->callback() manually when switching from zerocopy to copy so
that vhost is notified.

We can do further optimization on top.
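
To illustrate why the copy fallback has to fire the completion callback by
hand, here is a schematic userspace sketch; the struct below merely stands in
for the kernel's struct ubuf_info, and all names in it are illustrative:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for struct ubuf_info: the buffer owner (vhost in the real code)
 * only recycles the guest buffers once this callback has run. */
struct zc_completion {
	void (*callback)(struct zc_completion *c, bool zerocopy_success);
};

static void owner_done(struct zc_completion *c, bool zerocopy_success)
{
	printf("buffers released (zerocopy=%d)\n", zerocopy_success);
}

static void xmit(struct zc_completion *c, bool can_zerocopy)
{
	if (can_zerocopy) {
		/* Zerocopy path: user pages stay referenced by the skb, so
		 * the callback fires later, when the skb is freed. */
		return;
	}

	/* Copy fallback: the data has been copied and nothing else holds a
	 * reference to the user pages, so notify the owner right away,
	 * signalling that zerocopy was not used. */
	if (c)
		c->callback(c, false);
}

int main(void)
{
	struct zc_completion c = { .callback = owner_done };

	xmit(&c, false);	/* falls back to copy: callback fires now */
	return 0;
}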

This bug was introduced by commit b92946e2919134ebe2a4083e4302236295ea2a73
(macvtap: zerocopy: validate vectors before building skb).

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

---
The patch is needed for -stable.
Changes from V1:
- remove the wrong comment
---
 drivers/net/macvtap.c |   62 +++++++++++++++++++++++++++++-------------------
 1 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index a7c5654..a98fb0e 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -698,6 +698,28 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
 	return 0;
 }
 
+static unsigned long iov_pages(const struct iovec *iv, int offset,
+			       unsigned long nr_segs)
+{
+	unsigned long seg, base;
+	int pages = 0, len, size;
+
+	while (nr_segs && (offset >= iv->iov_len)) {
+		offset -= iv->iov_len;
+		++iv;
+		--nr_segs;
+	}
+
+	for (seg = 0; seg < nr_segs; seg++) {
+		base = (unsigned long)iv[seg].iov_base + offset;
+		len = iv[seg].iov_len - offset;
+		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+		pages += size;
+		offset = 0;
+	}
+
+	return pages;
+}
 
 /* Get packet from user space buffer */
 static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
@@ -744,31 +766,15 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
 	if (unlikely(count > UIO_MAXIOV))
 		goto err;
 
-	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY))
-		zerocopy = true;
-
-	if (zerocopy) {
-		/* Userspace may produce vectors with count greater than
-		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
-		 * to let the rest of data to be fit in the frags.
-		 */
-		if (count > MAX_SKB_FRAGS) {
-			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
-			if (copylen < vnet_hdr_len)
-				copylen = 0;
-			else
-				copylen -= vnet_hdr_len;
-		}
-		/* There are 256 bytes to be copied in skb, so there is enough
-		 * room for skb expand head in case it is used.
-		 * The rest buffer is mapped from userspace.
-		 */
-		if (copylen < vnet_hdr.hdr_len)
-			copylen = vnet_hdr.hdr_len;
-		if (!copylen)
-			copylen = GOODCOPY_LEN;
+	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
+		copylen = vnet_hdr.hdr_len ? vnet_hdr.hdr_len : GOODCOPY_LEN;
 		linear = copylen;
-	} else {
+		if (iov_pages(iv, vnet_hdr_len + copylen, count)
+		    <= MAX_SKB_FRAGS)
+			zerocopy = true;
+	}
+
+	if (!zerocopy) {
 		copylen = len;
 		linear = vnet_hdr.hdr_len;
 	}
@@ -780,9 +786,15 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
 
 	if (zerocopy)
 		err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len, count);
-	else
+	else {
 		err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
 						   len);
+		if (!err && m && m->msg_control) {
+			struct ubuf_info *uarg = m->msg_control;
+			uarg->callback(uarg, false);
+		}
+	}
+
 	if (err)
 		goto err_kfree;
 
-- 
1.7.1



* Re: [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-18  2:55 [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS Jason Wang
  2013-07-18  2:55 ` [PATCH net V2 2/2] macvtap: " Jason Wang
@ 2013-07-19  5:03 ` Michael S. Tsirkin
  2013-07-30 11:41   ` Michael S. Tsirkin
  1 sibling, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2013-07-19  5:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel

On Thu, Jul 18, 2013 at 10:55:15AM +0800, Jason Wang wrote:
> We try to linearize part of the skb when the number of iov segments is greater
> than MAX_SKB_FRAGS. This is not enough, since each single vector may occupy more
> than one page, so zerocopy_sg_from_iovec() may still fail and break the guest
> network.
> 
> Solve this by counting the pages the iov needs before attempting zerocopy, and
> falling back to copying when more than MAX_SKB_FRAGS pages would be required.
> 
> This is done by introducing a new helper that counts the pages for an iov, and
> by calling uarg->callback() manually when switching from zerocopy to copy so
> that vhost is notified.
> 
> We can do further optimization on top.
> 
> The bug was introduced by commit 0690899b4d4501b3505be069b9a687e68ccbe15b
> (tun: experimental zero copy tx support).
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> The patch is needed for -stable.
> ---
>  drivers/net/tun.c |   62 ++++++++++++++++++++++++++++++++--------------------
>  1 files changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 5cdcf92..db690a3 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1035,6 +1035,29 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
>  	return 0;
>  }
>  
> +static unsigned long iov_pages(const struct iovec *iv, int offset,
> +			       unsigned long nr_segs)
> +{
> +	unsigned long seg, base;
> +	int pages = 0, len, size;
> +
> +	while (nr_segs && (offset >= iv->iov_len)) {
> +		offset -= iv->iov_len;
> +		++iv;
> +		--nr_segs;
> +	}
> +
> +	for (seg = 0; seg < nr_segs; seg++) {
> +		base = (unsigned long)iv[seg].iov_base + offset;
> +		len = iv[seg].iov_len - offset;
> +		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +		pages += size;
> +		offset = 0;
> +	}
> +
> +	return pages;
> +}
> +
>  /* Get packet from user space buffer */
>  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  			    void *msg_control, const struct iovec *iv,
> @@ -1082,32 +1105,18 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  			return -EINVAL;
>  	}
>  
> -	if (msg_control)
> -		zerocopy = true;
> -
> -	if (zerocopy) {
> -		/* Userspace may produce vectors with count greater than
> -		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
> -		 * to let the rest of data to be fit in the frags.
> -		 */
> -		if (count > MAX_SKB_FRAGS) {
> -			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
> -			if (copylen < offset)
> -				copylen = 0;
> -			else
> -				copylen -= offset;
> -		} else
> -				copylen = 0;
> -		/* There are 256 bytes to be copied in skb, so there is enough
> -		 * room for skb expand head in case it is used.
> +	if (msg_control) {
> +		/* There are 256 bytes to be copied in skb, so there is
> +		 * enough room for skb expand head in case it is used.
>  		 * The rest of the buffer is mapped from userspace.
>  		 */
> -		if (copylen < gso.hdr_len)
> -			copylen = gso.hdr_len;
> -		if (!copylen)
> -			copylen = GOODCOPY_LEN;
> +		copylen = gso.hdr_len ? gso.hdr_len : GOODCOPY_LEN;
>  		linear = copylen;
> -	} else {
> +		if (iov_pages(iv, offset + copylen, count) <= MAX_SKB_FRAGS)
> +			zerocopy = true;
> +	}
> +
> +	if (!zerocopy) {
>  		copylen = len;
>  		linear = gso.hdr_len;
>  	}
> @@ -1121,8 +1130,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  
>  	if (zerocopy)
>  		err = zerocopy_sg_from_iovec(skb, iv, offset, count);
> -	else
> +	else {
>  		err = skb_copy_datagram_from_iovec(skb, 0, iv, offset, len);
> +		if (!err && msg_control) {
> +			struct ubuf_info *uarg = msg_control;
> +			uarg->callback(uarg, false);
> +		}
> +	}
>  
>  	if (err) {
>  		tun->dev->stats.rx_dropped++;
> -- 
> 1.7.1


* Re: [PATCH net V2 2/2] macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-18  2:55 ` [PATCH net V2 2/2] macvtap: " Jason Wang
@ 2013-07-19  5:03   ` Michael S. Tsirkin
  2013-07-30 11:41     ` Michael S. Tsirkin
  0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2013-07-19  5:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel

On Thu, Jul 18, 2013 at 10:55:16AM +0800, Jason Wang wrote:
> We try to linearize part of the skb when the number of iov segments is greater
> than MAX_SKB_FRAGS. This is not enough, since each single vector may occupy more
> than one page, so zerocopy_sg_from_iovec() may still fail and break the guest
> network.
> 
> Solve this by counting the pages the iov needs before attempting zerocopy, and
> falling back to copying when more than MAX_SKB_FRAGS pages would be required.
> 
> This is done by introducing a new helper that counts the pages for an iov, and
> by calling uarg->callback() manually when switching from zerocopy to copy so
> that vhost is notified.
> 
> We can do further optimization on top.
> 
> This bug was introduced by commit b92946e2919134ebe2a4083e4302236295ea2a73
> (macvtap: zerocopy: validate vectors before building skb).
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> The patch is needed for -stable.
> Changes from V1:
> - remove the wrong comment
> ---
>  drivers/net/macvtap.c |   62 +++++++++++++++++++++++++++++-------------------
>  1 files changed, 37 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index a7c5654..a98fb0e 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -698,6 +698,28 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
>  	return 0;
>  }
>  
> +static unsigned long iov_pages(const struct iovec *iv, int offset,
> +			       unsigned long nr_segs)
> +{
> +	unsigned long seg, base;
> +	int pages = 0, len, size;
> +
> +	while (nr_segs && (offset >= iv->iov_len)) {
> +		offset -= iv->iov_len;
> +		++iv;
> +		--nr_segs;
> +	}
> +
> +	for (seg = 0; seg < nr_segs; seg++) {
> +		base = (unsigned long)iv[seg].iov_base + offset;
> +		len = iv[seg].iov_len - offset;
> +		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +		pages += size;
> +		offset = 0;
> +	}
> +
> +	return pages;
> +}
>  
>  /* Get packet from user space buffer */
>  static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
> @@ -744,31 +766,15 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  	if (unlikely(count > UIO_MAXIOV))
>  		goto err;
>  
> -	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY))
> -		zerocopy = true;
> -
> -	if (zerocopy) {
> -		/* Userspace may produce vectors with count greater than
> -		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
> -		 * to let the rest of data to be fit in the frags.
> -		 */
> -		if (count > MAX_SKB_FRAGS) {
> -			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
> -			if (copylen < vnet_hdr_len)
> -				copylen = 0;
> -			else
> -				copylen -= vnet_hdr_len;
> -		}
> -		/* There are 256 bytes to be copied in skb, so there is enough
> -		 * room for skb expand head in case it is used.
> -		 * The rest buffer is mapped from userspace.
> -		 */
> -		if (copylen < vnet_hdr.hdr_len)
> -			copylen = vnet_hdr.hdr_len;
> -		if (!copylen)
> -			copylen = GOODCOPY_LEN;
> +	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
> +		copylen = vnet_hdr.hdr_len ? vnet_hdr.hdr_len : GOODCOPY_LEN;
>  		linear = copylen;
> -	} else {
> +		if (iov_pages(iv, vnet_hdr_len + copylen, count)
> +		    <= MAX_SKB_FRAGS)
> +			zerocopy = true;
> +	}
> +
> +	if (!zerocopy) {
>  		copylen = len;
>  		linear = vnet_hdr.hdr_len;
>  	}
> @@ -780,9 +786,15 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  
>  	if (zerocopy)
>  		err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len, count);
> -	else
> +	else {
>  		err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
>  						   len);
> +		if (!err && m && m->msg_control) {
> +			struct ubuf_info *uarg = m->msg_control;
> +			uarg->callback(uarg, false);
> +		}
> +	}
> +
>  	if (err)
>  		goto err_kfree;
>  
> -- 
> 1.7.1


* Re: [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-19  5:03 ` [PATCH net V2 1/2] tuntap: " Michael S. Tsirkin
@ 2013-07-30 11:41   ` Michael S. Tsirkin
  2013-07-30 19:55     ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2013-07-30 11:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel

On Fri, Jul 19, 2013 at 08:03:35AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2013 at 10:55:15AM +0800, Jason Wang wrote:
> > We try to linearize part of the skb when the number of iov segments is greater
> > than MAX_SKB_FRAGS. This is not enough, since each single vector may occupy more
> > than one page, so zerocopy_sg_from_iovec() may still fail and break the guest
> > network.
> > 
> > Solve this by counting the pages the iov needs before attempting zerocopy, and
> > falling back to copying when more than MAX_SKB_FRAGS pages would be required.
> > 
> > This is done by introducing a new helper that counts the pages for an iov, and
> > by calling uarg->callback() manually when switching from zerocopy to copy so
> > that vhost is notified.
> > 
> > We can do further optimization on top.
> > 
> > The bug was introduced by commit 0690899b4d4501b3505be069b9a687e68ccbe15b
> > (tun: experimental zero copy tx support).
> > 
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>

Was this patch dropped?
Any objections to merging it for 3.11 and stable?

> > ---
> > The patch is needed for -stable.
> > ---
> >  drivers/net/tun.c |   62 ++++++++++++++++++++++++++++++++--------------------
> >  1 files changed, 38 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 5cdcf92..db690a3 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1035,6 +1035,29 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
> >  	return 0;
> >  }
> >  
> > +static unsigned long iov_pages(const struct iovec *iv, int offset,
> > +			       unsigned long nr_segs)
> > +{
> > +	unsigned long seg, base;
> > +	int pages = 0, len, size;
> > +
> > +	while (nr_segs && (offset >= iv->iov_len)) {
> > +		offset -= iv->iov_len;
> > +		++iv;
> > +		--nr_segs;
> > +	}
> > +
> > +	for (seg = 0; seg < nr_segs; seg++) {
> > +		base = (unsigned long)iv[seg].iov_base + offset;
> > +		len = iv[seg].iov_len - offset;
> > +		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> > +		pages += size;
> > +		offset = 0;
> > +	}
> > +
> > +	return pages;
> > +}
> > +
> >  /* Get packet from user space buffer */
> >  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >  			    void *msg_control, const struct iovec *iv,
> > @@ -1082,32 +1105,18 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >  			return -EINVAL;
> >  	}
> >  
> > -	if (msg_control)
> > -		zerocopy = true;
> > -
> > -	if (zerocopy) {
> > -		/* Userspace may produce vectors with count greater than
> > -		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
> > -		 * to let the rest of data to be fit in the frags.
> > -		 */
> > -		if (count > MAX_SKB_FRAGS) {
> > -			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
> > -			if (copylen < offset)
> > -				copylen = 0;
> > -			else
> > -				copylen -= offset;
> > -		} else
> > -				copylen = 0;
> > -		/* There are 256 bytes to be copied in skb, so there is enough
> > -		 * room for skb expand head in case it is used.
> > +	if (msg_control) {
> > +		/* There are 256 bytes to be copied in skb, so there is
> > +		 * enough room for skb expand head in case it is used.
> >  		 * The rest of the buffer is mapped from userspace.
> >  		 */
> > -		if (copylen < gso.hdr_len)
> > -			copylen = gso.hdr_len;
> > -		if (!copylen)
> > -			copylen = GOODCOPY_LEN;
> > +		copylen = gso.hdr_len ? gso.hdr_len : GOODCOPY_LEN;
> >  		linear = copylen;
> > -	} else {
> > +		if (iov_pages(iv, offset + copylen, count) <= MAX_SKB_FRAGS)
> > +			zerocopy = true;
> > +	}
> > +
> > +	if (!zerocopy) {
> >  		copylen = len;
> >  		linear = gso.hdr_len;
> >  	}
> > @@ -1121,8 +1130,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> >  
> >  	if (zerocopy)
> >  		err = zerocopy_sg_from_iovec(skb, iv, offset, count);
> > -	else
> > +	else {
> >  		err = skb_copy_datagram_from_iovec(skb, 0, iv, offset, len);
> > +		if (!err && msg_control) {
> > +			struct ubuf_info *uarg = msg_control;
> > +			uarg->callback(uarg, false);
> > +		}
> > +	}
> >  
> >  	if (err) {
> >  		tun->dev->stats.rx_dropped++;
> > -- 
> > 1.7.1


* Re: [PATCH net V2 2/2] macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-19  5:03   ` Michael S. Tsirkin
@ 2013-07-30 11:41     ` Michael S. Tsirkin
  2013-07-30 19:55       ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2013-07-30 11:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel

On Fri, Jul 19, 2013 at 08:03:51AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2013 at 10:55:16AM +0800, Jason Wang wrote:
> > We try to linearize part of the skb when the number of iov segments is greater
> > than MAX_SKB_FRAGS. This is not enough, since each single vector may occupy more
> > than one page, so zerocopy_sg_from_iovec() may still fail and break the guest
> > network.
> > 
> > Solve this by counting the pages the iov needs before attempting zerocopy, and
> > falling back to copying when more than MAX_SKB_FRAGS pages would be required.
> > 
> > This is done by introducing a new helper that counts the pages for an iov, and
> > by calling uarg->callback() manually when switching from zerocopy to copy so
> > that vhost is notified.
> > 
> > We can do further optimization on top.
> > 
> > This bug was introduced by commit b92946e2919134ebe2a4083e4302236295ea2a73
> > (macvtap: zerocopy: validate vectors before building skb).
> > 
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>

Was this patch dropped?
Any objections to merging it for 3.11 and stable?


> > ---
> > The patch is needed for -stable.
> > Changes from V1:
> > - remove the wrong comment
> > ---
> >  drivers/net/macvtap.c |   62 +++++++++++++++++++++++++++++-------------------
> >  1 files changed, 37 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > index a7c5654..a98fb0e 100644
> > --- a/drivers/net/macvtap.c
> > +++ b/drivers/net/macvtap.c
> > @@ -698,6 +698,28 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
> >  	return 0;
> >  }
> >  
> > +static unsigned long iov_pages(const struct iovec *iv, int offset,
> > +			       unsigned long nr_segs)
> > +{
> > +	unsigned long seg, base;
> > +	int pages = 0, len, size;
> > +
> > +	while (nr_segs && (offset >= iv->iov_len)) {
> > +		offset -= iv->iov_len;
> > +		++iv;
> > +		--nr_segs;
> > +	}
> > +
> > +	for (seg = 0; seg < nr_segs; seg++) {
> > +		base = (unsigned long)iv[seg].iov_base + offset;
> > +		len = iv[seg].iov_len - offset;
> > +		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> > +		pages += size;
> > +		offset = 0;
> > +	}
> > +
> > +	return pages;
> > +}
> >  
> >  /* Get packet from user space buffer */
> >  static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
> > @@ -744,31 +766,15 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
> >  	if (unlikely(count > UIO_MAXIOV))
> >  		goto err;
> >  
> > -	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY))
> > -		zerocopy = true;
> > -
> > -	if (zerocopy) {
> > -		/* Userspace may produce vectors with count greater than
> > -		 * MAX_SKB_FRAGS, so we need to linearize parts of the skb
> > -		 * to let the rest of data to be fit in the frags.
> > -		 */
> > -		if (count > MAX_SKB_FRAGS) {
> > -			copylen = iov_length(iv, count - MAX_SKB_FRAGS);
> > -			if (copylen < vnet_hdr_len)
> > -				copylen = 0;
> > -			else
> > -				copylen -= vnet_hdr_len;
> > -		}
> > -		/* There are 256 bytes to be copied in skb, so there is enough
> > -		 * room for skb expand head in case it is used.
> > -		 * The rest buffer is mapped from userspace.
> > -		 */
> > -		if (copylen < vnet_hdr.hdr_len)
> > -			copylen = vnet_hdr.hdr_len;
> > -		if (!copylen)
> > -			copylen = GOODCOPY_LEN;
> > +	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
> > +		copylen = vnet_hdr.hdr_len ? vnet_hdr.hdr_len : GOODCOPY_LEN;
> >  		linear = copylen;
> > -	} else {
> > +		if (iov_pages(iv, vnet_hdr_len + copylen, count)
> > +		    <= MAX_SKB_FRAGS)
> > +			zerocopy = true;
> > +	}
> > +
> > +	if (!zerocopy) {
> >  		copylen = len;
> >  		linear = vnet_hdr.hdr_len;
> >  	}
> > @@ -780,9 +786,15 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
> >  
> >  	if (zerocopy)
> >  		err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len, count);
> > -	else
> > +	else {
> >  		err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
> >  						   len);
> > +		if (!err && m && m->msg_control) {
> > +			struct ubuf_info *uarg = m->msg_control;
> > +			uarg->callback(uarg, false);
> > +		}
> > +	}
> > +
> >  	if (err)
> >  		goto err_kfree;
> >  
> > -- 
> > 1.7.1


* Re: [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-30 11:41   ` Michael S. Tsirkin
@ 2013-07-30 19:55     ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-07-30 19:55 UTC (permalink / raw)
  To: mst; +Cc: jasowang, netdev, linux-kernel

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 30 Jul 2013 14:41:13 +0300

> Was this patch dropped?

No, it's commit 885291761dba2bfe04df4c0f7bb75e4c920ab82e

> Any objections to merging it for 3.11 and stable?

No objections, queued up for -stable.


* Re: [PATCH net V2 2/2] macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  2013-07-30 11:41     ` Michael S. Tsirkin
@ 2013-07-30 19:55       ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-07-30 19:55 UTC (permalink / raw)
  To: mst; +Cc: jasowang, netdev, linux-kernel

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 30 Jul 2013 14:41:25 +0300

> Was this patch dropped?

Nope, it's commit ece793fcfc417b3925844be88a6a6dc82ae8f7c6

> Any objections to merging it for 3.11 and stable?

No objection, queued up for -stable.


end of thread, other threads:[~2013-07-30 19:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-18  2:55 [PATCH net V2 1/2] tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS Jason Wang
2013-07-18  2:55 ` [PATCH net V2 2/2] macvtap: " Jason Wang
2013-07-19  5:03   ` Michael S. Tsirkin
2013-07-30 11:41     ` Michael S. Tsirkin
2013-07-30 19:55       ` David Miller
2013-07-19  5:03 ` [PATCH net V2 1/2] tuntap: " Michael S. Tsirkin
2013-07-30 11:41   ` Michael S. Tsirkin
2013-07-30 19:55     ` David Miller
