linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()
@ 2013-10-31 11:47 Jason Wang
  2013-10-31 11:47 ` [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx Jason Wang
  2013-10-31 14:26 ` [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Eric Dumazet
  0 siblings, 2 replies; 5+ messages in thread
From: Jason Wang @ 2013-10-31 11:47 UTC
  To: davem, edumazet, linux-kernel, netdev, rusty, mst, mwdalton,
	virtualization
  Cc: kmindg, Jason Wang

Sometimes we need to coalesce rx frags to avoid a frag list. One example is the
virtio-net driver, which tries to use small frags for both MTU-sized packets and
GSO packets. So this patch introduces skb_coalesce_rx_frag() to do this.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from V1:
- remove the useless off parameter.
---
 include/linux/skbuff.h |  3 +++
 net/core/skbuff.c      | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2c15497..fffaeaf 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1372,6 +1372,9 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		     int size, unsigned int truesize);
 
+void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
+			  unsigned int truesize);
+
 #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
 #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_has_frag_list(skb))
 #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0ab32fa..87670e1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -476,6 +476,19 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 }
 EXPORT_SYMBOL(skb_add_rx_frag);
 
+void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
+			  unsigned int truesize)
+{
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+	skb_frag_size_add(frag, size);
+	skb->len += size;
+	skb->data_len += size;
+	skb->truesize += truesize;
+	skb_frag_unref(skb, i);
+}
+EXPORT_SYMBOL(skb_coalesce_rx_frag);
+
 static void skb_drop_list(struct sk_buff **listp)
 {
 	kfree_skb_list(*listp);
-- 
1.8.1.2
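
As a rough illustration of the bookkeeping that skb_coalesce_rx_frag() does, here is a user-space sketch in plain C. The toy_skb and toy_frag types and every toy_ name are invented for this example and are not the kernel structures; page reference handling is deliberately left out.

```c
#include <assert.h>

#define TOY_MAX_FRAGS 17	/* arbitrary limit for the sketch */

/* Toy stand-ins for skb_frag_t and struct sk_buff (hypothetical). */
struct toy_frag {
	unsigned int page_offset;
	unsigned int size;
};

struct toy_skb {
	unsigned int len;	/* total bytes in the packet */
	unsigned int data_len;	/* bytes held in frags */
	unsigned int truesize;	/* memory charged to the packet */
	unsigned int nr_frags;
	struct toy_frag frags[TOY_MAX_FRAGS];
};

/* Append a new frag in slot i and account for it. */
static void toy_add_rx_frag(struct toy_skb *skb, unsigned int i,
			    unsigned int off, unsigned int size,
			    unsigned int truesize)
{
	assert(i < TOY_MAX_FRAGS);
	skb->frags[i].page_offset = off;
	skb->frags[i].size = size;
	skb->nr_frags = i + 1;
	skb->len += size;
	skb->data_len += size;
	skb->truesize += truesize;
}

/* Grow existing frag i instead of consuming a new slot. */
static void toy_coalesce_rx_frag(struct toy_skb *skb, unsigned int i,
				 unsigned int size, unsigned int truesize)
{
	assert(i < skb->nr_frags);
	skb->frags[i].size += size;
	skb->len += size;
	skb->data_len += size;
	skb->truesize += truesize;
}
```

In this model, adding a 1500-byte frag and then coalescing another 1500 bytes into it leaves nr_frags at 1 with a single 3000-byte frag, while len, data_len and truesize grow exactly as they would for two separate frags.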



* [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx
  2013-10-31 11:47 [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Jason Wang
@ 2013-10-31 11:47 ` Jason Wang
  2013-10-31 13:57   ` Eric Dumazet
  2013-10-31 14:26 ` [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Eric Dumazet
  1 sibling, 1 reply; 5+ messages in thread
From: Jason Wang @ 2013-10-31 11:47 UTC
  To: davem, edumazet, linux-kernel, netdev, rusty, mst, mwdalton,
	virtualization
  Cc: kmindg, Jason Wang

Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable
rx buffers to page frag allocators) tries to increase the payload/truesize ratio
for MTU-sized traffic. But this introduces extra overhead for received GSO
packets because of the frag list. This commit reduces that overhead by
coalescing rx frags whenever possible during rx. Test results show about a 15%
improvement when receiving full-size GSO packets (and even better than
commit 2613af0ed18a11d5c566a81f9a6510b73180660a).

Before this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    20303.87

After this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    23841.26

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 113ee93..5dc0de0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -305,7 +305,7 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
 	struct sk_buff *curr_skb = head_skb;
 	char *buf;
 	struct page *page;
-	int num_buf, len;
+	int num_buf, len, offset;
 
 	num_buf = hdr->mhdr.num_buffers;
 	while (--num_buf) {
@@ -342,9 +342,15 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
 			head_skb->truesize += MAX_PACKET_LEN;
 		}
 		page = virt_to_head_page(buf);
-		skb_add_rx_frag(curr_skb, num_skb_frags, page,
-				buf - (char *)page_address(page), len,
-				MAX_PACKET_LEN);
+		offset = buf - (char *)page_address(page);
+		if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
+			skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
+					     len, MAX_PACKET_LEN);
+		} else {
+			skb_add_rx_frag(curr_skb, num_skb_frags, page,
+					offset, len,
+					MAX_PACKET_LEN);
+		}
 		--rq->num;
 	}
 	return 0;
-- 
1.8.1.2
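
The decision made in receive_mergeable() above can be sketched in user space as follows. This is only a model under stated assumptions: toy_rx_skb, toy_rx_frag, toy_can_coalesce() and toy_receive_buf() are hypothetical names, pages are represented by plain pointers, and truesize and reference counting are ignored. The coalescing test mirrors the idea behind skb_can_coalesce(): the new buffer must sit on the same page as the last frag and start exactly where that frag ends.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_FRAGS 17	/* arbitrary limit for the sketch */

/* Hypothetical stand-ins for the kernel's frag and skb types. */
struct toy_rx_frag {
	const void *page;	/* identifies the backing page */
	unsigned int off;
	unsigned int size;
};

struct toy_rx_skb {
	unsigned int nr_frags;
	unsigned int len;
	struct toy_rx_frag frags[TOY_MAX_FRAGS];
};

/* True if the buffer directly follows the last frag on the same page. */
static bool toy_can_coalesce(const struct toy_rx_skb *skb,
			     const void *page, unsigned int off)
{
	if (skb->nr_frags == 0)
		return false;

	const struct toy_rx_frag *last = &skb->frags[skb->nr_frags - 1];

	return page == last->page && off == last->off + last->size;
}

/* Either extend the last frag or append a new one. */
static void toy_receive_buf(struct toy_rx_skb *skb, const void *page,
			    unsigned int off, unsigned int len)
{
	if (toy_can_coalesce(skb, page, off)) {
		skb->frags[skb->nr_frags - 1].size += len;
	} else {
		assert(skb->nr_frags < TOY_MAX_FRAGS);
		skb->frags[skb->nr_frags++] =
			(struct toy_rx_frag){ page, off, len };
	}
	skb->len += len;
}
```

Feeding this model two back-to-back 1500-byte buffers from one page and a third buffer from another page yields two frags (3000 and 1500 bytes) rather than three, which is the effect the patch relies on to keep GSO packets out of the frag list.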



* Re: [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx
  2013-10-31 11:47 ` [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx Jason Wang
@ 2013-10-31 13:57   ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-10-31 13:57 UTC
  To: Jason Wang
  Cc: davem, edumazet, linux-kernel, netdev, rusty, mst, mwdalton,
	virtualization, kmindg

On Thu, 2013-10-31 at 19:47 +0800, Jason Wang wrote:
> Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable
> rx buffers to page frag allocators) tries to increase the payload/truesize ratio
> for MTU-sized traffic. But this introduces extra overhead for received GSO
> packets because of the frag list. This commit reduces that overhead by
> coalescing rx frags whenever possible during rx. Test results show about a 15%
> improvement when receiving full-size GSO packets (and even better than
> commit 2613af0ed18a11d5c566a81f9a6510b73180660a).
> 
> Before this commit:
> ./netperf -H 192.168.100.4
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
>  87380  16384  16384    10.00    20303.87
> 
> After this commit:
> ./netperf -H 192.168.100.4
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
>  87380  16384  16384    10.00    23841.26
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Michael Dalton <mwdalton@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---

Excellent!

We now have 2 or 3 frags per skb, as the TCP stack manages to do on the
output path.

Michael Dalton is also working on an autotuning patch, using an EWMA, so
that the size of individual sg blocks can vary from 1500 to 4096 bytes.
This might show even better throughput, we'll see.

Acked-by: Eric Dumazet <edumazet@google.com>
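
For readers unfamiliar with the EWMA idea mentioned above, a generic sketch follows. It is not Michael Dalton's patch, which is not part of this thread; the weight of 8, the integer arithmetic, and all toy_ names are assumptions made for illustration. Only the 1500 to 4096 range comes from the message.

```c
#include <assert.h>

#define TOY_BUF_MIN	1500u	/* lower bound taken from the message */
#define TOY_BUF_MAX	4096u	/* upper bound taken from the message */
#define TOY_EWMA_WEIGHT	8u	/* assumed smoothing weight */

/* Running average of observed packet sizes (hypothetical type). */
struct toy_ewma {
	unsigned int avg;	/* 0 means "no sample yet" */
};

/* Fold one sample in: avg = (avg * (W - 1) + sample) / W. */
static void toy_ewma_add(struct toy_ewma *e, unsigned int sample)
{
	if (e->avg == 0)
		e->avg = sample;
	else
		e->avg = (e->avg * (TOY_EWMA_WEIGHT - 1) + sample) /
			 TOY_EWMA_WEIGHT;
}

/* Pick the next buffer size: the average clamped to the allowed range. */
static unsigned int toy_next_buf_size(const struct toy_ewma *e)
{
	if (e->avg < TOY_BUF_MIN)
		return TOY_BUF_MIN;
	if (e->avg > TOY_BUF_MAX)
		return TOY_BUF_MAX;
	return e->avg;
}
```

With these assumed parameters, a stream of small packets keeps the buffer size at the 1500-byte floor, a stream of large GSO packets pushes it to the 4096-byte ceiling, and mixed traffic settles somewhere in between.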





* Re: [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()
  2013-10-31 11:47 [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Jason Wang
  2013-10-31 11:47 ` [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx Jason Wang
@ 2013-10-31 14:26 ` Eric Dumazet
  2013-11-01  5:25   ` Jason Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2013-10-31 14:26 UTC
  To: Jason Wang
  Cc: davem, edumazet, linux-kernel, netdev, rusty, mst, mwdalton,
	virtualization, kmindg

On Thu, 2013-10-31 at 19:47 +0800, Jason Wang wrote:
> Sometimes we need to coalesce rx frags to avoid a frag list. One example is the
> virtio-net driver, which tries to use small frags for both MTU-sized packets and
> GSO packets. So this patch introduces skb_coalesce_rx_frag() to do this.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Michael Dalton <mwdalton@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Changes from V1:
> - remove the useless off parameter.
> ---
>  include/linux/skbuff.h |  3 +++
>  net/core/skbuff.c      | 13 +++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2c15497..fffaeaf 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1372,6 +1372,9 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
>  void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>  		     int size, unsigned int truesize);
>  
> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
> +			  unsigned int truesize);
> +
>  #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
>  #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_has_frag_list(skb))
>  #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 0ab32fa..87670e1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -476,6 +476,19 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>  }
>  EXPORT_SYMBOL(skb_add_rx_frag);
>  
> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
> +			  unsigned int truesize)
> +{
> +	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> +	skb_frag_size_add(frag, size);
> +	skb->len += size;
> +	skb->data_len += size;
> +	skb->truesize += truesize;


> +	skb_frag_unref(skb, i);

This unref is not logical, or should at least be

__skb_frag_unref(frag);

But I do think this is best done in the caller.

In virtio_net this would be a:

put_page(page);

In the TCP stack we do almost the same, but we take the reference on the
page only if we could not coalesce with the prior frag, instead of doing a
get and put in the other case.

        if (can_coalesce) {
                skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
        } else {
                get_page(page);
                skb_fill_page_desc(skb, i, page, offset, copy);
        }


> +}
> +EXPORT_SYMBOL(skb_coalesce_rx_frag);
> +
>  static void skb_drop_list(struct sk_buff **listp)
>  {
>  	kfree_skb_list(*listp);
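
The two reference-counting styles contrasted in this review can be modelled in user space as below. This is a sketch under assumptions, not kernel code: toy_page and the toy_ helpers are hypothetical, the rx model assumes every received buffer arrives already holding one reference on its page, and the tx model assumes buffers arrive holding none.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page with a plain counter instead of the kernel's refcount. */
struct toy_page {
	int refcount;
};

static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * rx style (assumed): each buffer carries its own reference. A new frag
 * inherits it; a coalesced buffer must drop it, because the existing
 * frag already pins the page.
 */
static void toy_rx_attach(struct toy_page *page, bool coalesced)
{
	if (coalesced)
		toy_put_page(page);
}

/*
 * tx style (assumed): buffers carry no reference. Take one only when a
 * new frag is created, so the coalesce path needs no get/put pair.
 */
static void toy_tx_attach(struct toy_page *page, bool coalesced)
{
	if (!coalesced)
		toy_get_page(page);
}
```

Either way, three buffers from one page that end up in a single frag leave exactly one reference held on behalf of that frag, which is the invariant the caller has to preserve.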




* Re: [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()
  2013-10-31 14:26 ` [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Eric Dumazet
@ 2013-11-01  5:25   ` Jason Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Wang @ 2013-11-01  5:25 UTC
  To: Eric Dumazet
  Cc: davem, edumazet, linux-kernel, netdev, rusty, mst, mwdalton,
	virtualization, kmindg

On 10/31/2013 10:26 PM, Eric Dumazet wrote:
> On Thu, 2013-10-31 at 19:47 +0800, Jason Wang wrote:
>> Sometimes we need to coalesce rx frags to avoid a frag list. One example is the
>> virtio-net driver, which tries to use small frags for both MTU-sized packets and
>> GSO packets. So this patch introduces skb_coalesce_rx_frag() to do this.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: Michael Dalton <mwdalton@google.com>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> Changes from V1:
>> - remove the useless off parameter.
>> ---
>>  include/linux/skbuff.h |  3 +++
>>  net/core/skbuff.c      | 13 +++++++++++++
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 2c15497..fffaeaf 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -1372,6 +1372,9 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
>>  void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>>  		     int size, unsigned int truesize);
>>  
>> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
>> +			  unsigned int truesize);
>> +
>>  #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
>>  #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_has_frag_list(skb))
>>  #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 0ab32fa..87670e1 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -476,6 +476,19 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>>  }
>>  EXPORT_SYMBOL(skb_add_rx_frag);
>>  
>> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
>> +			  unsigned int truesize)
>> +{
>> +	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>> +
>> +	skb_frag_size_add(frag, size);
>> +	skb->len += size;
>> +	skb->data_len += size;
>> +	skb->truesize += truesize;
>
>> +	skb_frag_unref(skb, i);
> This unref is not logical, or should at least be
>
> __skb_frag_unref(frag);
>
> But I do think this is best done in the caller.
>
> In virtio_net this would be a:
>
> put_page(page);
>
> In the TCP stack we do almost the same, but we take the reference on the
> page only if we could not coalesce with the prior frag, instead of doing a
> get and put in the other case.
>
>         if (can_coalesce) {
>                 skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
>         } else {
>                 get_page(page);
>                 skb_fill_page_desc(skb, i, page, offset, copy);
>         }
>

OK, got it. Will do a put_page() in V3.

Thanks
>> +}
>> +EXPORT_SYMBOL(skb_coalesce_rx_frag);
>> +
>>  static void skb_drop_list(struct sk_buff **listp)
>>  {
>>  	kfree_skb_list(*listp);
>




Thread overview: 5+ messages
2013-10-31 11:47 [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Jason Wang
2013-10-31 11:47 ` [PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx Jason Wang
2013-10-31 13:57   ` Eric Dumazet
2013-10-31 14:26 ` [PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag() Eric Dumazet
2013-11-01  5:25   ` Jason Wang
