* [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
@ 2019-10-10 14:42 Alexander Lobakin
  2019-10-10 14:42 ` [PATCH net-next 1/2] " Alexander Lobakin
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-10 14:42 UTC (permalink / raw)
  To: David S. Miller
  Cc: Edward Cree, Jiri Pirko, Eric Dumazet, Ido Schimmel, Paolo Abeni,
	Petr Machata, Sabrina Dubroca, Florian Fainelli, Jassi Brar,
	Ilias Apalodimas, netdev, linux-kernel, Alexander Lobakin

Hi Dave,

This series was written as a continuation of commit 323ebb61e32b
("net: use listified RX for handling GRO_NORMAL skbs"), and also takes
advantage of listified Rx for GRO. This time, however, we're targeting
a far more common and widely used function, napi_gro_receive().

There are ~100 call sites of this function, including gro_cells
and mac80211, so even wireless systems will benefit from it.
The only driver that cares about the return value is
ethernet/socionext/netsec, and only for updating statistics. I don't
believe this change can break its functionality, but in any case we
have plenty of time until the next merge window to give this change
proper attention.

Besides the fact that this functionality is already implemented for
napi_gro_frags() users, the main reason is the solid performance boost
shown during tests on a 1-core MIPS board (with a not-yet-mainlined
driver):

* no batching (5.4-rc2): ~450/450 Mbit/s
* with gro_normal_batch == 8: ~480/480 Mbit/s
* with gro_normal_batch == 16: ~500/500 Mbit/s
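For reference, the batch limit varied in these tests is the sysctl knob
introduced by commit 323ebb61e32b; on a kernel that has it, tuning
presumably looks like this (requires root):

```shell
# Read the current GRO_NORMAL batch size (default 8 as of 5.4-rc2)
sysctl net.core.gro_normal_batch

# Try a larger batch at runtime
sysctl -w net.core.gro_normal_batch=16

# Or write the procfs knob directly
echo 16 > /proc/sys/net/core/gro_normal_batch
```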

Applies on top of net-next.
Thanks.

Alexander Lobakin (2):
  net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  net: core: increase the default size of GRO_NORMAL skb lists to flush

 net/core/dev.c | 51 +++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 1/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-10 14:42 [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Alexander Lobakin
@ 2019-10-10 14:42 ` Alexander Lobakin
  2019-10-10 18:23   ` Edward Cree
  2019-10-10 14:42 ` [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush Alexander Lobakin
  2019-10-11 12:23 ` [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Ilias Apalodimas
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-10 14:42 UTC (permalink / raw)
  To: David S. Miller
  Cc: Edward Cree, Jiri Pirko, Eric Dumazet, Ido Schimmel, Paolo Abeni,
	Petr Machata, Sabrina Dubroca, Florian Fainelli, Jassi Brar,
	Ilias Apalodimas, netdev, linux-kernel, Alexander Lobakin

Commit 323ebb61e32b4 ("net: use listified RX for handling GRO_NORMAL
skbs") made use of listified skb processing for the users of
napi_gro_frags().
The same technique can be used in the far more common napi_gro_receive()
to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers,
including gro_cells and mac80211 users.

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
---
 net/core/dev.c | 49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8bc3dce71fc0..a33f56b439ce 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5884,6 +5884,26 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_complete_by_type);
 
+/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
+static void gro_normal_list(struct napi_struct *napi)
+{
+	if (!napi->rx_count)
+		return;
+	netif_receive_skb_list_internal(&napi->rx_list);
+	INIT_LIST_HEAD(&napi->rx_list);
+	napi->rx_count = 0;
+}
+
+/* Queue one GRO_NORMAL SKB up for list processing.  If batch size exceeded,
+ * pass the whole batch up to the stack.
+ */
+static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
+{
+	list_add_tail(&skb->list, &napi->rx_list);
+	if (++napi->rx_count >= gro_normal_batch)
+		gro_normal_list(napi);
+}
+
 static void napi_skb_free_stolen_head(struct sk_buff *skb)
 {
 	skb_dst_drop(skb);
@@ -5891,12 +5911,13 @@ static void napi_skb_free_stolen_head(struct sk_buff *skb)
 	kmem_cache_free(skbuff_head_cache, skb);
 }
 
-static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
+static gro_result_t napi_skb_finish(struct napi_struct *napi,
+				    struct sk_buff *skb,
+				    gro_result_t ret)
 {
 	switch (ret) {
 	case GRO_NORMAL:
-		if (netif_receive_skb_internal(skb))
-			ret = GRO_DROP;
+		gro_normal_one(napi, skb);
 		break;
 
 	case GRO_DROP:
@@ -5928,7 +5949,7 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 
 	skb_gro_reset_offset(skb);
 
-	ret = napi_skb_finish(dev_gro_receive(napi, skb), skb);
+	ret = napi_skb_finish(napi, skb, dev_gro_receive(napi, skb));
 	trace_napi_gro_receive_exit(ret);
 
 	return ret;
@@ -5974,26 +5995,6 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(napi_get_frags);
 
-/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
-static void gro_normal_list(struct napi_struct *napi)
-{
-	if (!napi->rx_count)
-		return;
-	netif_receive_skb_list_internal(&napi->rx_list);
-	INIT_LIST_HEAD(&napi->rx_list);
-	napi->rx_count = 0;
-}
-
-/* Queue one GRO_NORMAL SKB up for list processing.  If batch size exceeded,
- * pass the whole batch up to the stack.
- */
-static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
-{
-	list_add_tail(&skb->list, &napi->rx_list);
-	if (++napi->rx_count >= gro_normal_batch)
-		gro_normal_list(napi);
-}
-
 static gro_result_t napi_frags_finish(struct napi_struct *napi,
 				      struct sk_buff *skb,
 				      gro_result_t ret)
-- 
2.23.0



* [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush
  2019-10-10 14:42 [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Alexander Lobakin
  2019-10-10 14:42 ` [PATCH net-next 1/2] " Alexander Lobakin
@ 2019-10-10 14:42 ` Alexander Lobakin
  2019-10-10 18:16   ` Edward Cree
  2019-10-11 12:23 ` [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Ilias Apalodimas
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-10 14:42 UTC (permalink / raw)
  To: David S. Miller
  Cc: Edward Cree, Jiri Pirko, Eric Dumazet, Ido Schimmel, Paolo Abeni,
	Petr Machata, Sabrina Dubroca, Florian Fainelli, Jassi Brar,
	Ilias Apalodimas, netdev, linux-kernel, Alexander Lobakin

Commit 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL
skbs") introduced the sysctl variable gro_normal_batch, which defines
a limit for listified Rx of GRO_NORMAL skbs. The initial value of 8 is
purely arbitrary and has been chosen, I believe, as a minimal safe
default.
However, several tests show that it's rather suboptimal and doesn't
allow us to take full advantage of listified processing. The best and
most balanced results have been achieved with batches of 16 skbs per
flush.
So double the default value to give yet another boost to the Rx path.
It remains configurable via sysctl anyway, so it may be fine-tuned for
each hardware configuration.

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a33f56b439ce..4f60444bb766 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4189,7 +4189,7 @@ int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
 int dev_rx_weight __read_mostly = 64;
 int dev_tx_weight __read_mostly = 64;
 /* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
-int gro_normal_batch __read_mostly = 8;
+int gro_normal_batch __read_mostly = 16;
 
 /* Called with irq disabled */
 static inline void ____napi_schedule(struct softnet_data *sd,
-- 
2.23.0



* Re: [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush
  2019-10-10 14:42 ` [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush Alexander Lobakin
@ 2019-10-10 18:16   ` Edward Cree
  2019-10-11  7:23     ` Alexander Lobakin
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-10-10 18:16 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller
  Cc: Jiri Pirko, Eric Dumazet, Ido Schimmel, Paolo Abeni,
	Petr Machata, Sabrina Dubroca, Florian Fainelli, Jassi Brar,
	Ilias Apalodimas, netdev, linux-kernel

On 10/10/2019 15:42, Alexander Lobakin wrote:
> Commit 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL
> skbs") have introduced a sysctl variable gro_normal_batch for defining
> a limit for listified Rx of GRO_NORMAL skbs. The initial value of 8 is
> purely arbitrary and has been chosen, I believe, as a minimal safe
> default.
8 was chosen by performance tests on my setup with v1 of that patch;
 see https://www.spinics.net/lists/netdev/msg585001.html .
Sorry for not including that info in the final version of the patch.
While I didn't re-do tests on varying gro_normal_batch on the final
 version, I think changing it needs more evidence than just "we tested
 it; it's better".  In particular, increasing the batch size should be
 accompanied by demonstration that latency isn't increased in e.g. a
 multi-stream ping-pong test.

> However, several tests show that it's rather suboptimal and doesn't
> allow to take a full advantage of listified processing. The best and
> the most balanced results have been achieved with a batches of 16 skbs
> per flush.
> So double the default value to give a yet another boost for Rx path.

> It remains configurable via sysctl anyway, so may be fine-tuned for
> each hardware.
I see this as a reason to leave the default as it is; the combination
 of your tests and mine have established that the optimal size does
 vary (I found 16 to be 2% slower than 8 with my setup), so any
 tweaking of the default is likely only worthwhile if we have data
 over lots of different hardware combinations.

> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> ---
>  net/core/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a33f56b439ce..4f60444bb766 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4189,7 +4189,7 @@ int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
>  int dev_rx_weight __read_mostly = 64;
>  int dev_tx_weight __read_mostly = 64;
>  /* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
> -int gro_normal_batch __read_mostly = 8;
> +int gro_normal_batch __read_mostly = 16;
>  
>  /* Called with irq disabled */
>  static inline void ____napi_schedule(struct softnet_data *sd,



* Re: [PATCH net-next 1/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-10 14:42 ` [PATCH net-next 1/2] " Alexander Lobakin
@ 2019-10-10 18:23   ` Edward Cree
  2019-10-11  7:26     ` Alexander Lobakin
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-10-10 18:23 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller
  Cc: Jiri Pirko, Eric Dumazet, Ido Schimmel, Paolo Abeni,
	Petr Machata, Sabrina Dubroca, Florian Fainelli, Jassi Brar,
	Ilias Apalodimas, netdev, linux-kernel

On 10/10/2019 15:42, Alexander Lobakin wrote:
> Commit 323ebb61e32b4 ("net: use listified RX for handling GRO_NORMAL
> skbs") made use of listified skb processing for the users of
> napi_gro_frags().
> The same technique can be used in a way more common napi_gro_receive()
> to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers,
> including gro_cells and mac80211 users.
>
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> ---
>  net/core/dev.c | 49 +++++++++++++++++++++++++------------------------
>  1 file changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8bc3dce71fc0..a33f56b439ce 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5884,6 +5884,26 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
>  }
>  EXPORT_SYMBOL(gro_find_complete_by_type);
>  
> +/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
> +static void gro_normal_list(struct napi_struct *napi)
> +{
> +	if (!napi->rx_count)
> +		return;
> +	netif_receive_skb_list_internal(&napi->rx_list);
> +	INIT_LIST_HEAD(&napi->rx_list);
> +	napi->rx_count = 0;
> +}
> +
> +/* Queue one GRO_NORMAL SKB up for list processing.  If batch size exceeded,
> + * pass the whole batch up to the stack.
> + */
> +static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
> +{
> +	list_add_tail(&skb->list, &napi->rx_list);
> +	if (++napi->rx_count >= gro_normal_batch)
> +		gro_normal_list(napi);
> +}
> +
>  static void napi_skb_free_stolen_head(struct sk_buff *skb)
>  {
>  	skb_dst_drop(skb);
> @@ -5891,12 +5911,13 @@ static void napi_skb_free_stolen_head(struct sk_buff *skb)
>  	kmem_cache_free(skbuff_head_cache, skb);
>  }
>  
> -static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
> +static gro_result_t napi_skb_finish(struct napi_struct *napi,
> +				    struct sk_buff *skb,
> +				    gro_result_t ret)
Any reason why the argument order here is changed around?

-Ed
>  {
>  	switch (ret) {
>  	case GRO_NORMAL:
> -		if (netif_receive_skb_internal(skb))
> -			ret = GRO_DROP;
> +		gro_normal_one(napi, skb);
>  		break;
>  
>  	case GRO_DROP:
> @@ -5928,7 +5949,7 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
>  
>  	skb_gro_reset_offset(skb);
>  
> -	ret = napi_skb_finish(dev_gro_receive(napi, skb), skb);
> +	ret = napi_skb_finish(napi, skb, dev_gro_receive(napi, skb));
>  	trace_napi_gro_receive_exit(ret);
>  
>  	return ret;
> @@ -5974,26 +5995,6 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi)
>  }
>  EXPORT_SYMBOL(napi_get_frags);
>  
> -/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
> -static void gro_normal_list(struct napi_struct *napi)
> -{
> -	if (!napi->rx_count)
> -		return;
> -	netif_receive_skb_list_internal(&napi->rx_list);
> -	INIT_LIST_HEAD(&napi->rx_list);
> -	napi->rx_count = 0;
> -}
> -
> -/* Queue one GRO_NORMAL SKB up for list processing.  If batch size exceeded,
> - * pass the whole batch up to the stack.
> - */
> -static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
> -{
> -	list_add_tail(&skb->list, &napi->rx_list);
> -	if (++napi->rx_count >= gro_normal_batch)
> -		gro_normal_list(napi);
> -}
> -
>  static gro_result_t napi_frags_finish(struct napi_struct *napi,
>  				      struct sk_buff *skb,
>  				      gro_result_t ret)



* Re: [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush
  2019-10-10 18:16   ` Edward Cree
@ 2019-10-11  7:23     ` Alexander Lobakin
  2019-10-12  9:22       ` Alexander Lobakin
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-11  7:23 UTC (permalink / raw)
  To: Edward Cree
  Cc: David S. Miller, Jiri Pirko, Eric Dumazet, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, linux-kernel

Hi Edward,

Edward Cree wrote 10.10.2019 21:16:
> On 10/10/2019 15:42, Alexander Lobakin wrote:
>> Commit 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL
>> skbs") have introduced a sysctl variable gro_normal_batch for defining
>> a limit for listified Rx of GRO_NORMAL skbs. The initial value of 8 is
>> purely arbitrary and has been chosen, I believe, as a minimal safe
>> default.
> 8 was chosen by performance tests on my setup with v1 of that patch;
>  see https://www.spinics.net/lists/netdev/msg585001.html .
> Sorry for not including that info in the final version of the patch.
> While I didn't re-do tests on varying gro_normal_batch on the final
>  version, I think changing it needs more evidence than just "we tested
>  it; it's better".  In particular, increasing the batch size should be
>  accompanied by demonstration that latency isn't increased in e.g. a
>  multi-stream ping-pong test.
> 
>> However, several tests show that it's rather suboptimal and doesn't
>> allow to take a full advantage of listified processing. The best and
>> the most balanced results have been achieved with a batches of 16 skbs
>> per flush.
>> So double the default value to give a yet another boost for Rx path.
> 
>> It remains configurable via sysctl anyway, so may be fine-tuned for
>> each hardware.
> I see this as a reason to leave the default as it is; the combination
>  of your tests and mine have established that the optimal size does
>  vary (I found 16 to be 2% slower than 8 with my setup), so any
>  tweaking of the default is likely only worthwhile if we have data
>  over lots of different hardware combinations.

Agreed: if you've got slower results with 16, we should leave the
default value as is, since it seems to be VERY hardware- and
driver-dependent. So patch 2/2 is no longer relevant (I suspected it
would likely be dropped even before sending this series).

>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
>> ---
>>  net/core/dev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a33f56b439ce..4f60444bb766 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4189,7 +4189,7 @@ int dev_weight_tx_bias __read_mostly = 1;  /* 
>> bias for output_queue quota */
>>  int dev_rx_weight __read_mostly = 64;
>>  int dev_tx_weight __read_mostly = 64;
>>  /* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
>> -int gro_normal_batch __read_mostly = 8;
>> +int gro_normal_batch __read_mostly = 16;
>> 
>>  /* Called with irq disabled */
>>  static inline void ____napi_schedule(struct softnet_data *sd,

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net-next 1/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-10 18:23   ` Edward Cree
@ 2019-10-11  7:26     ` Alexander Lobakin
  2019-10-11  9:20       ` Edward Cree
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-11  7:26 UTC (permalink / raw)
  To: Edward Cree
  Cc: David S. Miller, Jiri Pirko, Eric Dumazet, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, linux-kernel

Edward Cree wrote 10.10.2019 21:23:
> On 10/10/2019 15:42, Alexander Lobakin wrote:
>> Commit 323ebb61e32b4 ("net: use listified RX for handling GRO_NORMAL
>> skbs") made use of listified skb processing for the users of
>> napi_gro_frags().
>> The same technique can be used in a way more common napi_gro_receive()
>> to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers,
>> including gro_cells and mac80211 users.
>> 
>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
>> ---
>>  net/core/dev.c | 49 +++++++++++++++++++++++++------------------------
>>  1 file changed, 25 insertions(+), 24 deletions(-)
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 8bc3dce71fc0..a33f56b439ce 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5884,6 +5884,26 @@ struct packet_offload 
>> *gro_find_complete_by_type(__be16 type)
>>  }
>>  EXPORT_SYMBOL(gro_find_complete_by_type);
>> 
>> +/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
>> +static void gro_normal_list(struct napi_struct *napi)
>> +{
>> +	if (!napi->rx_count)
>> +		return;
>> +	netif_receive_skb_list_internal(&napi->rx_list);
>> +	INIT_LIST_HEAD(&napi->rx_list);
>> +	napi->rx_count = 0;
>> +}
>> +
>> +/* Queue one GRO_NORMAL SKB up for list processing.  If batch size 
>> exceeded,
>> + * pass the whole batch up to the stack.
>> + */
>> +static void gro_normal_one(struct napi_struct *napi, struct sk_buff 
>> *skb)
>> +{
>> +	list_add_tail(&skb->list, &napi->rx_list);
>> +	if (++napi->rx_count >= gro_normal_batch)
>> +		gro_normal_list(napi);
>> +}
>> +
>>  static void napi_skb_free_stolen_head(struct sk_buff *skb)
>>  {
>>  	skb_dst_drop(skb);
>> @@ -5891,12 +5911,13 @@ static void napi_skb_free_stolen_head(struct 
>> sk_buff *skb)
>>  	kmem_cache_free(skbuff_head_cache, skb);
>>  }
>> 
>> -static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff 
>> *skb)
>> +static gro_result_t napi_skb_finish(struct napi_struct *napi,
>> +				    struct sk_buff *skb,
>> +				    gro_result_t ret)
> Any reason why the argument order here is changed around?

Actually yes: to make the napi_skb_finish() and napi_frags_finish()
prototypes match, as gro_normal_one() required the addition of the
napi argument anyway.

> 
> -Ed
>>  {
>>  	switch (ret) {
>>  	case GRO_NORMAL:
>> -		if (netif_receive_skb_internal(skb))
>> -			ret = GRO_DROP;
>> +		gro_normal_one(napi, skb);
>>  		break;
>> 
>>  	case GRO_DROP:
>> @@ -5928,7 +5949,7 @@ gro_result_t napi_gro_receive(struct napi_struct 
>> *napi, struct sk_buff *skb)
>> 
>>  	skb_gro_reset_offset(skb);
>> 
>> -	ret = napi_skb_finish(dev_gro_receive(napi, skb), skb);
>> +	ret = napi_skb_finish(napi, skb, dev_gro_receive(napi, skb));
>>  	trace_napi_gro_receive_exit(ret);
>> 
>>  	return ret;
>> @@ -5974,26 +5995,6 @@ struct sk_buff *napi_get_frags(struct 
>> napi_struct *napi)
>>  }
>>  EXPORT_SYMBOL(napi_get_frags);
>> 
>> -/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
>> -static void gro_normal_list(struct napi_struct *napi)
>> -{
>> -	if (!napi->rx_count)
>> -		return;
>> -	netif_receive_skb_list_internal(&napi->rx_list);
>> -	INIT_LIST_HEAD(&napi->rx_list);
>> -	napi->rx_count = 0;
>> -}
>> -
>> -/* Queue one GRO_NORMAL SKB up for list processing.  If batch size 
>> exceeded,
>> - * pass the whole batch up to the stack.
>> - */
>> -static void gro_normal_one(struct napi_struct *napi, struct sk_buff 
>> *skb)
>> -{
>> -	list_add_tail(&skb->list, &napi->rx_list);
>> -	if (++napi->rx_count >= gro_normal_batch)
>> -		gro_normal_list(napi);
>> -}
>> -
>>  static gro_result_t napi_frags_finish(struct napi_struct *napi,
>>  				      struct sk_buff *skb,
>>  				      gro_result_t ret)

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net-next 1/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-11  7:26     ` Alexander Lobakin
@ 2019-10-11  9:20       ` Edward Cree
  2019-10-11  9:24         ` Alexander Lobakin
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-10-11  9:20 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jiri Pirko, Eric Dumazet, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, linux-kernel

>> On 10/10/2019 15:42, Alexander Lobakin wrote:
>>> Commit 323ebb61e32b4 ("net: use listified RX for handling GRO_NORMAL
>>> skbs") made use of listified skb processing for the users of
>>> napi_gro_frags().
>>> The same technique can be used in a way more common napi_gro_receive()
>>> to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers,
>>> including gro_cells and mac80211 users.
>>>
>>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Acked-by: Edward Cree <ecree@solarflare.com>
but I think this needs review from the socionext folks as well.


* Re: [PATCH net-next 1/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-11  9:20       ` Edward Cree
@ 2019-10-11  9:24         ` Alexander Lobakin
  0 siblings, 0 replies; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-11  9:24 UTC (permalink / raw)
  To: Edward Cree
  Cc: David S. Miller, Jiri Pirko, Eric Dumazet, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, linux-kernel

Edward Cree wrote 11.10.2019 12:20:
>>> On 10/10/2019 15:42, Alexander Lobakin wrote:
>>>> Commit 323ebb61e32b4 ("net: use listified RX for handling GRO_NORMAL
>>>> skbs") made use of listified skb processing for the users of
>>>> napi_gro_frags().
>>>> The same technique can be used in a way more common 
>>>> napi_gro_receive()
>>>> to speed up non-merged (GRO_NORMAL) skbs for a wide range of 
>>>> drivers,
>>>> including gro_cells and mac80211 users.
>>>> 
>>>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> Acked-by: Edward Cree <ecree@solarflare.com>
> but I think this needs review from the socionext folks as well.

Thanks!
Sure, I'm waiting for any other possible comments.

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-10 14:42 [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Alexander Lobakin
  2019-10-10 14:42 ` [PATCH net-next 1/2] " Alexander Lobakin
  2019-10-10 14:42 ` [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush Alexander Lobakin
@ 2019-10-11 12:23 ` Ilias Apalodimas
  2019-10-11 12:27   ` Alexander Lobakin
  2 siblings, 1 reply; 15+ messages in thread
From: Ilias Apalodimas @ 2019-10-11 12:23 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Edward Cree, Jiri Pirko, Eric Dumazet,
	Ido Schimmel, Paolo Abeni, Petr Machata, Sabrina Dubroca,
	Florian Fainelli, Jassi Brar, netdev, linux-kernel

Hi Alexander,

On Thu, Oct 10, 2019 at 05:42:24PM +0300, Alexander Lobakin wrote:
> Hi Dave,
> 
> This series was written as a continuation to commit 323ebb61e32b
> ("net: use listified RX for handling GRO_NORMAL skbs"), and also takes
> an advantage of listified Rx for GRO. This time, however, we're
> targeting at a way more common and used function, napi_gro_receive().
> 
> There are about ~100 call sites of this function, including gro_cells
> and mac80211, so even wireless systems will benefit from it.
> The only driver that cares about the return value is
> ethernet/socionext/netsec, and only for updating statistics. I don't
> believe that this change can break its functionality, but anyway,
> we have plenty of time till next merge window to pay this change
> a proper attention.

I don't think this will break anything in the netsec driver. Dropped
packets will still be properly accounted for.

> 
> Besides having this functionality implemented for napi_gro_frags()
> users, the main reason is the solid performance boost that has been
> shown during tests on 1-core MIPS board (with not yet mainlined
> driver):
> 
> * no batching (5.4-rc2): ~450/450 Mbit/s
> * with gro_normal_batch == 8: ~480/480 Mbit/s
> * with gro_normal_batch == 16: ~500/500 Mbit/s
> 
> Applies on top of net-next.
> Thanks.
> 
> Alexander Lobakin (2):
>   net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
>   net: core: increase the default size of GRO_NORMAL skb lists to flush
> 
>  net/core/dev.c | 51 +++++++++++++++++++++++++-------------------------
>  1 file changed, 26 insertions(+), 25 deletions(-)
> 
> -- 
> 2.23.0
> 

Thanks
/Ilias


* Re: [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-11 12:23 ` [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Ilias Apalodimas
@ 2019-10-11 12:27   ` Alexander Lobakin
  2019-10-11 12:32     ` Ilias Apalodimas
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-11 12:27 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: David S. Miller, Edward Cree, Jiri Pirko, Eric Dumazet,
	Ido Schimmel, Paolo Abeni, Petr Machata, Sabrina Dubroca,
	Florian Fainelli, Jassi Brar, netdev, linux-kernel

Hi Ilias,

Ilias Apalodimas wrote 11.10.2019 15:23:
> Hi Alexander,
> 
> On Thu, Oct 10, 2019 at 05:42:24PM +0300, Alexander Lobakin wrote:
>> Hi Dave,
>> 
>> This series was written as a continuation to commit 323ebb61e32b
>> ("net: use listified RX for handling GRO_NORMAL skbs"), and also takes
>> an advantage of listified Rx for GRO. This time, however, we're
>> targeting at a way more common and used function, napi_gro_receive().
>> 
>> There are about ~100 call sites of this function, including gro_cells
>> and mac80211, so even wireless systems will benefit from it.
>> The only driver that cares about the return value is
>> ethernet/socionext/netsec, and only for updating statistics. I don't
>> believe that this change can break its functionality, but anyway,
>> we have plenty of time till next merge window to pay this change
>> a proper attention.
> 
> I don't think this will break anything on the netsec driver. Dropped 
> packets
> will still be properly accounted for
> 

Thank you for the clarification. Do I need to mention you with a
separate Acked-by in v2?

>> 
>> Besides having this functionality implemented for napi_gro_frags()
>> users, the main reason is the solid performance boost that has been
>> shown during tests on 1-core MIPS board (with not yet mainlined
>> driver):
>> 
>> * no batching (5.4-rc2): ~450/450 Mbit/s
>> * with gro_normal_batch == 8: ~480/480 Mbit/s
>> * with gro_normal_batch == 16: ~500/500 Mbit/s
>> 
>> Applies on top of net-next.
>> Thanks.
>> 
>> Alexander Lobakin (2):
>>   net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
>>   net: core: increase the default size of GRO_NORMAL skb lists to 
>> flush
>> 
>>  net/core/dev.c | 51 
>> +++++++++++++++++++++++++-------------------------
>>  1 file changed, 26 insertions(+), 25 deletions(-)
>> 
>> --
>> 2.23.0
>> 
> 
> Thanks
> /Ilias

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
  2019-10-11 12:27   ` Alexander Lobakin
@ 2019-10-11 12:32     ` Ilias Apalodimas
  0 siblings, 0 replies; 15+ messages in thread
From: Ilias Apalodimas @ 2019-10-11 12:32 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Edward Cree, Jiri Pirko, Eric Dumazet,
	Ido Schimmel, Paolo Abeni, Petr Machata, Sabrina Dubroca,
	Florian Fainelli, Jassi Brar, netdev, linux-kernel

Hi Alexander, 

On Fri, Oct 11, 2019 at 03:27:50PM +0300, Alexander Lobakin wrote:
> Hi Ilias,
> 
> Ilias Apalodimas wrote 11.10.2019 15:23:
> > Hi Alexander,
> > 
> > On Thu, Oct 10, 2019 at 05:42:24PM +0300, Alexander Lobakin wrote:
> > > Hi Dave,
> > > 
> > > This series was written as a continuation to commit 323ebb61e32b
> > > ("net: use listified RX for handling GRO_NORMAL skbs"), and also takes
> > > an advantage of listified Rx for GRO. This time, however, we're
> > > targeting at a way more common and used function, napi_gro_receive().
> > > 
> > > There are about ~100 call sites of this function, including gro_cells
> > > and mac80211, so even wireless systems will benefit from it.
> > > The only driver that cares about the return value is
> > > ethernet/socionext/netsec, and only for updating statistics. I don't
> > > believe that this change can break its functionality, but anyway,
> > > we have plenty of time till next merge window to pay this change
> > > a proper attention.
> > 
> > I don't think this will break anything on the netsec driver. Dropped
> > packets
> > will still be properly accounted for
> > 
> 
> Thank you for clarification. Do I need to mention you under separate
> Acked-by in v2?
> 

Well, I only checked the netsec part. I'll try to have a look at the
whole patch and send a proper Acked-by if I get some free time!

> > > 
> > > Besides having this functionality implemented for napi_gro_frags()
> > > users, the main reason is the solid performance boost that has been
> > > shown during tests on 1-core MIPS board (with not yet mainlined
> > > driver):
> > > 
> > > * no batching (5.4-rc2): ~450/450 Mbit/s
> > > * with gro_normal_batch == 8: ~480/480 Mbit/s
> > > * with gro_normal_batch == 16: ~500/500 Mbit/s
> > > 
> > > Applies on top of net-next.
> > > Thanks.
> > > 
> > > Alexander Lobakin (2):
> > >   net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
> > >   net: core: increase the default size of GRO_NORMAL skb lists to
> > > flush
> > > 
> > >  net/core/dev.c | 51
> > > +++++++++++++++++++++++++-------------------------
> > >  1 file changed, 26 insertions(+), 25 deletions(-)
> > > 
> > > --
> > > 2.23.0
> > > 
> > 
> > Thanks
> > /Ilias
> 
> Regards,
> ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ

Regards
/Ilias


* Re: [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush
  2019-10-11  7:23     ` Alexander Lobakin
@ 2019-10-12  9:22       ` Alexander Lobakin
  2019-10-12 11:18         ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-12  9:22 UTC (permalink / raw)
  To: Edward Cree
  Cc: David S. Miller, Jiri Pirko, Eric Dumazet, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, linux-kernel

Alexander Lobakin wrote 11.10.2019 10:23:
> Hi Edward,
> 
> Edward Cree wrote 10.10.2019 21:16:
>> On 10/10/2019 15:42, Alexander Lobakin wrote:
>>> Commit 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL
>>> skbs") have introduced a sysctl variable gro_normal_batch for 
>>> defining
>>> a limit for listified Rx of GRO_NORMAL skbs. The initial value of 8 
>>> is
>>> purely arbitrary and has been chosen, I believe, as a minimal safe
>>> default.
>> 8 was chosen by performance tests on my setup with v1 of that patch;
>>  see https://www.spinics.net/lists/netdev/msg585001.html .
>> Sorry for not including that info in the final version of the patch.
>> While I didn't re-do tests on varying gro_normal_batch on the final
>>  version, I think changing it needs more evidence than just "we tested
>>  it; it's better".  In particular, increasing the batch size should be
>>  accompanied by demonstration that latency isn't increased in e.g. a
>>  multi-stream ping-pong test.
>> 
>>> However, several tests show that it's rather suboptimal and doesn't
>>> allow to take a full advantage of listified processing. The best and
>>> the most balanced results have been achieved with a batches of 16 
>>> skbs
>>> per flush.
>>> So double the default value to give a yet another boost for Rx path.
>> 
>>> It remains configurable via sysctl anyway, so may be fine-tuned for
>>> each hardware.
>> I see this as a reason to leave the default as it is; the combination
>>  of your tests and mine have established that the optimal size does
>>  vary (I found 16 to be 2% slower than 8 with my setup), so any
>>  tweaking of the default is likely only worthwhile if we have data
>>  over lots of different hardware combinations.
> 
> Agreed: if you got slower results with 16, we should keep the current
> default, as the optimum seems to be VERY hardware- and
> driver-dependent.
> So patch 2/2 is no longer relevant (I suspected it would likely be
> dropped before sending this series).

I've come up with another possible solution. Considering that the
optimal gro_normal_batch is very individual for every single case,
maybe it would be better to make it a per-NAPI (or per-netdevice)
variable rather than a global across the kernel?
I think most network-capable configurations and systems have more than
one network device nowadays, and each might need a different value to
achieve its best performance.

One possible variant is:

#define THIS_DRIVER_GRO_NORMAL_BATCH	16

/* ... */

/* napi->gro_normal_batch is set to the sysctl value during NAPI
 * context initialization
 */
netif_napi_add(dev, napi, this_driver_rx_poll, NAPI_POLL_WEIGHT);

/* new static inline helper; napi->gro_normal_batch is set to the
 * driver-specific value of 16
 */
napi_set_gro_normal_batch(napi, THIS_DRIVER_GRO_NORMAL_BATCH);

The second possible variant is to make the gro_normal_batch sysctl
per-netdevice, so it can be tuned from userspace.
Or we can combine the two approaches to make it tweakable from both the
driver and userspace, just like the XPS CPUs setting is now.

If you find any of this reasonable and worth implementing, I'll come
back with it in v2 after proper testing.

> 
>>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
>>> ---
>>>  net/core/dev.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index a33f56b439ce..4f60444bb766 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -4189,7 +4189,7 @@ int dev_weight_tx_bias __read_mostly = 1;  /* 
>>> bias for output_queue quota */
>>>  int dev_rx_weight __read_mostly = 64;
>>>  int dev_tx_weight __read_mostly = 64;
>>>  /* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
>>> -int gro_normal_batch __read_mostly = 8;
>>> +int gro_normal_batch __read_mostly = 16;
>>> 
>>>  /* Called with irq disabled */
>>>  static inline void ____napi_schedule(struct softnet_data *sd,
> 
> Regards,
> ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush
  2019-10-12  9:22       ` Alexander Lobakin
@ 2019-10-12 11:18         ` Eric Dumazet
  2019-10-12 11:51           ` Alexander Lobakin
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2019-10-12 11:18 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Edward Cree, David S. Miller, Jiri Pirko, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, LKML

On Sat, Oct 12, 2019 at 2:22 AM Alexander Lobakin <alobakin@dlink.ru> wrote:

>
> I've generated an another solution. Considering that gro_normal_batch
> is very individual for every single case, maybe it would be better to
> make it per-NAPI (or per-netdevice) variable rather than a global
> across the kernel?
> I think most of all network-capable configurations and systems have more
> than one network device nowadays, and they might need different values
> for achieving their bests.
>
> One possible variant is:
>
> #define THIS_DRIVER_GRO_NORMAL_BATCH    16
>
> /* ... */
>
> netif_napi_add(dev, napi, this_driver_rx_poll, NAPI_POLL_WEIGHT); /*
> napi->gro_normal_batch will be set to the sysctl value during NAPI
> context initialization */
> napi_set_gro_normal_batch(napi, THIS_DRIVER_GRO_NORMAL_BATCH); /* new
> static inline helper, napi->gro_normal_batch will be set to the
> driver-specific value of 16 */
>
> The second possible variant is to make gro_normal_batch sysctl
> per-netdevice to tune it from userspace.
> Or we can combine them into one to make it available for tweaking from
> both driver and userspace, just like it's now with XPS CPUs setting.
>
> If you'll find any of this reasonable and worth implementing, I'll come
> with it in v2 after a proper testing.

Most likely the optimal tuning is also a function of the host CPU caches.

Building a too big list can also lead to premature cache evictions.

Tuning the value on your test machines does not mean the value will be good
for other systems.

Adding yet another per-device value should only be done if you
demonstrate a significant performance increase compared to the
conservative value Edward chose.

Also the behavior can be quite different depending on the protocols,
make sure you test handling of TCP pure ACK packets.

Accumulating 64 of them (in case the device uses the standard
NAPI_POLL_WEIGHT) before entering the upper stacks does not seem like a
good choice, since 64 skbs would need to be kept in the GRO system,
compared to only 8 with Edward's value.


* Re: [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush
  2019-10-12 11:18         ` Eric Dumazet
@ 2019-10-12 11:51           ` Alexander Lobakin
  0 siblings, 0 replies; 15+ messages in thread
From: Alexander Lobakin @ 2019-10-12 11:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Edward Cree, David S. Miller, Jiri Pirko, Ido Schimmel,
	Paolo Abeni, Petr Machata, Sabrina Dubroca, Florian Fainelli,
	Jassi Brar, Ilias Apalodimas, netdev, LKML

Hi Eric,

Eric Dumazet wrote 12.10.2019 14:18:
> On Sat, Oct 12, 2019 at 2:22 AM Alexander Lobakin <alobakin@dlink.ru> 
> wrote:
> 
>> 
>> I've generated an another solution. Considering that gro_normal_batch
>> is very individual for every single case, maybe it would be better to
>> make it per-NAPI (or per-netdevice) variable rather than a global
>> across the kernel?
>> I think most of all network-capable configurations and systems have
>> more
>> than one network device nowadays, and they might need different values
>> for achieving their bests.
>> 
>> One possible variant is:
>> 
>> #define THIS_DRIVER_GRO_NORMAL_BATCH    16
>> 
>> /* ... */
>> 
>> netif_napi_add(dev, napi, this_driver_rx_poll, NAPI_POLL_WEIGHT); /*
>> napi->gro_normal_batch will be set to the sysctl value during NAPI
>> context initialization */
>> napi_set_gro_normal_batch(napi, THIS_DRIVER_GRO_NORMAL_BATCH); /* new
>> static inline helper, napi->gro_normal_batch will be set to the
>> driver-specific value of 16 */
>> 
>> The second possible variant is to make gro_normal_batch sysctl
>> per-netdevice to tune it from userspace.
>> Or we can combine them into one to make it available for tweaking from
>> both driver and userspace, just like it's now with XPS CPUs setting.
>> 
>> If you'll find any of this reasonable and worth implementing, I'll 
>> come
>> with it in v2 after a proper testing.
> 
> Most likely the optimal tuning is also a function of the host CPU
> caches.
> 
> Building a too big list can also lead to premature cache evictions.
> 
> Tuning the value on your test machines does not mean the value will
> be good for other systems.

Oh, I missed that it might be a lot more machine-dependent than
netdevice-dependent. Thank you for the explanation. The best I can do
in that case is to leave the batch control in its current state.
I'll publish v2 containing only the acked first part of the series on
Monday, if nothing serious comes up. Adding listified Rx to
napi_gro_receive() was the main goal anyway.

> 
> Adding yet another per-device value should only be done if you
> demonstrate a significant performance increase compared to the
> conservative value Edward chose.
> 
> Also the behavior can be quite different depending on the protocols,
> make sure you test handling of TCP pure ACK packets.
> 
> Accumulating 64 (in case the device uses standard NAPI_POLL_WEIGHT)
> of them before entering upper stacks seems not a good choice, since
> 64 skbs will need to be kept in the GRO system, compared to only 8
> with Edward's value.

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


end of thread, other threads:[~2019-10-12 11:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-10 14:42 [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Alexander Lobakin
2019-10-10 14:42 ` [PATCH net-next 1/2] " Alexander Lobakin
2019-10-10 18:23   ` Edward Cree
2019-10-11  7:26     ` Alexander Lobakin
2019-10-11  9:20       ` Edward Cree
2019-10-11  9:24         ` Alexander Lobakin
2019-10-10 14:42 ` [PATCH net-next 2/2] net: core: increase the default size of GRO_NORMAL skb lists to flush Alexander Lobakin
2019-10-10 18:16   ` Edward Cree
2019-10-11  7:23     ` Alexander Lobakin
2019-10-12  9:22       ` Alexander Lobakin
2019-10-12 11:18         ` Eric Dumazet
2019-10-12 11:51           ` Alexander Lobakin
2019-10-11 12:23 ` [PATCH net-next 0/2] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive() Ilias Apalodimas
2019-10-11 12:27   ` Alexander Lobakin
2019-10-11 12:32     ` Ilias Apalodimas
