All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] net: properly flush delay-freed skbs
@ 2016-11-23 16:44 Eric Dumazet
  2016-11-23 17:12 ` Alexander Duyck
  2016-11-26  0:49 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2016-11-23 16:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jesper Dangaard Brouer, Alexander Duyck

From: Eric Dumazet <edumazet@google.com>

Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
This put skb in a percpu special queue, napi_alloc_cache, to get bulk
frees.

It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
limit quite often, with skbs that were queued hundreds of usec earlier.
I measured this can take ~6000 nsec to perform one flush.

__kfree_skb_flush() can be called from two points right now :

1) From net_tx_action(), but only for skbs that were queued to
sd->completion_queue.

 -> Irrelevant for NAPI drivers in normal operation.

2) From net_rx_action(), but only under high stress or if RPS/RFS has a
pending action.

This patch changes net_rx_action() to perform the flush in all cases and
after more urgent operations happened (like kicking remote CPUS for
RPS/RFS).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
 net/core/dev.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f71b34ab57a5132647729d20e21376d362d4e630..048b46b7c92ae10080226ea7050fad3529920baa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5260,7 +5260,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 
 		if (list_empty(&list)) {
 			if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))
-				return;
+				goto out;
 			break;
 		}
 
@@ -5278,7 +5278,6 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		}
 	}
 
-	__kfree_skb_flush();
 	local_irq_disable();
 
 	list_splice_tail_init(&sd->poll_list, &list);
@@ -5288,6 +5287,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 
 	net_rps_action_and_irq_enable(sd);
+out:
+	__kfree_skb_flush();
 }
 
 struct netdev_adjacent {

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: properly flush delay-freed skbs
  2016-11-23 16:44 [PATCH net-next] net: properly flush delay-freed skbs Eric Dumazet
@ 2016-11-23 17:12 ` Alexander Duyck
  2016-11-23 18:11   ` Jesper Dangaard Brouer
  2016-11-26  0:49 ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Alexander Duyck @ 2016-11-23 17:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Jesper Dangaard Brouer, Alexander Duyck

On Wed, Nov 23, 2016 at 8:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
> This put skb in a percpu special queue, napi_alloc_cache, to get bulk
> frees.
>
> It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
> limit quite often, with skbs that were queued hundreds of usec earlier.
> I measured this can take ~6000 nsec to perform one flush.
>
> __kfree_skb_flush() can be called from two points right now :
>
> 1) From net_tx_action(), but only for skbs that were queued to
> sd->completion_queue.
>
>  -> Irrelevant for NAPI drivers in normal operation.
>
> 2) From net_rx_action(), but only under high stress or if RPS/RFS has a
> pending action.
>
> This patch changes net_rx_action() to perform the flush in all cases and
> after more urgent operations happened (like kicking remote CPUS for
> RPS/RFS).
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> ---

Yeah, we didn't intent the data to be sitting around that long.  The
change looks good to me.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: properly flush delay-freed skbs
  2016-11-23 17:12 ` Alexander Duyck
@ 2016-11-23 18:11   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 4+ messages in thread
From: Jesper Dangaard Brouer @ 2016-11-23 18:11 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Eric Dumazet, David Miller, netdev, Alexander Duyck, brouer

On Wed, 23 Nov 2016 09:12:50 -0800
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Wed, Nov 23, 2016 at 8:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
> > This put skb in a percpu special queue, napi_alloc_cache, to get bulk
> > frees.
> >
> > It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
> > limit quite often, with skbs that were queued hundreds of usec earlier.
> > I measured this can take ~6000 nsec to perform one flush.
> >
> > __kfree_skb_flush() can be called from two points right now :
> >
> > 1) From net_tx_action(), but only for skbs that were queued to
> > sd->completion_queue.
> >  
> >  -> Irrelevant for NAPI drivers in normal operation.  
> >
> > 2) From net_rx_action(), but only under high stress or if RPS/RFS has a
> > pending action.
> >
> > This patch changes net_rx_action() to perform the flush in all cases and
> > after more urgent operations happened (like kicking remote CPUS for
> > RPS/RFS).
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> > ---  
> 
> Yeah, we didn't intent the data to be sitting around that long.  The
> change looks good to me.
> 
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

Also looks good to me! Thanks for catching this.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: properly flush delay-freed skbs
  2016-11-23 16:44 [PATCH net-next] net: properly flush delay-freed skbs Eric Dumazet
  2016-11-23 17:12 ` Alexander Duyck
@ 2016-11-26  0:49 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2016-11-26  0:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, brouer, alexander.h.duyck

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 Nov 2016 08:44:56 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
> This put skb in a percpu special queue, napi_alloc_cache, to get bulk
> frees.
> 
> It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
> limit quite often, with skbs that were queued hundreds of usec earlier.
> I measured this can take ~6000 nsec to perform one flush.
> 
> __kfree_skb_flush() can be called from two points right now :
> 
> 1) From net_tx_action(), but only for skbs that were queued to
> sd->completion_queue.
> 
>  -> Irrelevant for NAPI drivers in normal operation.
> 
> 2) From net_rx_action(), but only under high stress or if RPS/RFS has a
> pending action.
> 
> This patch changes net_rx_action() to perform the flush in all cases and
> after more urgent operations happened (like kicking remote CPUS for
> RPS/RFS).
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>

Applied.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-11-26  0:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-23 16:44 [PATCH net-next] net: properly flush delay-freed skbs Eric Dumazet
2016-11-23 17:12 ` Alexander Duyck
2016-11-23 18:11   ` Jesper Dangaard Brouer
2016-11-26  0:49 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.