All of lore.kernel.org
* [PATCH v2 net-next 0/4] net: batched receive in GRO path
@ 2018-09-06 14:24 Edward Cree
  2018-09-06 14:26 ` [PATCH v2 net-next 1/4] net: introduce list entry point for GRO Edward Cree
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Edward Cree @ 2018-09-06 14:24 UTC (permalink / raw)
  To: davem; +Cc: linux-net-drivers, netdev

This series listifies part of GRO processing, in a manner which allows those
 packets which are not GROed (i.e. for which dev_gro_receive returns
 GRO_NORMAL) to be passed on to the listified regular receive path.
I have not listified dev_gro_receive() itself, or the per-protocol GRO
 callback, since GRO's need to hold packets on lists under napi->gro_hash
 makes keeping the packets on other lists awkward, and since the GRO control
 block state of held skbs can refer only to one 'new' skb at a time.
 Nonetheless the batching of the calling code yields some performance gains
 in the GRO case as well.

Herewith the performance figures obtained in a NetPerf TCP stream test (with
 four streams, and irqs bound to a single core):
net-next: 7.166 Gbit/s (sigma 0.435)
after #2: 7.715 Gbit/s (sigma 0.145) = datum + 7.7%
after #4: 7.890 Gbit/s (sigma 0.217) = datum + 10.1%
(Note that the 'net-next' results were distinctly bimodal, with two results
 of about 8 Gbit/s and the remaining ten around 7 Gbit/s.  I don't have a
 good explanation for this.)
And with GRO disabled through ethtool -K (thus simulating traffic which is
 not GRO-able but, being TCP, is still passed to the GRO entry point):
net-next: 4.756 Gbit/s (sigma 0.240)
after #4: 5.355 Gbit/s (sigma 0.232) = datum + 12.6%

v2: Rebased on latest net-next.  Removed RFC tags.  Otherwise unchanged
 owing to lack of comments on v1.

Edward Cree (4):
  net: introduce list entry point for GRO
  sfc: use batched receive for GRO
  net: make listified RX functions return number of good packets
  net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list

 drivers/net/ethernet/sfc/efx.c        |  11 +++-
 drivers/net/ethernet/sfc/net_driver.h |   1 +
 drivers/net/ethernet/sfc/rx.c         |  16 +++++-
 include/linux/netdevice.h             |   6 +-
 include/net/ip.h                      |   4 +-
 include/net/ipv6.h                    |   4 +-
 net/core/dev.c                        | 104 ++++++++++++++++++++++++++--------
 net/ipv4/ip_input.c                   |  39 ++++++++-----
 net/ipv6/ip6_input.c                  |  37 +++++++-----
 9 files changed, 157 insertions(+), 65 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 net-next 1/4] net: introduce list entry point for GRO
  2018-09-06 14:24 [PATCH v2 net-next 0/4] net: batched receive in GRO path Edward Cree
@ 2018-09-06 14:26 ` Edward Cree
  2018-09-06 14:26 ` [PATCH v2 net-next 2/4] sfc: use batched receive " Edward Cree
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Edward Cree @ 2018-09-06 14:26 UTC (permalink / raw)
  To: davem; +Cc: linux-net-drivers, netdev

Also export napi_frags_skb() so that drivers using the napi_gro_frags()
 interface can prepare their SKBs properly for submitting on such a list.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 28 +++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e2b3bd750c98..2b53536b1d99 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3548,8 +3548,10 @@ int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
 void netif_receive_skb_list(struct list_head *head);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
+int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
+struct sk_buff *napi_frags_skb(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
diff --git a/net/core/dev.c b/net/core/dev.c
index ca78dc5a79a3..8df39ded77bd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5598,6 +5598,31 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(napi_gro_receive);
 
+/* Returns the number of SKBs on the list successfully received */
+int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head)
+{
+	struct sk_buff *skb, *next;
+	gro_result_t result;
+	int kept = 0;
+
+	list_for_each_entry(skb, head, list) {
+		skb_mark_napi_id(skb, napi);
+		trace_napi_gro_receive_entry(skb);
+		skb_gro_reset_offset(skb);
+	}
+
+	list_for_each_entry_safe(skb, next, head, list) {
+		list_del(&skb->list);
+		skb->next = NULL;
+		result = dev_gro_receive(napi, skb);
+		result = napi_skb_finish(result, skb);
+		if (result != GRO_DROP)
+			kept++;
+	}
+	return kept;
+}
+EXPORT_SYMBOL(napi_gro_receive_list);
+
 static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 {
 	if (unlikely(skb->pfmemalloc)) {
@@ -5669,7 +5694,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,
  * Drivers could call both napi_gro_frags() and napi_gro_receive()
  * We copy ethernet header into skb->data to have a common layout.
  */
-static struct sk_buff *napi_frags_skb(struct napi_struct *napi)
+struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 {
 	struct sk_buff *skb = napi->skb;
 	const struct ethhdr *eth;
@@ -5705,6 +5730,7 @@ static struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 
 	return skb;
 }
+EXPORT_SYMBOL(napi_frags_skb);
 
 gro_result_t napi_gro_frags(struct napi_struct *napi)
 {


* [PATCH v2 net-next 2/4] sfc: use batched receive for GRO
  2018-09-06 14:24 [PATCH v2 net-next 0/4] net: batched receive in GRO path Edward Cree
  2018-09-06 14:26 ` [PATCH v2 net-next 1/4] net: introduce list entry point for GRO Edward Cree
@ 2018-09-06 14:26 ` Edward Cree
  2018-09-06 14:26 ` [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets Edward Cree
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Edward Cree @ 2018-09-06 14:26 UTC (permalink / raw)
  To: davem; +Cc: linux-net-drivers, netdev

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        | 11 +++++++++--
 drivers/net/ethernet/sfc/net_driver.h |  1 +
 drivers/net/ethernet/sfc/rx.c         | 16 +++++++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 330233286e78..dba13a28014c 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -263,9 +263,9 @@ static int efx_check_disabled(struct efx_nic *efx)
  */
 static int efx_process_channel(struct efx_channel *channel, int budget)
 {
+	struct list_head rx_list, gro_list;
 	struct efx_tx_queue *tx_queue;
-	struct list_head rx_list;
-	int spent;
+	int spent, gro_count;
 
 	if (unlikely(!channel->enabled))
 		return 0;
@@ -275,6 +275,10 @@ static int efx_process_channel(struct efx_channel *channel, int budget)
 	INIT_LIST_HEAD(&rx_list);
 	channel->rx_list = &rx_list;
 
+	EFX_WARN_ON_PARANOID(channel->gro_list != NULL);
+	INIT_LIST_HEAD(&gro_list);
+	channel->gro_list = &gro_list;
+
 	efx_for_each_channel_tx_queue(tx_queue, channel) {
 		tx_queue->pkts_compl = 0;
 		tx_queue->bytes_compl = 0;
@@ -300,6 +304,9 @@ static int efx_process_channel(struct efx_channel *channel, int budget)
 	/* Receive any packets we queued up */
 	netif_receive_skb_list(channel->rx_list);
 	channel->rx_list = NULL;
+	gro_count = napi_gro_receive_list(&channel->napi_str, channel->gro_list);
+	channel->irq_mod_score += gro_count * 2;
+	channel->gro_list = NULL;
 
 	return spent;
 }
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 961b92979640..72addac7a84a 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -502,6 +502,7 @@ struct efx_channel {
 	unsigned int rx_pkt_index;
 
 	struct list_head *rx_list;
+	struct list_head *gro_list;
 
 	struct efx_rx_queue rx_queue;
 	struct efx_tx_queue tx_queue[EFX_TXQ_TYPES];
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 396ff01298cd..0534a54048c6 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -453,9 +453,19 @@ efx_rx_packet_gro(struct efx_channel *channel, struct efx_rx_buffer *rx_buf,
 
 	skb_record_rx_queue(skb, channel->rx_queue.core_index);
 
-	gro_result = napi_gro_frags(napi);
-	if (gro_result != GRO_DROP)
-		channel->irq_mod_score += 2;
+	/* Pass the packet up */
+	if (channel->gro_list != NULL) {
+		/* Clear napi->skb and prepare skb for GRO */
+		skb = napi_frags_skb(napi);
+		if (skb)
+			/* Add to list, will pass up later */
+			list_add_tail(&skb->list, channel->gro_list);
+	} else {
+		/* No list, so pass it up now */
+		gro_result = napi_gro_frags(napi);
+		if (gro_result != GRO_DROP)
+			channel->irq_mod_score += 2;
+	}
 }
 
 /* Allocate and construct an SKB around page fragments */


* [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
  2018-09-06 14:24 [PATCH v2 net-next 0/4] net: batched receive in GRO path Edward Cree
  2018-09-06 14:26 ` [PATCH v2 net-next 1/4] net: introduce list entry point for GRO Edward Cree
  2018-09-06 14:26 ` [PATCH v2 net-next 2/4] sfc: use batched receive " Edward Cree
@ 2018-09-06 14:26 ` Edward Cree
  2018-09-07  2:27   ` Eric Dumazet
  2018-09-06 14:26 ` [PATCH v2 net-next 4/4] net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list Edward Cree
  2018-09-07  2:32 ` [PATCH v2 net-next 0/4] net: batched receive in GRO path Eric Dumazet
  4 siblings, 1 reply; 11+ messages in thread
From: Edward Cree @ 2018-09-06 14:26 UTC (permalink / raw)
  To: davem; +Cc: linux-net-drivers, netdev

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/netdevice.h |  4 +--
 include/net/ip.h          |  4 +--
 include/net/ipv6.h        |  4 +--
 net/core/dev.c            | 63 +++++++++++++++++++++++++++++------------------
 net/ipv4/ip_input.c       | 39 ++++++++++++++++++-----------
 net/ipv6/ip6_input.c      | 37 +++++++++++++++++-----------
 6 files changed, 92 insertions(+), 59 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2b53536b1d99..9b3fc5944ba5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2349,7 +2349,7 @@ struct packet_type {
 					 struct net_device *,
 					 struct packet_type *,
 					 struct net_device *);
-	void			(*list_func) (struct list_head *,
+	int			(*list_func) (struct list_head *,
 					      struct packet_type *,
 					      struct net_device *);
 	bool			(*id_match)(struct packet_type *ptype,
@@ -3546,7 +3546,7 @@ int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
-void netif_receive_skb_list(struct list_head *head);
+int netif_receive_skb_list(struct list_head *head);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
 int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
diff --git a/include/net/ip.h b/include/net/ip.h
index e44b1a44f67a..aab1f7eea1e1 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -152,8 +152,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
 			  struct ip_options_rcu *opt);
 int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	   struct net_device *orig_dev);
-void ip_list_rcv(struct list_head *head, struct packet_type *pt,
-		 struct net_device *orig_dev);
+int ip_list_rcv(struct list_head *head, struct packet_type *pt,
+		struct net_device *orig_dev);
 int ip_local_deliver(struct sk_buff *skb);
 int ip_mr_input(struct sk_buff *skb);
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index ff33f498c137..f15651eabfe0 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -914,8 +914,8 @@ static inline __be32 flowi6_get_flowlabel(const struct flowi6 *fl6)
 
 int ipv6_rcv(struct sk_buff *skb, struct net_device *dev,
 	     struct packet_type *pt, struct net_device *orig_dev);
-void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
-		   struct net_device *orig_dev);
+int ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
+		  struct net_device *orig_dev);
 
 int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 8df39ded77bd..69e2819994e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4922,24 +4922,27 @@ int netif_receive_skb_core(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_receive_skb_core);
 
-static inline void __netif_receive_skb_list_ptype(struct list_head *head,
-						  struct packet_type *pt_prev,
-						  struct net_device *orig_dev)
+static inline int __netif_receive_skb_list_ptype(struct list_head *head,
+						 struct packet_type *pt_prev,
+						 struct net_device *orig_dev)
 {
 	struct sk_buff *skb, *next;
+	int kept = 0;
 
 	if (!pt_prev)
-		return;
+		return 0;
 	if (list_empty(head))
-		return;
+		return 0;
 	if (pt_prev->list_func != NULL)
-		pt_prev->list_func(head, pt_prev, orig_dev);
+		kept = pt_prev->list_func(head, pt_prev, orig_dev);
 	else
 		list_for_each_entry_safe(skb, next, head, list)
-			pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+			if (pt_prev->func(skb, skb->dev, pt_prev, orig_dev) == NET_RX_SUCCESS)
+				kept++;
+	return kept;
 }
 
-static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
+static int __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
 {
 	/* Fast-path assumptions:
 	 * - There is no RX handler.
@@ -4956,6 +4959,7 @@ static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
 	struct net_device *od_curr = NULL;
 	struct list_head sublist;
 	struct sk_buff *skb, *next;
+	int kept = 0, ret;
 
 	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
@@ -4963,12 +4967,15 @@ static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
 		struct packet_type *pt_prev = NULL;
 
 		list_del(&skb->list);
-		__netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
-		if (!pt_prev)
+		ret = __netif_receive_skb_core(skb, pfmemalloc, &pt_prev);
+		if (!pt_prev) {
+			if (ret == NET_RX_SUCCESS)
+				kept++;
 			continue;
+		}
 		if (pt_curr != pt_prev || od_curr != orig_dev) {
 			/* dispatch old sublist */
-			__netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr);
+			kept += __netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr);
 			/* start new sublist */
 			INIT_LIST_HEAD(&sublist);
 			pt_curr = pt_prev;
@@ -4978,7 +4985,8 @@ static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
 	}
 
 	/* dispatch final sublist */
-	__netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr);
+	kept += __netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr);
+	return kept;
 }
 
 static int __netif_receive_skb(struct sk_buff *skb)
@@ -5006,11 +5014,12 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	return ret;
 }
 
-static void __netif_receive_skb_list(struct list_head *head)
+static int __netif_receive_skb_list(struct list_head *head)
 {
 	unsigned long noreclaim_flag = 0;
 	struct sk_buff *skb, *next;
 	bool pfmemalloc = false; /* Is current sublist PF_MEMALLOC? */
+	int kept = 0;
 
 	list_for_each_entry_safe(skb, next, head, list) {
 		if ((sk_memalloc_socks() && skb_pfmemalloc(skb)) != pfmemalloc) {
@@ -5019,7 +5028,7 @@ static void __netif_receive_skb_list(struct list_head *head)
 			/* Handle the previous sublist */
 			list_cut_before(&sublist, head, &skb->list);
 			if (!list_empty(&sublist))
-				__netif_receive_skb_list_core(&sublist, pfmemalloc);
+				kept += __netif_receive_skb_list_core(&sublist, pfmemalloc);
 			pfmemalloc = !pfmemalloc;
 			/* See comments in __netif_receive_skb */
 			if (pfmemalloc)
@@ -5030,10 +5039,11 @@ static void __netif_receive_skb_list(struct list_head *head)
 	}
 	/* Handle the remaining sublist */
 	if (!list_empty(head))
-		__netif_receive_skb_list_core(head, pfmemalloc);
+		kept += __netif_receive_skb_list_core(head, pfmemalloc);
 	/* Restore pflags */
 	if (pfmemalloc)
 		memalloc_noreclaim_restore(noreclaim_flag);
+	return kept;
 }
 
 static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
@@ -5109,17 +5119,20 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 	return ret;
 }
 
-static void netif_receive_skb_list_internal(struct list_head *head)
+static int netif_receive_skb_list_internal(struct list_head *head)
 {
 	struct bpf_prog *xdp_prog = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
+	int kept = 0;
 
 	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
 		net_timestamp_check(netdev_tstamp_prequeue, skb);
 		list_del(&skb->list);
-		if (!skb_defer_rx_timestamp(skb))
+		if (skb_defer_rx_timestamp(skb))
+			kept++;
+		else
 			list_add_tail(&skb->list, &sublist);
 	}
 	list_splice_init(&sublist, head);
@@ -5149,13 +5162,15 @@ static void netif_receive_skb_list_internal(struct list_head *head)
 			if (cpu >= 0) {
 				/* Will be handled, remove from list */
 				list_del(&skb->list);
-				enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+				if (enqueue_to_backlog(skb, cpu, &rflow->last_qtail) == NET_RX_SUCCESS)
+					kept++;
 			}
 		}
 	}
 #endif
-	__netif_receive_skb_list(head);
+	kept += __netif_receive_skb_list(head);
 	rcu_read_unlock();
+	return kept;
 }
 
 /**
@@ -5185,21 +5200,21 @@ EXPORT_SYMBOL(netif_receive_skb);
  *	netif_receive_skb_list - process many receive buffers from network
  *	@head: list of skbs to process.
  *
- *	Since return value of netif_receive_skb() is normally ignored, and
- *	wouldn't be meaningful for a list, this function returns void.
+ *	Returns the number of skbs for which netif_receive_skb() would have
+ *	returned %NET_RX_SUCCESS.
  *
  *	This function may only be called from softirq context and interrupts
  *	should be enabled.
  */
-void netif_receive_skb_list(struct list_head *head)
+int netif_receive_skb_list(struct list_head *head)
 {
 	struct sk_buff *skb;
 
 	if (list_empty(head))
-		return;
+		return 0;
 	list_for_each_entry(skb, head, list)
 		trace_netif_receive_skb_list_entry(skb);
-	netif_receive_skb_list_internal(head);
+	return netif_receive_skb_list_internal(head);
 }
 EXPORT_SYMBOL(netif_receive_skb_list);
 
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 3196cf58f418..75cc5a6ef9b8 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -526,9 +526,10 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 		       ip_rcv_finish);
 }
 
-static void ip_sublist_rcv_finish(struct list_head *head)
+static int ip_sublist_rcv_finish(struct list_head *head)
 {
 	struct sk_buff *skb, *next;
+	int kept = 0;
 
 	list_for_each_entry_safe(skb, next, head, list) {
 		list_del(&skb->list);
@@ -536,16 +537,19 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 		 * another kind of SKB-list usage (see validate_xmit_skb_list)
 		 */
 		skb->next = NULL;
-		dst_input(skb);
+		if (dst_input(skb) == NET_RX_SUCCESS)
+			kept++;
 	}
+	return kept;
 }
 
-static void ip_list_rcv_finish(struct net *net, struct sock *sk,
-			       struct list_head *head)
+static int ip_list_rcv_finish(struct net *net, struct sock *sk,
+			      struct list_head *head)
 {
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
+	int kept = 0;
 
 	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
@@ -556,8 +560,10 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		 * skb to its handler for processing
 		 */
 		skb = l3mdev_ip_rcv(skb);
-		if (!skb)
+		if (!skb) {
+			kept++;
 			continue;
+		}
 		if (ip_rcv_finish_core(net, sk, skb) == NET_RX_DROP)
 			continue;
 
@@ -565,7 +571,7 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		if (curr_dst != dst) {
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
-				ip_sublist_rcv_finish(&sublist);
+				kept += ip_sublist_rcv_finish(&sublist);
 			/* start new sublist */
 			INIT_LIST_HEAD(&sublist);
 			curr_dst = dst;
@@ -573,25 +579,27 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		list_add_tail(&skb->list, &sublist);
 	}
 	/* dispatch final sublist */
-	ip_sublist_rcv_finish(&sublist);
+	kept += ip_sublist_rcv_finish(&sublist);
+	return kept;
 }
 
-static void ip_sublist_rcv(struct list_head *head, struct net_device *dev,
-			   struct net *net)
+static int ip_sublist_rcv(struct list_head *head, struct net_device *dev,
+			  struct net *net)
 {
 	NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
 		     head, dev, NULL, ip_rcv_finish);
-	ip_list_rcv_finish(net, NULL, head);
+	return ip_list_rcv_finish(net, NULL, head);
 }
 
-/* Receive a list of IP packets */
-void ip_list_rcv(struct list_head *head, struct packet_type *pt,
-		 struct net_device *orig_dev)
+/* Receive a list of IP packets; return number of successful receives */
+int ip_list_rcv(struct list_head *head, struct packet_type *pt,
+		struct net_device *orig_dev)
 {
 	struct net_device *curr_dev = NULL;
 	struct net *curr_net = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
+	int kept = 0;
 
 	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
@@ -606,7 +614,7 @@ void ip_list_rcv(struct list_head *head, struct packet_type *pt,
 		if (curr_dev != dev || curr_net != net) {
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
-				ip_sublist_rcv(&sublist, curr_dev, curr_net);
+				kept += ip_sublist_rcv(&sublist, curr_dev, curr_net);
 			/* start new sublist */
 			INIT_LIST_HEAD(&sublist);
 			curr_dev = dev;
@@ -615,5 +623,6 @@ void ip_list_rcv(struct list_head *head, struct packet_type *pt,
 		list_add_tail(&skb->list, &sublist);
 	}
 	/* dispatch final sublist */
-	ip_sublist_rcv(&sublist, curr_dev, curr_net);
+	kept += ip_sublist_rcv(&sublist, curr_dev, curr_net);
+	return kept;
 }
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 6242682be876..e64b830c9f0f 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -76,20 +76,24 @@ int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	return dst_input(skb);
 }
 
-static void ip6_sublist_rcv_finish(struct list_head *head)
+static int ip6_sublist_rcv_finish(struct list_head *head)
 {
 	struct sk_buff *skb, *next;
+	int kept = 0;
 
 	list_for_each_entry_safe(skb, next, head, list)
-		dst_input(skb);
+		if (dst_input(skb) == NET_RX_SUCCESS)
+			kept++;
+	return kept;
 }
 
-static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
-				struct list_head *head)
+static int ip6_list_rcv_finish(struct net *net, struct sock *sk,
+			       struct list_head *head)
 {
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
+	int kept = 0;
 
 	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
@@ -100,14 +104,16 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 		 * skb to its handler for processing
 		 */
 		skb = l3mdev_ip6_rcv(skb);
-		if (!skb)
+		if (!skb) {
+			kept++;
 			continue;
+		}
 		ip6_rcv_finish_core(net, sk, skb);
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
-				ip6_sublist_rcv_finish(&sublist);
+				kept += ip6_sublist_rcv_finish(&sublist);
 			/* start new sublist */
 			INIT_LIST_HEAD(&sublist);
 			curr_dst = dst;
@@ -115,7 +121,8 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 		list_add_tail(&skb->list, &sublist);
 	}
 	/* dispatch final sublist */
-	ip6_sublist_rcv_finish(&sublist);
+	kept += ip6_sublist_rcv_finish(&sublist);
+	return kept;
 }
 
 static struct sk_buff *ip6_rcv_core(struct sk_buff *skb, struct net_device *dev,
@@ -273,22 +280,23 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 		       ip6_rcv_finish);
 }
 
-static void ip6_sublist_rcv(struct list_head *head, struct net_device *dev,
-			    struct net *net)
+static int ip6_sublist_rcv(struct list_head *head, struct net_device *dev,
+			   struct net *net)
 {
 	NF_HOOK_LIST(NFPROTO_IPV6, NF_INET_PRE_ROUTING, net, NULL,
 		     head, dev, NULL, ip6_rcv_finish);
-	ip6_list_rcv_finish(net, NULL, head);
+	return ip6_list_rcv_finish(net, NULL, head);
 }
 
 /* Receive a list of IPv6 packets */
-void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
-		   struct net_device *orig_dev)
+int ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
+		  struct net_device *orig_dev)
 {
 	struct net_device *curr_dev = NULL;
 	struct net *curr_net = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
+	int kept = 0;
 
 	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
@@ -303,7 +311,7 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
 		if (curr_dev != dev || curr_net != net) {
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
-				ip6_sublist_rcv(&sublist, curr_dev, curr_net);
+				kept += ip6_sublist_rcv(&sublist, curr_dev, curr_net);
 			/* start new sublist */
 			INIT_LIST_HEAD(&sublist);
 			curr_dev = dev;
@@ -312,7 +320,8 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
 		list_add_tail(&skb->list, &sublist);
 	}
 	/* dispatch final sublist */
-	ip6_sublist_rcv(&sublist, curr_dev, curr_net);
+	kept += ip6_sublist_rcv(&sublist, curr_dev, curr_net);
+	return kept;
 }
 
 /*


* [PATCH v2 net-next 4/4] net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list
  2018-09-06 14:24 [PATCH v2 net-next 0/4] net: batched receive in GRO path Edward Cree
                   ` (2 preceding siblings ...)
  2018-09-06 14:26 ` [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets Edward Cree
@ 2018-09-06 14:26 ` Edward Cree
  2018-09-07  2:32 ` [PATCH v2 net-next 0/4] net: batched receive in GRO path Eric Dumazet
  4 siblings, 0 replies; 11+ messages in thread
From: Edward Cree @ 2018-09-06 14:26 UTC (permalink / raw)
  To: davem; +Cc: linux-net-drivers, netdev

Allows GRO-using drivers to get the benefits of batching for non-GROable
 traffic.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/core/dev.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 69e2819994e4..9a937d2ac83b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5617,6 +5617,7 @@ EXPORT_SYMBOL(napi_gro_receive);
 int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head)
 {
 	struct sk_buff *skb, *next;
+	struct list_head sublist;
 	gro_result_t result;
 	int kept = 0;
 
@@ -5626,14 +5627,26 @@ int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head)
 		skb_gro_reset_offset(skb);
 	}
 
+	INIT_LIST_HEAD(&sublist);
 	list_for_each_entry_safe(skb, next, head, list) {
 		list_del(&skb->list);
 		skb->next = NULL;
 		result = dev_gro_receive(napi, skb);
-		result = napi_skb_finish(result, skb);
-		if (result != GRO_DROP)
-			kept++;
+		if (result == GRO_NORMAL) {
+			list_add_tail(&skb->list, &sublist);
+			continue;
+		} else {
+			if (!list_empty(&sublist)) {
+				/* Handle the GRO_NORMAL skbs to prevent OoO */
+				kept += netif_receive_skb_list_internal(&sublist);
+				INIT_LIST_HEAD(&sublist);
+			}
+			result = napi_skb_finish(result, skb);
+			if (result != GRO_DROP)
+				kept++;
+		}
 	}
+	kept += netif_receive_skb_list_internal(&sublist);
 	return kept;
 }
 EXPORT_SYMBOL(napi_gro_receive_list);


* Re: [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
  2018-09-06 14:26 ` [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets Edward Cree
@ 2018-09-07  2:27   ` Eric Dumazet
  2018-09-07 10:44     ` Edward Cree
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2018-09-07  2:27 UTC (permalink / raw)
  To: Edward Cree, davem; +Cc: linux-net-drivers, netdev



On 09/06/2018 07:26 AM, Edward Cree wrote:
> Signed-off-by: Edward Cree <ecree@solarflare.com>


Lack of changelog here ?

I do not know what is a good packet.

You are adding a lot of conditional expressions, that cpu
will mispredict quite often.

Typical micro benchmarks won't really notice.


* Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path
  2018-09-06 14:24 [PATCH v2 net-next 0/4] net: batched receive in GRO path Edward Cree
                   ` (3 preceding siblings ...)
  2018-09-06 14:26 ` [PATCH v2 net-next 4/4] net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list Edward Cree
@ 2018-09-07  2:32 ` Eric Dumazet
  2018-09-07 10:51   ` Edward Cree
  2018-09-11 18:34   ` Edward Cree
  4 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-09-07  2:32 UTC (permalink / raw)
  To: Edward Cree, davem; +Cc: linux-net-drivers, netdev



On 09/06/2018 07:24 AM, Edward Cree wrote:
> This series listifies part of GRO processing, in a manner which allows those
>  packets which are not GROed (i.e. for which dev_gro_receive returns
>  GRO_NORMAL) to be passed on to the listified regular receive path.
> I have not listified dev_gro_receive() itself, or the per-protocol GRO
>  callback, since GRO's need to hold packets on lists under napi->gro_hash
>  makes keeping the packets on other lists awkward, and since the GRO control
>  block state of held skbs can refer only to one 'new' skb at a time.
>  Nonetheless the batching of the calling code yields some performance gains
>  in the GRO case as well.
> 
> Herewith the performance figures obtained in a NetPerf TCP stream test (with
>  four streams, and irqs bound to a single core):
> net-next: 7.166 Gbit/s (sigma 0.435)
> after #2: 7.715 Gbit/s (sigma 0.145) = datum + 7.7%
> after #4: 7.890 Gbit/s (sigma 0.217) = datum + 10.1%
> (Note that the 'net-next' results were distinctly bimodal, with two results
>  of about 8 Gbit/s and the remaining ten around 7 Gbit/s.  I don't have a
>  good explanation for this.)
> And with GRO disabled through ethtool -K (thus simulating traffic which is
>  not GRO-able but, being TCP, is still passed to the GRO entry point):
> net-next: 4.756 Gbit/s (sigma 0.240)
> after #4: 5.355 Gbit/s (sigma 0.232) = datum + 12.6%
> 
> v2: Rebased on latest net-next.  Removed RFC tags.  Otherwise unchanged
>  owing to lack of comments on v1.
>

Your performance numbers are not convincing, since TCP stream test should
get nominal GRO gains.

Adding this complexity and icache pressure needs more experimental results.

What about RPC workloads (e.g. 100 concurrent netperf -t TCP_RR -- -r 8000,8000)?

Thanks.


* Re: [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
  2018-09-07  2:27   ` Eric Dumazet
@ 2018-09-07 10:44     ` Edward Cree
  2018-09-07 11:43       ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Edward Cree @ 2018-09-07 10:44 UTC (permalink / raw)
  To: Eric Dumazet, davem; +Cc: linux-net-drivers, netdev

On 07/09/18 03:27, Eric Dumazet wrote:
> On 09/06/2018 07:26 AM, Edward Cree wrote:
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
> Lack of changelog here ?
>
> I do not know what a 'good packet' is.
The comment on netif_receive_skb_list() defines this as "skbs for which
 netif_receive_skb() would have returned %NET_RX_SUCCESS".  But I shall put
 that into the changelog as well.
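In that sense, a listified receive function that reports its count of good packets can be sketched as follows; the types, names, and drop rule here are invented stand-ins for illustration, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

#define NET_RX_SUCCESS 0
#define NET_RX_DROP    1

struct skb {
    int len;
    struct skb *next;
};

/* Stand-in per-packet receive: drop zero-length packets (invented rule). */
static int receive_one_stub(struct skb *skb)
{
    return skb->len > 0 ? NET_RX_SUCCESS : NET_RX_DROP;
}

/* Listified receive returning the count of good packets, i.e. those for
 * which the per-packet path would have returned NET_RX_SUCCESS.  A driver
 * can then update its rx_good-style counters in one step per batch. */
int receive_skb_list_count_good(struct skb *head)
{
    int good = 0;
    for (struct skb *p = head; p; p = p->next)
        if (receive_one_stub(p) == NET_RX_SUCCESS)
            good++;
    return good;
}
```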

> You are adding a lot of conditional expressions that the CPU
> will mispredict quite often.
I don't see an alternative, since this is needed by patch #4 and I daresay
 there are other drivers that will also want to get NET_RX_SUCCESS-like
 information (possibly from regular receives as well as GRO; I'm not quite
 sure why sfc only cares about the latter).

> Typical microbenchmarks won't really notice.

Any suggestions on how to construct a test that will?


* Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path
  2018-09-07  2:32 ` [PATCH v2 net-next 0/4] net: batched receive in GRO path Eric Dumazet
@ 2018-09-07 10:51   ` Edward Cree
  2018-09-11 18:34   ` Edward Cree
  1 sibling, 0 replies; 11+ messages in thread
From: Edward Cree @ 2018-09-07 10:51 UTC (permalink / raw)
  To: Eric Dumazet, davem; +Cc: linux-net-drivers, netdev

On 07/09/18 03:32, Eric Dumazet wrote:
> Your performance numbers are not convincing, since a TCP stream test should
> get nominal GRO gains.
I'm not quite sure what you mean here, could you explain a bit more?

> Adding this complexity and icache pressure needs more experimental results.
>
> What about RPC workloads (e.g. 100 concurrent netperf -t TCP_RR -- -r 8000,8000)?
I'll try that.  Any other tests that would be worthwhile?

-Ed


* Re: [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
  2018-09-07 10:44     ` Edward Cree
@ 2018-09-07 11:43       ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-09-07 11:43 UTC (permalink / raw)
  To: Edward Cree, Eric Dumazet, davem; +Cc: linux-net-drivers, netdev



On 09/07/2018 03:44 AM, Edward Cree wrote:

> 
> Any suggestions on how to construct a test that will?
> 

Say 50 concurrent netperf -t TCP_RR -- -r 8000,8000

This way you have a mix of GRO candidates and non-GRO ones (pure ACKs).

GRO sizes would be reasonable (not full-size GRO packets).


* Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path
  2018-09-07  2:32 ` [PATCH v2 net-next 0/4] net: batched receive in GRO path Eric Dumazet
  2018-09-07 10:51   ` Edward Cree
@ 2018-09-11 18:34   ` Edward Cree
  1 sibling, 0 replies; 11+ messages in thread
From: Edward Cree @ 2018-09-11 18:34 UTC (permalink / raw)
  To: Eric Dumazet, davem; +Cc: linux-net-drivers, netdev

On 07/09/18 03:32, Eric Dumazet wrote:
> Adding this complexity and icache pressure needs more experimental results.
> What about RPC workloads (e.g. 100 concurrent netperf -t TCP_RR -- -r 8000,8000)?
>
> Thanks.
Some more results.  Note that the TCP_STREAM figures given in the cover
 letter were '-m 1450'; when I run that with '-m 8000' I hit line rate on
 my 10G NIC on both the old and new code.  Also, these tests are still all
 with IRQs bound to a single core on the RX side.
A further note: the Code Under Test is running on the netserver side (RX
 side for TCP_STREAM tests); the netperf side is running stock RHEL7u3
 (kernel 3.10.0-514.el7.x86_64).  This potentially matters more for the
 TCP_RR test as both sides have to receive data.

TCP_STREAM, 8000 bytes, GRO enabled (4 streams)
old: 9.415 Gbit/s
new: 9.417 Gbit/s
(Welch p = 0.087, n₁ = n₂ = 3)
There was, however, a noticeable reduction in *TX* CPU usage, of about 15%.
 I don't know why that should be (changes in ACK timing, perhaps?).

TCP_STREAM, 8000 bytes, GRO disabled (4 streams)
old: 5.200 Gbit/s
new: 5.839 Gbit/s (12.3% faster)
(Welch p < 0.001, n₁ = n₂ = 6)

TCP_RR, 8000 bytes, GRO enabled (100 streams)
(FoM is one-way latency, 0.5 / tps)
old: 855.833 us
new: 862.033 us (0.7% slower)
(Welch p = 0.040, n₁ = n₂ = 6)

TCP_RR, 8000 bytes, GRO disabled (100 streams)
old: 962.733 us
new: 871.417 us (9.5% faster)
(Welch p < 0.001, n₁ = n₂ = 6)

Conclusion: with GRO on we pay a small but real RR penalty.  With GRO off
 (thus also with traffic that can't be coalesced) we get a noticeable
 speed boost from being able to use netif_receive_skb_list_internal().

-Ed


Thread overview: 11+ messages
2018-09-06 14:24 [PATCH v2 net-next 0/4] net: batched receive in GRO path Edward Cree
2018-09-06 14:26 ` [PATCH v2 net-next 1/4] net: introduce list entry point for GRO Edward Cree
2018-09-06 14:26 ` [PATCH v2 net-next 2/4] sfc: use batched receive " Edward Cree
2018-09-06 14:26 ` [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets Edward Cree
2018-09-07  2:27   ` Eric Dumazet
2018-09-07 10:44     ` Edward Cree
2018-09-07 11:43       ` Eric Dumazet
2018-09-06 14:26 ` [PATCH v2 net-next 4/4] net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list Edward Cree
2018-09-07  2:32 ` [PATCH v2 net-next 0/4] net: batched receive in GRO path Eric Dumazet
2018-09-07 10:51   ` Edward Cree
2018-09-11 18:34   ` Edward Cree
