* [RFC PATCH net-next 0/8] Handle multiple received packets at each stage
@ 2016-04-19 13:33 Edward Cree
  2016-04-19 13:34 ` [RFC PATCH net-next 1/8] net: core: trivial netif_receive_skb_list() entry point Edward Cree
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:33 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

Earlier discussions on this list[1] suggested that having multiple packets
traverse the network stack together (rather than calling the stack for each
packet singly) could improve performance through better cache locality.
This patch series is an attempt to implement this by having drivers pass an
SKB list to the stack at the end of the NAPI poll.  The stack then attempts
to keep the list together, only splitting it when either packets need to be
treated differently, or the next layer of the stack is not list-aware.
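(The splitting itself follows one recurring pattern throughout the series.
The sketch below is illustrative only: key() and dispatch_sublist() stand in
for whatever property and handler a given layer switches on (ptype, netns,
dst and so on), and dispatch_sublist() is assumed to cope with an empty list.)

static void example_split_and_dispatch(struct sk_buff_head *list)
{
        struct sk_buff_head sublist;
        struct sk_buff *skb;

        __skb_queue_head_init(&sublist);
        while ((skb = __skb_dequeue(list)) != NULL) {
                /* When the key changes, hand off the sublist built so far */
                if (!skb_queue_empty(&sublist) &&
                    key(skb) != key(skb_peek(&sublist))) {
                        dispatch_sublist(&sublist);
                        __skb_queue_head_init(&sublist);
                }
                __skb_queue_tail(&sublist, skb);
        }
        /* Hand off whatever is left */
        dispatch_sublist(&sublist);
}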

The first two patches simply place received packets on a list during the
event processing loop on the sfc EF10 architecture, then call the normal
stack for each packet singly at the end of the NAPI poll.
The remaining patches extend the 'listified' processing as far as the IP
receive handler.
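(In driver terms, patch 2 boils down to roughly the following around the
event-processing loop; a condensed sketch, with the existing per-event
handling elided.)

        struct sk_buff_head rx_list;

        __skb_queue_head_init(&rx_list);
        channel->rx_list = &rx_list;    /* efx_rx_deliver() queues onto this */

        /* ... event loop runs as before; each completed RX skb is now
         * __skb_queue_tail()ed onto channel->rx_list instead of being
         * passed to netif_receive_skb() immediately ...
         */

        netif_receive_skb_list(channel->rx_list);  /* flush at end of poll */
        channel->rx_list = NULL;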

Packet rate was tested with NetPerf UDP_STREAM, with 10 streams of 1-byte
packets, and the process and interrupt pinned to a single core on the RX
side.
The NIC was a 40G Solarflare 7x42Q; the CPU was a Xeon E3-1220V2 @ 3.10GHz.
Baseline:      5.07Mpps
after patch 2: 5.59Mpps (10.2% above baseline)
after patch 8: 6.44Mpps (25.6% above baseline)

I also attempted to measure the latency, but couldn't get reliable numbers;
my best estimate is that the series costs about 160ns if interrupt moderation
is disabled and busy-poll is enabled, and about 60ns with the settings
reversed (moderation enabled, busy-poll disabled).
I tried adding a check in the driver to only perform bundling if interrupt
moderation was active on the channel, but was unable to demonstrate any
latency gain from this, so I have omitted it from this series.

[1] http://thread.gmane.org/gmane.linux.network/395502

Edward Cree (8):
  net: core: trivial netif_receive_skb_list() entry point
  sfc: batch up RX delivery on EF10
  net: core: unwrap skb list receive slightly further
  net: core: Another step of skb receive list processing
  net: core: another layer of lists, around PF_MEMALLOC skb handling
  net: core: propagate SKB lists through packet_type lookup
  net: ipv4: listified version of ip_rcv
  net: ipv4: listify ip_rcv_finish

 drivers/net/ethernet/sfc/ef10.c       |   9 ++
 drivers/net/ethernet/sfc/efx.c        |   2 +
 drivers/net/ethernet/sfc/net_driver.h |   3 +
 drivers/net/ethernet/sfc/rx.c         |   7 +-
 include/linux/netdevice.h             |   4 +
 include/linux/netfilter.h             |  27 ++++
 include/linux/skbuff.h                |  16 +++
 include/net/ip.h                      |   2 +
 include/trace/events/net.h            |  14 ++
 net/core/dev.c                        | 245 ++++++++++++++++++++++++++++------
 net/ipv4/af_inet.c                    |   1 +
 net/ipv4/ip_input.c                   | 127 ++++++++++++++++--
 12 files changed, 409 insertions(+), 48 deletions(-)


* [RFC PATCH net-next 1/8] net: core: trivial netif_receive_skb_list() entry point
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
@ 2016-04-19 13:34 ` Edward Cree
  2016-04-19 13:35 ` [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10 Edward Cree
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:34 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

Just calls netif_receive_skb() in a loop.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a3bb534..682d0ad 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3218,6 +3218,7 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
+void netif_receive_skb_list(struct sk_buff_head *list);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
diff --git a/net/core/dev.c b/net/core/dev.c
index 52d446b..7abcb1d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4292,6 +4292,26 @@ int netif_receive_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_receive_skb);
 
+/**
+ *	netif_receive_skb_list - process many receive buffers from network
+ *	@list: list of skbs to process.  Must not be shareable (e.g. it may
+ *	be on the stack)
+ *
+ *	For now, just calls netif_receive_skb() in a loop, ignoring the
+ *	return value.
+ *
+ *	This function may only be called from softirq context and interrupts
+ *	should be enabled.
+ */
+void netif_receive_skb_list(struct sk_buff_head *list)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(list)) != NULL)
+		netif_receive_skb(skb);
+}
+EXPORT_SYMBOL(netif_receive_skb_list);
+
 /* Network device is going away, flush any packets still pending
  * Called with irqs disabled.
  */


* [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
  2016-04-19 13:34 ` [RFC PATCH net-next 1/8] net: core: trivial netif_receive_skb_list() entry point Edward Cree
@ 2016-04-19 13:35 ` Edward Cree
  2016-04-19 14:47   ` Eric Dumazet
  2016-04-19 13:35 ` [RFC PATCH net-next 3/8] net: core: unwrap skb list receive slightly further Edward Cree
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:35 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

Improves packet rate of 1-byte UDP receives by 10%.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c       | 9 +++++++++
 drivers/net/ethernet/sfc/efx.c        | 2 ++
 drivers/net/ethernet/sfc/net_driver.h | 3 +++
 drivers/net/ethernet/sfc/rx.c         | 7 ++++++-
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 98d33d4..e348f8f 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -2656,6 +2656,7 @@ static int efx_ef10_ev_process(struct efx_channel *channel, int quota)
 {
 	struct efx_nic *efx = channel->efx;
 	efx_qword_t event, *p_event;
+	struct sk_buff_head rx_list;
 	unsigned int read_ptr;
 	int ev_code;
 	int tx_descs = 0;
@@ -2664,6 +2665,11 @@ static int efx_ef10_ev_process(struct efx_channel *channel, int quota)
 	if (quota <= 0)
 		return spent;
 
+	/* Prepare the batch receive list */
+	EFX_BUG_ON_PARANOID(channel->rx_list != NULL);
+	channel->rx_list = &rx_list;
+	__skb_queue_head_init(channel->rx_list);
+
 	read_ptr = channel->eventq_read_ptr;
 
 	for (;;) {
@@ -2724,6 +2730,9 @@ static int efx_ef10_ev_process(struct efx_channel *channel, int quota)
 	}
 
 out:
+	/* Receive any packets we queued up */
+	netif_receive_skb_list(channel->rx_list);
+	channel->rx_list = NULL;
 	channel->eventq_read_ptr = read_ptr;
 	return spent;
 }
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 0705ec86..e004c0b 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -527,6 +527,8 @@ static int efx_probe_channel(struct efx_channel *channel)
 			goto fail;
 	}
 
+	channel->rx_list = NULL;
+
 	return 0;
 
 fail:
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 38c4223..d969c85 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -418,6 +418,7 @@ enum efx_sync_events_state {
  *	__efx_rx_packet(), or zero if there is none
  * @rx_pkt_index: Ring index of first buffer for next packet to be delivered
  *	by __efx_rx_packet(), if @rx_pkt_n_frags != 0
+ * @rx_list: list of SKBs from current RX, awaiting processing
  * @rx_queue: RX queue for this channel
  * @tx_queue: TX queues for this channel
  * @sync_events_state: Current state of sync events on this channel
@@ -462,6 +463,8 @@ struct efx_channel {
 	unsigned int rx_pkt_n_frags;
 	unsigned int rx_pkt_index;
 
+	struct sk_buff_head *rx_list;
+
 	struct efx_rx_queue rx_queue;
 	struct efx_tx_queue tx_queue[EFX_TXQ_TYPES];
 
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 8956995..025e387 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -642,7 +642,12 @@ static void efx_rx_deliver(struct efx_channel *channel, u8 *eh,
 			return;
 
 	/* Pass the packet up */
-	netif_receive_skb(skb);
+	if (channel->rx_list != NULL)
+		/* Add to list, will pass up later */
+		__skb_queue_tail(channel->rx_list, skb);
+	else
+		/* No list, so pass it up now */
+		netif_receive_skb(skb);
 }
 
 /* Handle a received packet.  Second half: Touches packet payload. */


* [RFC PATCH net-next 3/8] net: core: unwrap skb list receive slightly further
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
  2016-04-19 13:34 ` [RFC PATCH net-next 1/8] net: core: trivial netif_receive_skb_list() entry point Edward Cree
  2016-04-19 13:35 ` [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10 Edward Cree
@ 2016-04-19 13:35 ` Edward Cree
  2016-04-19 13:35 ` [RFC PATCH net-next 4/8] net: core: Another step of skb receive list processing Edward Cree
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:35 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

Adds iterator skb_queue_for_each() to run over a list without modifying it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/skbuff.h     | 16 ++++++++++++++++
 include/trace/events/net.h |  7 +++++++
 net/core/dev.c             |  4 +++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da0ace3..2851c38 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1501,6 +1501,22 @@ static inline struct sk_buff *skb_peek_tail(const struct sk_buff_head *list_)
 }
 
 /**
+ *	skb_queue_for_each - iterate over an skb queue
+ *	@pos:        the &struct sk_buff to use as a loop cursor.
+ *	@head:       the &struct sk_buff_head for your list.
+ *
+ *	The reference count is not incremented and the reference is therefore
+ *	volatile; the list lock is not taken either. Use with caution.
+ *
+ *	The list must not be modified (though the individual skbs can be)
+ *	within the loop body.
+ *
+ *	After loop completion, @pos will be %NULL.
+ */
+#define skb_queue_for_each(pos, head) \
+	for (pos = skb_peek(head); pos != NULL; pos = skb_peek_next(pos, head))
+
+/**
  *	skb_queue_len	- get queue length
  *	@list_: list to measure
  *
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 49cc7c3..30f359c 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -222,6 +222,13 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_receive_skb_entry,
 	TP_ARGS(skb)
 );
 
+DEFINE_EVENT(net_dev_rx_verbose_template, netif_receive_skb_list_entry,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
 DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_entry,
 
 	TP_PROTO(const struct sk_buff *skb),
diff --git a/net/core/dev.c b/net/core/dev.c
index 7abcb1d..4bb6724 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4307,8 +4307,10 @@ void netif_receive_skb_list(struct sk_buff_head *list)
 {
 	struct sk_buff *skb;
 
+	skb_queue_for_each(skb, list)
+		trace_netif_receive_skb_list_entry(skb);
 	while ((skb = __skb_dequeue(list)) != NULL)
-		netif_receive_skb(skb);
+		netif_receive_skb_internal(skb);
 }
 EXPORT_SYMBOL(netif_receive_skb_list);
 


* [RFC PATCH net-next 4/8] net: core: Another step of skb receive list processing
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
                   ` (2 preceding siblings ...)
  2016-04-19 13:35 ` [RFC PATCH net-next 3/8] net: core: unwrap skb list receive slightly further Edward Cree
@ 2016-04-19 13:35 ` Edward Cree
  2016-04-19 13:36 ` [RFC PATCH net-next 5/8] net: core: another layer of lists, around PF_MEMALLOC skb handling Edward Cree
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:35 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

netif_receive_skb_list_internal() now processes a list and hands it
on to the next function.

The code duplication is unfortunate, but the common part between the list
and non-list versions of the function takes a lock (rcu_read_lock()), so
factoring it out would be a little ugly.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/core/dev.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4bb6724..586807d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4241,6 +4241,14 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	return ret;
 }
 
+static void __netif_receive_skb_list(struct sk_buff_head *list)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(list)) != NULL)
+		__netif_receive_skb(skb);
+}
+
 static int netif_receive_skb_internal(struct sk_buff *skb)
 {
 	int ret;
@@ -4269,6 +4277,41 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 	return ret;
 }
 
+static void netif_receive_skb_list_internal(struct sk_buff_head *list)
+{
+	struct sk_buff_head sublist;
+	struct sk_buff *skb;
+
+	__skb_queue_head_init(&sublist);
+
+	rcu_read_lock();
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		net_timestamp_check(netdev_tstamp_prequeue, skb);
+		if (skb_defer_rx_timestamp(skb)) {
+			/* Handled, don't add to sublist */
+			continue;
+		}
+
+#ifdef CONFIG_RPS
+		if (static_key_false(&rps_needed)) {
+			struct rps_dev_flow voidflow, *rflow = &voidflow;
+			int cpu = get_rps_cpu(skb->dev, skb, &rflow);
+
+			if (cpu >= 0) {
+				enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+				/* Handled, don't add to sublist */
+				continue;
+			}
+		}
+#endif
+		__skb_queue_tail(&sublist, skb);
+	}
+
+	__netif_receive_skb_list(&sublist);
+	rcu_read_unlock();
+	return;
+}
+
 /**
  *	netif_receive_skb - process receive buffer from network
  *	@skb: buffer to process
@@ -4297,8 +4340,8 @@ EXPORT_SYMBOL(netif_receive_skb);
  *	@list: list of skbs to process.  Must not be shareable (e.g. it may
  *	be on the stack)
  *
- *	For now, just calls netif_receive_skb() in a loop, ignoring the
- *	return value.
+ *	Since return value of netif_receive_skb() is normally ignored, and
+ *	wouldn't be meaningful for a list, this function returns void.
  *
  *	This function may only be called from softirq context and interrupts
  *	should be enabled.
@@ -4309,8 +4352,7 @@ void netif_receive_skb_list(struct sk_buff_head *list)
 
 	skb_queue_for_each(skb, list)
 		trace_netif_receive_skb_list_entry(skb);
-	while ((skb = __skb_dequeue(list)) != NULL)
-		netif_receive_skb_internal(skb);
+	netif_receive_skb_list_internal(list);
 }
 EXPORT_SYMBOL(netif_receive_skb_list);
 


* [RFC PATCH net-next 5/8] net: core: another layer of lists, around PF_MEMALLOC skb handling
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
                   ` (3 preceding siblings ...)
  2016-04-19 13:35 ` [RFC PATCH net-next 4/8] net: core: Another step of skb receive list processing Edward Cree
@ 2016-04-19 13:36 ` Edward Cree
  2016-04-19 13:36 ` [RFC PATCH net-next 6/8] net: core: propagate SKB lists through packet_type lookup Edward Cree
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:36 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

First example of a layer splitting the list (rather than merely taking
individual packets off it).

Again, trying to factor the common parts wouldn't make this any nicer.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/core/dev.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 586807d..0f914bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4216,6 +4216,14 @@ out:
 	return ret;
 }
 
+static void __netif_receive_skb_list_core(struct sk_buff_head *list, bool pfmemalloc)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(list)) != NULL)
+		__netif_receive_skb_core(skb, pfmemalloc);
+}
+
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	int ret;
@@ -4243,10 +4251,34 @@ static int __netif_receive_skb(struct sk_buff *skb)
 
 static void __netif_receive_skb_list(struct sk_buff_head *list)
 {
+	struct sk_buff_head sublist;
+	bool pfmemalloc = false; /* Is current sublist PF_MEMALLOC? */
+	unsigned long pflags;
 	struct sk_buff *skb;
 
-	while ((skb = __skb_dequeue(list)) != NULL)
-		__netif_receive_skb(skb);
+	__skb_queue_head_init(&sublist);
+
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		if ((sk_memalloc_socks() && skb_pfmemalloc(skb)) != pfmemalloc) {
+			/* Handle the previous sublist */
+			__netif_receive_skb_list_core(&sublist, pfmemalloc);
+			pfmemalloc = !pfmemalloc;
+			/* See comments in __netif_receive_skb */
+			if (pfmemalloc) {
+				pflags = current->flags;
+				current->flags |= PF_MEMALLOC;
+			} else {
+				tsk_restore_flags(current, pflags, PF_MEMALLOC);
+			}
+			__skb_queue_head_init(&sublist);
+		}
+		__skb_queue_tail(&sublist, skb);
+	}
+	/* Handle the last sublist */
+	__netif_receive_skb_list_core(&sublist, pfmemalloc);
+	/* Restore pflags */
+	if (pfmemalloc)
+		tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 static int netif_receive_skb_internal(struct sk_buff *skb)


* [RFC PATCH net-next 6/8] net: core: propagate SKB lists through packet_type lookup
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
                   ` (4 preceding siblings ...)
  2016-04-19 13:36 ` [RFC PATCH net-next 5/8] net: core: another layer of lists, around PF_MEMALLOC skb handling Edward Cree
@ 2016-04-19 13:36 ` Edward Cree
  2016-04-19 13:37 ` [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv Edward Cree
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:36 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

This could maybe be made more efficient if we first split the list based on
skb->protocol, and then did the ptype lookup for each sublist.  Unfortunately,
there are things like sch_handle_ingress and the rx_handlers that can
produce different results per packet.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/trace/events/net.h |   7 +++
 net/core/dev.c             | 146 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 30f359c..7a17a31 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -130,6 +130,13 @@ DEFINE_EVENT(net_dev_template, netif_receive_skb,
 	TP_ARGS(skb)
 );
 
+DEFINE_EVENT(net_dev_template, netif_receive_skb_list,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
 DEFINE_EVENT(net_dev_template, netif_rx,
 
 	TP_PROTO(struct sk_buff *skb),
diff --git a/net/core/dev.c b/net/core/dev.c
index 0f914bf..db1d16a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4061,12 +4061,13 @@ static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
 	return 0;
 }
 
-static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
+static int __netif_receive_skb_taps(struct sk_buff *skb, bool pfmemalloc,
+				    struct packet_type **pt_prev)
 {
-	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
 	struct net_device *orig_dev;
 	bool deliver_exact = false;
+	struct packet_type *ptype;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -4081,7 +4082,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 		skb_reset_transport_header(skb);
 	skb_reset_mac_len(skb);
 
-	pt_prev = NULL;
+	*pt_prev = NULL;
 
 another_round:
 	skb->skb_iif = skb->dev->ifindex;
@@ -4106,25 +4107,25 @@ another_round:
 		goto skip_taps;
 
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
-		if (pt_prev)
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-		pt_prev = ptype;
+		if (*pt_prev)
+			ret = deliver_skb(skb, *pt_prev, orig_dev);
+		*pt_prev = ptype;
 	}
 
 	list_for_each_entry_rcu(ptype, &skb->dev->ptype_all, list) {
-		if (pt_prev)
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-		pt_prev = ptype;
+		if (*pt_prev)
+			ret = deliver_skb(skb, *pt_prev, orig_dev);
+		*pt_prev = ptype;
 	}
 
 skip_taps:
 #ifdef CONFIG_NET_INGRESS
 	if (static_key_false(&ingress_needed)) {
-		skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev);
+		skb = sch_handle_ingress(skb, pt_prev, &ret, orig_dev);
 		if (!skb)
 			goto out;
 
-		if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0)
+		if (nf_ingress(skb, pt_prev, &ret, orig_dev) < 0)
 			goto out;
 	}
 #endif
@@ -4136,9 +4137,9 @@ ncls:
 		goto drop;
 
 	if (skb_vlan_tag_present(skb)) {
-		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-			pt_prev = NULL;
+		if (*pt_prev) {
+			ret = deliver_skb(skb, *pt_prev, orig_dev);
+			*pt_prev = NULL;
 		}
 		if (vlan_do_receive(&skb))
 			goto another_round;
@@ -4148,9 +4149,9 @@ ncls:
 
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
-		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-			pt_prev = NULL;
+		if (*pt_prev) {
+			ret = deliver_skb(skb, *pt_prev, orig_dev);
+			*pt_prev = NULL;
 		}
 		switch (rx_handler(&skb)) {
 		case RX_HANDLER_CONSUMED:
@@ -4181,47 +4182,112 @@ ncls:
 
 	/* deliver only exact match when indicated */
 	if (likely(!deliver_exact)) {
-		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
+		deliver_ptype_list_skb(skb, pt_prev, orig_dev, type,
 				       &ptype_base[ntohs(type) &
 						   PTYPE_HASH_MASK]);
 	}
 
-	deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
+	deliver_ptype_list_skb(skb, pt_prev, orig_dev, type,
 			       &orig_dev->ptype_specific);
 
 	if (unlikely(skb->dev != orig_dev)) {
-		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
+		deliver_ptype_list_skb(skb, pt_prev, orig_dev, type,
 				       &skb->dev->ptype_specific);
 	}
-
-	if (pt_prev) {
-		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
-			goto drop;
-		else
-			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
-	} else {
+	if (*pt_prev && unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+		goto drop;
+	return ret;
 drop:
-		if (!deliver_exact)
-			atomic_long_inc(&skb->dev->rx_dropped);
-		else
-			atomic_long_inc(&skb->dev->rx_nohandler);
-		kfree_skb(skb);
-		/* Jamal, now you will not able to escape explaining
-		 * me how you were going to use this. :-)
-		 */
-		ret = NET_RX_DROP;
-	}
-
+	if (!deliver_exact)
+		atomic_long_inc(&skb->dev->rx_dropped);
+	else
+		atomic_long_inc(&skb->dev->rx_nohandler);
+	kfree_skb(skb);
+	/* Jamal, now you will not able to escape explaining
+	 * me how you were going to use this. :-)
+	 */
+	ret = NET_RX_DROP;
 out:
+	*pt_prev = NULL;
 	return ret;
 }
 
-static void __netif_receive_skb_list_core(struct sk_buff_head *list, bool pfmemalloc)
+static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
+{
+	struct net_device *orig_dev = skb->dev;
+	struct packet_type *pt_prev;
+	int ret;
+
+	ret = __netif_receive_skb_taps(skb, pfmemalloc, &pt_prev);
+	if (pt_prev)
+		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+	return ret;
+}
+
+static inline void __netif_receive_skb_list_ptype(struct sk_buff_head *list,
+						  struct packet_type *pt_prev,
+						  struct net_device *orig_dev)
 {
 	struct sk_buff *skb;
 
 	while ((skb = __skb_dequeue(list)) != NULL)
-		__netif_receive_skb_core(skb, pfmemalloc);
+		pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+}
+
+static void __netif_receive_skb_list_core(struct sk_buff_head *list, bool pfmemalloc)
+{
+	/* Fast-path assumptions:
+	 * - There is no RX handler.
+	 * - Only one packet_type matches.
+	 * If either of these fails, we will end up doing some per-packet
+	 * processing in-line, then handling the 'last ptype' for the whole
+	 * sublist.  This can't cause out-of-order delivery to any single ptype,
+	 * because the 'last ptype' must be constant across the sublist, and all
+	 * other ptypes are handled per-packet.  Unless, that is, a ptype can
+	 * be delivered to more than once for a single packet - but that seems
+	 * like it would be a bad idea anyway.
+	 * So it should be fine (at least, I think so), but you'll lose the
+	 * (putative) performance benefits of batching.
+	 */
+	/* Current (common) ptype of sublist */
+	struct packet_type *pt_curr = NULL;
+	/* In the normal (device RX) case, orig_dev should be the same for
+	 * every skb in the list.  But as I'm not certain of this, I check
+	 * it's constant and split the list if not.
+	 * So, od_curr is the current (common) orig_dev of sublist.
+	 */
+	struct net_device *od_curr = NULL;
+	struct sk_buff_head sublist;
+	struct sk_buff *skb;
+
+	__skb_queue_head_init(&sublist);
+
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		struct packet_type *pt_prev;
+		struct net_device *orig_dev = skb->dev;
+
+		__netif_receive_skb_taps(skb, pfmemalloc, &pt_prev);
+		if (pt_prev) {
+			if (skb_queue_empty(&sublist)) {
+				pt_curr = pt_prev;
+				od_curr = orig_dev;
+			} else if (!(pt_curr == pt_prev &&
+				     od_curr == orig_dev)) {
+				/* dispatch old sublist */
+				__netif_receive_skb_list_ptype(&sublist,
+							       pt_curr,
+							       od_curr);
+				/* start new sublist */
+				__skb_queue_head_init(&sublist);
+				pt_curr = pt_prev;
+				od_curr = orig_dev;
+			}
+			__skb_queue_tail(&sublist, skb);
+		}
+	}
+
+	/* dispatch final sublist */
+	__netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr);
 }
 
 static int __netif_receive_skb(struct sk_buff *skb)


* [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
                   ` (5 preceding siblings ...)
  2016-04-19 13:36 ` [RFC PATCH net-next 6/8] net: core: propagate SKB lists through packet_type lookup Edward Cree
@ 2016-04-19 13:37 ` Edward Cree
  2016-04-19 14:50   ` Eric Dumazet
  2016-04-21 17:24   ` Edward Cree
  2016-04-19 13:37 ` [RFC PATCH net-next 8/8] net: ipv4: listify ip_rcv_finish Edward Cree
  2016-04-19 19:11 ` [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Jesper Dangaard Brouer
  8 siblings, 2 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:37 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

Also involved adding a way to run a netfilter hook over a list of packets.
Rather than attempting to make netfilter know about lists (which would be
horrendous) we just let it call the regular okfn (in this case
ip_rcv_finish()) for any packets it steals, and have it give us back a list
of packets it's synchronously accepted (which normally NF_HOOK would
automatically call okfn() on, but we want to be able to potentially pass
the list to a listified version of okfn().)

There is potential for out-of-order receives if the netfilter hook ends up
synchronously stealing packets, as they will be processed before any accepts
earlier in the list.  However, it was already possible for an asynchronous
accept to cause out-of-order receives, so hopefully I haven't broken
anything that wasn't broken already.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/netdevice.h |  3 ++
 include/linux/netfilter.h | 27 +++++++++++++++++
 include/net/ip.h          |  2 ++
 net/core/dev.c            | 11 +++++--
 net/ipv4/af_inet.c        |  1 +
 net/ipv4/ip_input.c       | 75 ++++++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 682d0ad..292f2d5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2143,6 +2143,9 @@ struct packet_type {
 					 struct net_device *,
 					 struct packet_type *,
 					 struct net_device *);
+	void			(*list_func) (struct sk_buff_head *,
+					      struct packet_type *,
+					      struct net_device *);
 	bool			(*id_match)(struct packet_type *ptype,
 					    struct sock *sk);
 	void			*af_packet_priv;
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 9230f9a..e18e91b 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -220,6 +220,24 @@ NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 	return ret;
 }
 
+static inline void
+NF_HOOK_LIST_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
+		    struct sk_buff_head *list, struct sk_buff_head *sublist,
+		    struct net_device *in, struct net_device *out,
+		    int (*okfn)(struct net *, struct sock *, struct sk_buff *),
+		    int thresh)
+{
+	struct sk_buff *skb;
+
+	__skb_queue_head_init(sublist); /* list of synchronously ACCEPTed skbs */
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		int ret = nf_hook_thresh(pf, hook, net, sk, skb, in, out, okfn,
+					 thresh);
+		if (ret == 1)
+			__skb_queue_tail(sublist, skb);
+	}
+}
+
 static inline int
 NF_HOOK_COND(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 	     struct sk_buff *skb, struct net_device *in, struct net_device *out,
@@ -242,6 +260,15 @@ NF_HOOK(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk, struct
 	return NF_HOOK_THRESH(pf, hook, net, sk, skb, in, out, okfn, INT_MIN);
 }
 
+static inline void
+NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
+	     struct sk_buff_head *list, struct sk_buff_head *sublist,
+	     struct net_device *in, struct net_device *out,
+	     int (*okfn)(struct net *, struct sock *, struct sk_buff *))
+{
+	NF_HOOK_LIST_THRESH(pf, hook, net, sk, list, sublist, in, out, okfn, INT_MIN);
+}
+
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, u_int8_t pf, int optval, char __user *opt,
 		  unsigned int len);
diff --git a/include/net/ip.h b/include/net/ip.h
index 93725e5..c994c44 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -106,6 +106,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
 			  struct ip_options_rcu *opt);
 int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	   struct net_device *orig_dev);
+void ip_list_rcv(struct sk_buff_head *list, struct packet_type *pt,
+		 struct net_device *orig_dev);
 int ip_local_deliver(struct sk_buff *skb);
 int ip_mr_input(struct sk_buff *skb);
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index db1d16a..da768e2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4230,8 +4230,15 @@ static inline void __netif_receive_skb_list_ptype(struct sk_buff_head *list,
 {
 	struct sk_buff *skb;
 
-	while ((skb = __skb_dequeue(list)) != NULL)
-		pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+	if (!pt_prev)
+		return;
+	if (skb_queue_empty(list))
+		return;
+	if (pt_prev->list_func != NULL)
+		pt_prev->list_func(list, pt_prev, orig_dev);
+	else
+		while ((skb = __skb_dequeue(list)) != NULL)
+			pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
 
 static void __netif_receive_skb_list_core(struct sk_buff_head *list, bool pfmemalloc)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 2e6e65f..1424147 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1757,6 +1757,7 @@ fs_initcall(ipv4_offload_init);
 static struct packet_type ip_packet_type __read_mostly = {
 	.type = cpu_to_be16(ETH_P_IP),
 	.func = ip_rcv,
+	.list_func = ip_list_rcv,
 };
 
 static int __init inet_init(void)
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index e3d7827..e7d0d85 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -395,10 +395,9 @@ drop:
 /*
  * 	Main IP Receive routine.
  */
-int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+static struct sk_buff *ip_rcv_core(struct sk_buff *skb, struct net *net)
 {
 	const struct iphdr *iph;
-	struct net *net;
 	u32 len;
 
 	/* When the interface is in promisc. mode, drop all the crap
@@ -408,7 +407,6 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 		goto drop;
 
 
-	net = dev_net(dev);
 	IP_UPD_PO_STATS_BH(net, IPSTATS_MIB_IN, skb->len);
 
 	skb = skb_share_check(skb, GFP_ATOMIC);
@@ -475,9 +473,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
 
-	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
-		       net, NULL, skb, dev, NULL,
-		       ip_rcv_finish);
+	return skb;
 
 csum_error:
 	IP_INC_STATS_BH(net, IPSTATS_MIB_CSUMERRORS);
@@ -486,5 +482,70 @@ inhdr_error:
 drop:
 	kfree_skb(skb);
 out:
-	return NET_RX_DROP;
+	return NULL;
+}
+
+/*
+ * IP receive entry point
+ */
+int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
+	   struct net_device *orig_dev)
+{
+	struct net *net = dev_net(dev);
+
+	skb = ip_rcv_core(skb, net);
+	if (skb == NULL)
+		return NET_RX_DROP;
+	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
+		       net, NULL, skb, dev, NULL,
+		       ip_rcv_finish);
+}
+
+static void ip_sublist_rcv(struct sk_buff_head *list, struct net_device *dev,
+			   struct net *net)
+{
+	struct sk_buff_head sublist;
+	struct sk_buff *skb;
+
+	NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
+		     list, &sublist, dev, NULL, ip_rcv_finish);
+	while ((skb = __skb_dequeue(&sublist)) != NULL)
+		ip_rcv_finish(net, NULL, skb);
+}
+
+/* Receive a list of IP packets */
+void ip_list_rcv(struct sk_buff_head *list, struct packet_type *pt,
+		 struct net_device *orig_dev)
+{
+	struct net_device *curr_dev = NULL;
+	struct net *curr_net = NULL;
+	struct sk_buff_head sublist;
+	struct sk_buff *skb;
+
+	__skb_queue_head_init(&sublist);
+
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		struct net_device *dev = skb->dev;
+		struct net *net = dev_net(dev);
+
+		skb = ip_rcv_core(skb, net);
+		if (skb == NULL)
+			continue;
+
+		if (skb_queue_empty(&sublist)) {
+			curr_dev = dev;
+			curr_net = net;
+		} else if (curr_dev != dev || curr_net != net) {
+			/* dispatch old sublist */
+			ip_sublist_rcv(&sublist, dev, net);
+			/* start new sublist */
+			__skb_queue_head_init(&sublist);
+			curr_dev = dev;
+			curr_net = net;
+		}
+		/* add to current sublist */
+		__skb_queue_tail(&sublist, skb);
+	}
+	/* dispatch final sublist */
+	ip_sublist_rcv(&sublist, curr_dev, curr_net);
 }


* [RFC PATCH net-next 8/8] net: ipv4: listify ip_rcv_finish
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
                   ` (6 preceding siblings ...)
  2016-04-19 13:37 ` [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv Edward Cree
@ 2016-04-19 13:37 ` Edward Cree
  2016-04-19 19:11 ` [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Jesper Dangaard Brouer
  8 siblings, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 13:37 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/ipv4/ip_input.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index e7d0d85..5bbc409 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -308,7 +308,8 @@ drop:
 	return true;
 }
 
-static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+static int ip_rcv_finish_core(struct net *net, struct sock *sk,
+			      struct sk_buff *skb)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	struct rtable *rt;
@@ -385,13 +386,22 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
-	return dst_input(skb);
+	return NET_RX_SUCCESS;
 
 drop:
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
 
+static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	int ret = ip_rcv_finish_core(net, sk, skb);
+
+	if (ret != NET_RX_DROP)
+		ret = dst_input(skb);
+	return ret;
+}
+
 /*
  * 	Main IP Receive routine.
  */
@@ -501,16 +511,54 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 		       ip_rcv_finish);
 }
 
+static void ip_sublist_rcv_finish(struct sk_buff_head *list)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(list)) != NULL)
+		dst_input(skb);
+}
+
+static void ip_list_rcv_finish(struct net *net, struct sock *sk,
+			       struct sk_buff_head *list)
+{
+	struct dst_entry *curr_dst = NULL;
+	struct sk_buff_head sublist;
+	struct sk_buff *skb;
+
+	__skb_queue_head_init(&sublist);
+
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		struct dst_entry *dst;
+
+		if (ip_rcv_finish_core(net, sk, skb) == NET_RX_DROP)
+			continue;
+
+		dst = skb_dst(skb);
+		if (skb_queue_empty(&sublist)) {
+			curr_dst = dst;
+		} else if (curr_dst != dst) {
+			/* dispatch old sublist */
+			ip_sublist_rcv_finish(&sublist);
+			/* start new sublist */
+			__skb_queue_head_init(&sublist);
+			curr_dst = dst;
+		}
+		/* add to current sublist */
+		__skb_queue_tail(&sublist, skb);
+	}
+	/* dispatch final sublist */
+	ip_sublist_rcv_finish(&sublist);
+}
+
 static void ip_sublist_rcv(struct sk_buff_head *list, struct net_device *dev,
 			   struct net *net)
 {
 	struct sk_buff_head sublist;
-	struct sk_buff *skb;
 
 	NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
 		     list, &sublist, dev, NULL, ip_rcv_finish);
-	while ((skb = __skb_dequeue(&sublist)) != NULL)
-		ip_rcv_finish(net, NULL, skb);
+	ip_list_rcv_finish(net, NULL, &sublist);
 }
 
 /* Receive a list of IP packets */


* Re: [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10
  2016-04-19 13:35 ` [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10 Edward Cree
@ 2016-04-19 14:47   ` Eric Dumazet
  2016-04-19 16:36     ` Edward Cree
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 14:47 UTC (permalink / raw)
  To: Edward Cree
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 14:35 +0100, Edward Cree wrote:
> Improves packet rate of 1-byte UDP receives by 10%.

Sure, by adding yet another queue and extra latencies.

If the switch delivered a high prio packet to your host right before a
train of 60 low prio packets, that is no reason to make it wait for the
end of the train.

We have to really invent something better, like a real pipeline, instead
of hacks like this, adding complexity everywhere.

Have you tested this on cpus with tiny caches, like 32KB ?


* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 13:37 ` [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv Edward Cree
@ 2016-04-19 14:50   ` Eric Dumazet
  2016-04-19 15:46     ` Tom Herbert
  2016-04-19 16:50     ` Edward Cree
  2016-04-21 17:24   ` Edward Cree
  1 sibling, 2 replies; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 14:50 UTC (permalink / raw)
  To: Edward Cree
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 14:37 +0100, Edward Cree wrote:
> Also involved adding a way to run a netfilter hook over a list of packets.
> Rather than attempting to make netfilter know about lists (which would be
> horrendous) we just let it call the regular okfn (in this case
> ip_rcv_finish()) for any packets it steals, and have it give us back a list
> of packets it's synchronously accepted (which normally NF_HOOK would
> automatically call okfn() on, but we want to be able to potentially pass
> the list to a listified version of okfn().)
> 
> There is potential for out-of-order receives if the netfilter hook ends up
> synchronously stealing packets, as they will be processed before any accepts
> earlier in the list.  However, it was already possible for an asynchronous
> accept to cause out-of-order receives, so hopefully I haven't broken
> anything that wasn't broken already.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---

We have hard time to deal with latencies already, and maintaining some
sanity in the stack(s)

This is not going to give us a 10x or even 2x improvement factor, so
what about working on something that would really lower cache line
misses and use pipelines to amortize the costs ?

The main problem in UDP stack today is having to lock the socket because
of the dumb forward allocation problem. Are you really going to provide
a list of skbs up to _one_ UDP socket ?


* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 14:50   ` Eric Dumazet
@ 2016-04-19 15:46     ` Tom Herbert
  2016-04-19 16:54       ` Eric Dumazet
  2016-04-19 17:12       ` Edward Cree
  2016-04-19 16:50     ` Edward Cree
  1 sibling, 2 replies; 24+ messages in thread
From: Tom Herbert @ 2016-04-19 15:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Edward Cree, Linux Kernel Network Developers, David Miller,
	Jesper Dangaard Brouer, linux-net-drivers

On Tue, Apr 19, 2016 at 7:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-04-19 at 14:37 +0100, Edward Cree wrote:
>> Also involved adding a way to run a netfilter hook over a list of packets.
>> Rather than attempting to make netfilter know about lists (which would be
>> horrendous) we just let it call the regular okfn (in this case
>> ip_rcv_finish()) for any packets it steals, and have it give us back a list
>> of packets it's synchronously accepted (which normally NF_HOOK would
>> automatically call okfn() on, but we want to be able to potentially pass
>> the list to a listified version of okfn().)
>>
>> There is potential for out-of-order receives if the netfilter hook ends up
>> synchronously stealing packets, as they will be processed before any accepts
>> earlier in the list.  However, it was already possible for an asynchronous
>> accept to cause out-of-order receives, so hopefully I haven't broken
>> anything that wasn't broken already.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
>> ---
>
> We have hard time to deal with latencies already, and maintaining some
> sanity in the stack(s)
>
Right, this is significant complexity for a fairly narrow use case.
One alternative might be to move early type demux like functionality
to the GRO layer. There's a lot of work done by GRO to parse and
identify packets of the same flow, even if we can't aggregate such
packets it might be nice if we can at least provide a cached route so
that we avoid doing a full route lookup on each one later on.

Tom

> This is not going to give us a 10x or even 2x improvement factor, so
> what about working on something that would really lower cache line
> misses and use pipelines to amortize the costs ?
>
> The main problem in UDP stack today is having to lock the socket because
> of the dumb forward allocation problem. Are you really going to provide
> a list of skbs up to _one_ UDP socket ?
>
>
>


* Re: [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10
  2016-04-19 14:47   ` Eric Dumazet
@ 2016-04-19 16:36     ` Edward Cree
  2016-04-19 17:20       ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Edward Cree @ 2016-04-19 16:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On 19/04/16 15:47, Eric Dumazet wrote:
> On Tue, 2016-04-19 at 14:35 +0100, Edward Cree wrote:
>> Improves packet rate of 1-byte UDP receives by 10%.
> Sure, by adding yet another queue and extra latencies.
>
> If the switch delivered a high prio packet to your host right before a
> train of 60 low prio packets, that is no reason to make it wait for the
> end of the train.
The length of the list is bounded by the NAPI budget, and the first packet
in the list is delayed only by the time it takes to read the RX descriptors
and turn them into SKBs.  This patch never causes us to wait in the hope
that more things will arrive to batch, that's entirely driven by interrupt
moderation.

And if the high prio packet comes at the _end_ of a train of low prio
packets, we get to it _faster_ this way because we get the train out of the
way quicker.

Are you suggesting we should check for 802.1p priorities, and have those
skip the list?

> We have to really invent something better, like a real pipeline, instead
> of hacks like this, adding complexity everywhere.
I'm not sure what you mean by 'a real pipeline' in this context, could you
elaborate?

> Have you tested this on cpus with tiny caches, like 32KB ?
I haven't.  Is the concern here that the first packet's headers (we read 128
bytes into the linear area) and/or skb will get pushed out of the dcache as
we process further packets?

At least for sfc, it's highly unlikely that these cards will be used in low-
powered systems.  For the more general case, I suppose the answer would be a
tunable to set the maximum length of the RX list to less than the NAPI budget.
Fundamentally this kind of batching is trading dcache usage for icache usage.
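
(As a sketch only, with rx_list_max as a hypothetical tunable that is not part
of this series, such a cap could be applied inside the event loop like so:)

        /* Hypothetical early flush once the batch reaches rx_list_max,
         * bounding how many skbs sit on the list before delivery.
         */
        if (skb_queue_len(channel->rx_list) >= rx_list_max)
                netif_receive_skb_list(channel->rx_list);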


Incidentally, this patch is very similar to what Jesper proposed for mlx5 in
an RFC back in February: http://article.gmane.org/gmane.linux.network/397379
So I'm a little surprised this bit is controversial, though I'm not surprised
the rest of the series is ;)

-Ed


* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 14:50   ` Eric Dumazet
  2016-04-19 15:46     ` Tom Herbert
@ 2016-04-19 16:50     ` Edward Cree
  2016-04-19 18:06       ` Eric Dumazet
  1 sibling, 1 reply; 24+ messages in thread
From: Edward Cree @ 2016-04-19 16:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On 19/04/16 15:50, Eric Dumazet wrote:
> The main problem in UDP stack today is having to lock the socket because
> of the dumb forward allocation problem.
I'm not quite sure what you're referring to here, care to educate me?

> Are you really going to provide
> a list of skbs up to _one_ UDP socket ?
In principle we should be able to take it that far, yes.  AFAICT the
socket already has a receive queue that we end up appending the packet
to (and which I presume the recvmsg() syscall pulls from); I don't see
why we couldn't just splice a list of skbs onto the end rather than
appending them one by one, thus amortising the socket lookup.
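
(Purely as an illustrative sketch of that idea, not code from this series, and
ignoring the memory accounting and backlog handling that real code would need
per skb, the splice might look something like this:)

static void example_splice_to_socket(struct sock *sk,
                                     struct sk_buff_head *list)
{
        struct sk_buff_head *rxq = &sk->sk_receive_queue;
        unsigned long flags;

        spin_lock_irqsave(&rxq->lock, flags);
        skb_queue_splice_tail_init(list, rxq);  /* append the whole list */
        spin_unlock_irqrestore(&rxq->lock, flags);
        sk->sk_data_ready(sk);                  /* wake the reader once */
}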


* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 15:46     ` Tom Herbert
@ 2016-04-19 16:54       ` Eric Dumazet
  2016-04-19 17:12       ` Edward Cree
  1 sibling, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 16:54 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Edward Cree, Linux Kernel Network Developers, David Miller,
	Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 08:46 -0700, Tom Herbert wrote:

> Right, this is significant complexity for a fairly narrow use case.
> One alternative might be to move early type demux like functionality
> to the GRO layer. There's a lot of work done by GRO to parse and
> identify packets of the same flow, even if we can't aggregate such
> packets it might be nice if we can at least provide a cached route so
> that we avoid doing a full route lookup on each one later on.

Moving early demux earlier in the stack would also allow us to implement
RFS more efficiently, removing one hash lookup.
(no extra cache line miss in global RFS table)


* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 15:46     ` Tom Herbert
  2016-04-19 16:54       ` Eric Dumazet
@ 2016-04-19 17:12       ` Edward Cree
  2016-04-19 17:54         ` Eric Dumazet
  2016-04-19 18:38         ` Tom Herbert
  1 sibling, 2 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-19 17:12 UTC (permalink / raw)
  To: Tom Herbert, Eric Dumazet
  Cc: Linux Kernel Network Developers, David Miller,
	Jesper Dangaard Brouer, linux-net-drivers

On 19/04/16 16:46, Tom Herbert wrote:
> On Tue, Apr 19, 2016 at 7:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> We have hard time to deal with latencies already, and maintaining some
>> sanity in the stack(s)
> Right, this is significant complexity for a fairly narrow use case.
Why do you say the use case is narrow?  This approach should increase
packet rate for any (non-GROed) traffic, whether for local delivery or
forwarding.  If you're line-rate limited, it'll save CPU time instead.
The only reason I focused my testing on single-byte UDP is because the
benefits are more easily measured in that case.

If anything, the use case is broader than GRO, because GRO can't be used
for datagram protocols where packet boundaries must be maintained.
And because the listified processing is at least partly sharing code with
the regular stack, it's less complexity than GRO which has to have
essentially its own receive stack, _and_ code to coalesce the results
back into a superframe.

I think if we pushed bundled RX all the way up to the TCP layer, it might
potentially also be faster than GRO, because it avoids the work of
coalescing superframes; plus going through the GRO callbacks for each
packet could end up blowing icache in the same way the regular stack does.
If bundling did prove faster, we could then remove GRO, and overall
complexity would be _reduced_.

But I admit it may be a long shot.

-Ed


* Re: [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10
  2016-04-19 16:36     ` Edward Cree
@ 2016-04-19 17:20       ` Eric Dumazet
  2016-04-19 17:42         ` Edward Cree
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 17:20 UTC (permalink / raw)
  To: Edward Cree
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 17:36 +0100, Edward Cree wrote:

> > We have to really invent something better, like a real pipeline, instead
> > of hacks like this, adding complexity everywhere.
> I'm not sure what you mean by 'a real pipeline' in this context, could you
> elaborate?
> 
> > Have you tested this on cpus with tiny caches, like 32KB ?
> I haven't.  Is the concern here that the first packet's headers (we read 128
> bytes into the linear area) and/or skb will get pushed out of the dcache as
> we process further packets?
> 
> At least for sfc, it's highly unlikely that these cards will be used in low-
> powered systems.  For the more general case, I suppose the answer would be a
> tunable to set the maximum length of the RX list to less than the NAPI budget.
> Fundamentally this kind of batching is trading dcache usage for icache usage.
> 
> 
> Incidentally, this patch is very similar to what Jesper proposed for mlx5 in
> an RFC back in February: http://article.gmane.org/gmane.linux.network/397379
> So I'm a little surprised this bit is controversial, though I'm not surprised
> the rest of the series is ;)

It seems all the discussion about fast kernel networking these days is
about adding yet more queues, code duplication, complexity and batching,
and adding latencies, on top of a single NIC RX queue.

Apparently the multiqueue nature of a NIC is obsolete and people want to
process 10+Mpps on a single queue.

But in the end, we do not address fundamental issues, like number of
cache line misses per incoming TCP packet, and per outgoing TCP packet.
(Or UDP if that matters).
Number of atomic ops to synchronize all accesses to common resources.
(memory, queue limits, queues)

Also issues with dealing with all the queues (percpu backlog in
net/core/dev.c, socket backlog, prequeue for TCP, ...)

We could probably get ~100% improvement in UDP if we really cared, just
by changing net/ipv[46]/udp.c, not changing other layers.


* Re: [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10
  2016-04-19 17:20       ` Eric Dumazet
@ 2016-04-19 17:42         ` Edward Cree
  2016-04-19 18:02           ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Edward Cree @ 2016-04-19 17:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On 19/04/16 18:20, Eric Dumazet wrote:
> It seems all the discussion about fast kernel networking these days is
> about adding yet more queues, code duplication, complexity and batching,
> and adding latencies, on top of a single NIC RX queue.
>
> Apparently the multiqueue nature of a NIC is obsolete and people want to
> process 10+Mpps on a single queue.
The real goal here is to speed up packet processing generally.
Doing everything on a single queue (and thus a single CPU) is just a
convenient way of measuring that, while making sure performance is limited
by the RX side rather than the TX not being able to generate enough packets
to keep us busy.  Similarly, measuring single-byte UDP packet rate with one
CPU running flat-out is easier than measuring CPU usage while receiving
line-rate TCP in 1400-byte chunks.

> We could probably get ~100% improvement in UDP if we really cared, just
> by changing net/ipv[46]/udp.c, not changing other layers.
Well, I don't know how to achieve that, but it sounds like you do, so why
not go ahead and show us ;)
If you submitted a patch series to make UDP twice as fast, I think people
would "really care" about an improvement of that magnitude.

But RX batching should speed up all traffic, not just UDP.  Or at least,
that's the theory.

-Ed


* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 17:12       ` Edward Cree
@ 2016-04-19 17:54         ` Eric Dumazet
  2016-04-19 18:38         ` Tom Herbert
  1 sibling, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 17:54 UTC (permalink / raw)
  To: Edward Cree
  Cc: Tom Herbert, Linux Kernel Network Developers, David Miller,
	Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 18:12 +0100, Edward Cree wrote:

> I think if we pushed bundled RX all the way up to the TCP layer, it might
> potentially also be faster than GRO, because it avoids the work of
> coalescing superframes; plus going through the GRO callbacks for each
> packet could end up blowing icache in the same way the regular stack does.
> If bundling did prove faster, we could then remove GRO, and overall
> complexity would be _reduced_.

Oh well. You really are coming 8 years too late.

I receive ~40 Gbit on a _single_ TCP flow with GRO; I have no idea what
you expect to get without GRO, but my educated guess is:

It will be horrible, and will add memory overhead. (remember GRO shares
a single sk_buff/head for all the segs)

And as a bonus, GRO also works on forwarding workloads, when the packet
is delivered to another NIC or a virtual device.

Are you claiming TSO is useless?

Just because a model works well in some other stack (userspace, I
guess) does not mean we have to copy it in the Linux kernel.

Feel free to experiment, but if your non-GRO solution is slower
than GRO, don't even mention it here.

Remember to test things with a realistic iptables setup.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10
  2016-04-19 17:42         ` Edward Cree
@ 2016-04-19 18:02           ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 18:02 UTC (permalink / raw)
  To: Edward Cree
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 18:42 +0100, Edward Cree wrote:

> Well, I don't know how to achieve that, but it sounds like you do, so why
> not go ahead and show us ;)
> If you submitted a patch series to make UDP twice as fast, I think people
> would "really care" about an improvement of that magnitude.

Yes, I definitely can do that, but Google uses SO_REUSEPORT, so we do
not care at all about consuming more cycles, as long as we can still
absorb the load.

Just try it, and you'll reach 10 Mpps just fine.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 16:50     ` Edward Cree
@ 2016-04-19 18:06       ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2016-04-19 18:06 UTC (permalink / raw)
  To: Edward Cree
  Cc: netdev, David Miller, Jesper Dangaard Brouer, linux-net-drivers

On Tue, 2016-04-19 at 17:50 +0100, Edward Cree wrote:
> On 19/04/16 15:50, Eric Dumazet wrote:
> > The main problem in UDP stack today is having to lock the socket because
> > of the dumb forward allocation problem.
> I'm not quite sure what you're referring to here, care to educate me?


This was added for memory accounting, in commit
95766fff6b9a78d11fc2d3812dd035381690b55d.

The TCP stack does not reclaim its forward allocations as aggressively
(it has a concept of 'memory pressure').
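
To make the point concrete, the pattern looks roughly like the toy
sketch below (this is NOT the actual kernel call chain, the toy_*
names are made up, and it assumes the usual <net/sock.h> context).
Both the softirq enqueue side and the recvmsg dequeue side serialise
on the socket lock to keep sk->sk_forward_alloc consistent, and UDP
hands the forward allocation back almost immediately, while TCP holds
on to it:

static int toy_udp_enqueue(struct sock *sk, struct sk_buff *skb)
{
	int rc;

	bh_lock_sock(sk);	/* per-packet lock on the RX path */
	rc = sock_queue_rcv_skb(sk, skb); /* charges skb->truesize to the
					   * socket's memory accounting */
	bh_unlock_sock(sk);
	return rc;
}

static void toy_udp_dequeue_done(struct sock *sk)
{
	lock_sock(sk);		/* recvmsg side takes the lock too... */
	sk_mem_reclaim(sk);	/* ...and gives the forward allocation
				 * straight back to the system */
	release_sock(sk);
}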

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 17:12       ` Edward Cree
  2016-04-19 17:54         ` Eric Dumazet
@ 2016-04-19 18:38         ` Tom Herbert
  1 sibling, 0 replies; 24+ messages in thread
From: Tom Herbert @ 2016-04-19 18:38 UTC (permalink / raw)
  To: Edward Cree
  Cc: Eric Dumazet, Linux Kernel Network Developers, David Miller,
	Jesper Dangaard Brouer, linux-net-drivers

On Tue, Apr 19, 2016 at 10:12 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 19/04/16 16:46, Tom Herbert wrote:
>> On Tue, Apr 19, 2016 at 7:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> We have hard time to deal with latencies already, and maintaining some
>>> sanity in the stack(s)
>> Right, this is significant complexity for a fairly narrow use case.
> Why do you say the use case is narrow?  This approach should increase
> packet rate for any (non-GROed) traffic, whether for local delivery or
> forwarding.  If you're line-rate limited, it'll save CPU time instead.
> The only reason I focused my testing on single-byte UDP is because the
> benefits are more easily measured in that case.
>
It's a narrow use case because of the stated intent of having
"multiple packets traverse the network stack together". Beyond queuing
to the backlog, I don't understand what more processing can be done
without splitting the list up. We need to do a route lookup on each
packet, run each through iptables, and deliver each packet
individually to the application. As for the queuing to the backlog,
that seems to me to be more of a localized bulk enqueue/dequeue
problem than a stack-level infrastructure problem.

The general alternative to grouping packets together is to apply
cached values that were found in lookups for previous "similar"
packets. Since nearly all traffic fits some profile of a flow, we can
leverage the fact that packets in a flow should have similar lookup
results. So, for example, the first time we see a flow we can create a
flow state and save the results of any lookups done for that packet
(route lookup, iptables, etc.). For subsequent packets, if we
match the flow then we have the answers for all the lookups we would
need. Maintaining temporal flow states and performing fixed 5-tuple
flow state lookups in the hash table is easy for a host (and we can
often throw a lot of memory at it to size hash tables to avoid
collisions). VLP matching, open-ended rule chains, multi-table
lookups, and crazy hashes over 35 header fields are things we only
want to do when there is no other recourse. This illustrates one
reason why a host is not a switch: we have no hardware to do complex
lookups.
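
As a rough illustration of that flow-state idea, here is a purely
hypothetical sketch (none of these names exist in the kernel, and it
assumes the usual <linux/types.h>, <linux/jhash.h> and <net/dst.h>
context): the first packet of a flow does the expensive lookups and
fills in a slot keyed by the fixed 5-tuple; later packets that match
the key reuse the cached results instead of repeating the lookups.

struct toy_flow_key {
	__be32 saddr, daddr;
	__be16 sport, dport;
	u8     proto;
};

struct toy_flow_state {
	struct toy_flow_key key;
	bool                in_use;
	struct dst_entry   *dst;        /* cached route lookup result */
	int                 nf_verdict; /* cached netfilter verdict */
};

#define TOY_FLOW_TABLE_SIZE 4096	/* power of two, see toy_flow_hash() */
static struct toy_flow_state toy_flow_table[TOY_FLOW_TABLE_SIZE];

static u32 toy_flow_hash(const struct toy_flow_key *k)
{
	return jhash_3words((__force u32)k->saddr, (__force u32)k->daddr,
			    ((u32)ntohs(k->sport) << 16) | ntohs(k->dport),
			    k->proto) & (TOY_FLOW_TABLE_SIZE - 1);
}

/* Cached state for this 5-tuple, or NULL on a miss; on a miss the
 * caller does the full lookups and then calls toy_flow_insert().
 */
static struct toy_flow_state *toy_flow_lookup(const struct toy_flow_key *k)
{
	struct toy_flow_state *fs = &toy_flow_table[toy_flow_hash(k)];

	if (fs->in_use &&
	    fs->key.saddr == k->saddr && fs->key.daddr == k->daddr &&
	    fs->key.sport == k->sport && fs->key.dport == k->dport &&
	    fs->key.proto == k->proto)
		return fs;
	return NULL;	/* miss, or a colliding flow owns the slot */
}

static void toy_flow_insert(const struct toy_flow_key *k,
			    struct dst_entry *dst, int nf_verdict)
{
	struct toy_flow_state *fs = &toy_flow_table[toy_flow_hash(k)];

	fs->key = *k;
	fs->dst = dst;
	fs->nf_verdict = nf_verdict;
	fs->in_use = true;
}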

Tom

> If anything, the use case is broader than GRO, because GRO can't be used
> for datagram protocols where packet boundaries must be maintained.
> And because the listified processing is at least partly sharing code with
> the regular stack, it's less complexity than GRO which has to have
> essentially its own receive stack, _and_ code to coalesce the results
> back into a superframe.
>
> I think if we pushed bundled RX all the way up to the TCP layer, it might
> potentially also be faster than GRO, because it avoids the work of
> coalescing superframes; plus going through the GRO callbacks for each
> packet could end up blowing icache in the same way the regular stack does.
> If bundling did prove faster, we could then remove GRO, and overall
> complexity would be _reduced_.
>
> But I admit it may be a long shot.
>
> -Ed

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 0/8] Handle multiple received packets at each stage
  2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
                   ` (7 preceding siblings ...)
  2016-04-19 13:37 ` [RFC PATCH net-next 8/8] net: ipv4: listify ip_rcv_finish Edward Cree
@ 2016-04-19 19:11 ` Jesper Dangaard Brouer
  8 siblings, 0 replies; 24+ messages in thread
From: Jesper Dangaard Brouer @ 2016-04-19 19:11 UTC (permalink / raw)
  To: Edward Cree; +Cc: netdev, David Miller, linux-net-drivers, brouer

On Tue, 19 Apr 2016 14:33:02 +0100
Edward Cree <ecree@solarflare.com> wrote:

> Earlier discussions on this list[1] suggested that having multiple packets
> traverse the network stack together (rather than calling the stack for each
> packet singly) could improve performance through better cache locality.
> This patch series is an attempt to implement this by having drivers pass an
> SKB list to the stack at the end of the NAPI poll.  The stack then attempts
> to keep the list together, only splitting it when either packets need to be
> treated differently, or the next layer of the stack is not list-aware.
> 
> The first two patches simply place received packets on a list during the
> event processing loop on the sfc EF10 architecture, then call the normal
> stack for each packet singly at the end of the NAPI poll.
> The remaining patches extend the 'listified' processing as far as the IP
> receive handler.
> 
> Packet rate was tested with NetPerf UDP_STREAM, with 10 streams of 1-byte
> packets, and the process and interrupt pinned to a single core on the RX
> side.
> The NIC was a 40G Solarflare 7x42Q; the CPU was a Xeon E3-1220V2 @ 3.10GHz.
> Baseline:      5.07Mpps
> after patch 2: 5.59Mpps (10.2% above baseline)
> after patch 8: 6.44Mpps (25.6% above baseline)

Quite impressive!  Thank you, Edward, for working on this.  It is nice
to see that doing this actually gives a solid performance boost; it was
mostly a theory of mine in [1].

(P.S. I'm currently a bit busy at MM-summit, but I'll try to follow the
thread.  I want to try out your patchset once I return home again...)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[1] http://thread.gmane.org/gmane.linux.network/395502

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
  2016-04-19 13:37 ` [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv Edward Cree
  2016-04-19 14:50   ` Eric Dumazet
@ 2016-04-21 17:24   ` Edward Cree
  1 sibling, 0 replies; 24+ messages in thread
From: Edward Cree @ 2016-04-21 17:24 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers

On 19/04/16 14:37, Edward Cree wrote:
> Also involved adding a way to run a netfilter hook over a list of packets.
It turns out this breaks the build if netfilter is *disabled*, because I forgot
to add a stub for that case.  The next version of the patch (if there is one)
will have the following under #ifndef CONFIG_NETFILTER:

static inline void
NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
	     struct sk_buff_head *list, struct sk_buff_head *sublist,
	     struct net_device *in, struct net_device *out,
	     int (*okfn)(struct net *, struct sock *, struct sk_buff *))
{
	/* Netfilter is disabled, so there are no hooks to run; just
	 * initialise the sublist and move every skb from @list onto it.
	 */
	__skb_queue_head_init(sublist);
	skb_queue_splice_init(list, sublist);
}
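
For clarity, a hypothetical caller (this is not code from the patch,
and how the real CONFIG_NETFILTER=y version hands accepted packets
back is up to the actual implementation) would use the stub like this:
every skb that was on @list ends up on @sublist, so the caller just
runs the okfn over the sublist afterwards.

static void example_ip_rcv_list(struct net *net, struct sk_buff_head *list,
				struct net_device *dev)
{
	struct sk_buff_head sublist;
	struct sk_buff *skb;

	NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
		     list, &sublist, dev, NULL, ip_rcv_finish);

	/* With netfilter disabled nothing was dropped; pass each skb on. */
	while ((skb = __skb_dequeue(&sublist)) != NULL)
		ip_rcv_finish(net, NULL, skb);
}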

-Ed

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-04-21 17:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-19 13:33 [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Edward Cree
2016-04-19 13:34 ` [RFC PATCH net-next 1/8] net: core: trivial netif_receive_skb_list() entry point Edward Cree
2016-04-19 13:35 ` [RFC PATCH net-next 2/8] sfc: batch up RX delivery on EF10 Edward Cree
2016-04-19 14:47   ` Eric Dumazet
2016-04-19 16:36     ` Edward Cree
2016-04-19 17:20       ` Eric Dumazet
2016-04-19 17:42         ` Edward Cree
2016-04-19 18:02           ` Eric Dumazet
2016-04-19 13:35 ` [RFC PATCH net-next 3/8] net: core: unwrap skb list receive slightly further Edward Cree
2016-04-19 13:35 ` [RFC PATCH net-next 4/8] net: core: Another step of skb receive list processing Edward Cree
2016-04-19 13:36 ` [RFC PATCH net-next 5/8] net: core: another layer of lists, around PF_MEMALLOC skb handling Edward Cree
2016-04-19 13:36 ` [RFC PATCH net-next 6/8] net: core: propagate SKB lists through packet_type lookup Edward Cree
2016-04-19 13:37 ` [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv Edward Cree
2016-04-19 14:50   ` Eric Dumazet
2016-04-19 15:46     ` Tom Herbert
2016-04-19 16:54       ` Eric Dumazet
2016-04-19 17:12       ` Edward Cree
2016-04-19 17:54         ` Eric Dumazet
2016-04-19 18:38         ` Tom Herbert
2016-04-19 16:50     ` Edward Cree
2016-04-19 18:06       ` Eric Dumazet
2016-04-21 17:24   ` Edward Cree
2016-04-19 13:37 ` [RFC PATCH net-next 8/8] net: ipv4: listify ip_rcv_finish Edward Cree
2016-04-19 19:11 ` [RFC PATCH net-next 0/8] Handle multiple received packets at each stage Jesper Dangaard Brouer
