All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next,RFC 00/13] New fast forwarding path
@ 2018-06-14 14:19 Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 01/13] net: Add a helper to get the packet offload callbacks by priority Pablo Neira Ayuso
                   ` (16 more replies)
  0 siblings, 17 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

Hi,

This patchset proposes a new fast forwarding path infrastructure that
combines the GRO/GSO and the flowtable infrastructures. The idea is to
add a hook at the GRO layer that is invoked before the standard GRO
protocol offloads. This allows us to build custom packet chains that we
can quickly pass in one go to the neighbour layer to define fast
forwarding path for flows.

For each packet that gets into the GRO layer, we first check if there is
an entry in the flowtable, if so, the packet is placed in a list until
the GRO infrastructure decides to send the batch from gro_complete to
the neighbour layer. The first packet in the list takes the route from
the flowtable entry, so we avoid reiterative routing lookups.

In case no entry is found in the flowtable, the packet is passed up to
the classic GRO offload handlers. Thus, this packet follows the standard
forwarding path. Note that the initial packets of the flow always go
through the standard IPv4/IPv6 netfilter forward hook, that is used to
configure what flows are placed in the flowtable. Therefore, only a few
(initial) packets follow the standard forwarding path while most of the
follow up packets take this new fast forwarding path.

The fast forwarding path is enabled through explicit user policy, so the
user needs to request this behaviour from control plane, the following
example shows how to place flows in the new fast forwarding path from
the netfilter forward chain:

 table x {
        flowtable f {
                hook early_ingress priority 0; devices = { eth0, eth1 }
        }

        chain y {
                type filter hook forward priority 0;
                ip protocol tcp flow offload @f
        }
 }

The example above defines a fastpath for TCP flows that are placed in
the flowtable 'f', this flowtable is hooked at the new early_ingress
hook. The initial TCP packets that match this rule from the standard
fowarding path create an entry in the flowtable, thus, GRO creates chain
of packets for those that find an entry in the flowtable and send
them through the neighbour layer.

This new hook is happening before the ingress taps, therefore, packets
that follow this new fast forwarding path are not shown by tcpdump.

This patchset supports both layer 3 IPv4 and IPv6, and layer 4 TCP and
UDP protocols. This fastpath also integrates with the IPSec
infrastructure and the ESP protocol.

We have collected performance numbers:

        TCP TSO         TCP Fast Forward
        32.5 Gbps       35.6 Gbps

        UDP             UDP Fast Forward
        17.6 Gbps       35.6 Gbps

        ESP             ESP Fast Forward
        6 Gbps          7.5 Gbps

For UDP, this is doubling performance, and we almost achieve line rate
with one single CPU using the Intel i40e NIC. We got similar numbers
with the Mellanox ConnectX-4. For TCP, this is slightly improving things
even if TSO is being defeated given that we need to segment the packet
chain in software. We would like to explore HW GRO support with hardware
vendors with this new mode, we think that should improve the TCP numbers
we are showing above even more. For ESP traffic, performance improvement
is ~25%, in this case, perf shows the bottleneck becomes the crypto layer.

This patchset is co-authored work with Steffen Klassert.

Comments are welcome, thanks.


Pablo Neira Ayuso (6):
  netfilter: nft_chain_filter: add support for early ingress
  netfilter: nf_flow_table: add hooknum to flowtable type
  netfilter: nf_flow_table: add flowtable for early ingress hook
  netfilter: nft_flow_offload: enable offload after second packet is seen
  netfilter: nft_flow_offload: remove secpath check
  netfilter: nft_flow_offload: make sure route is not stale

Steffen Klassert (7):
  net: Add a helper to get the packet offload callbacks by priority.
  net: Change priority of ipv4 and ipv6 packet offloads.
  net: Add a GSO feature bit for the netfilter forward fastpath.
  net: Use one bit of NAPI_GRO_CB for the netfilter fastpath.
  netfilter: add early ingress hook for IPv4
  netfilter: add early ingress support for IPv6
  netfilter: add ESP support for early ingress

 include/linux/netdev_features.h         |   4 +-
 include/linux/netdevice.h               |   6 +-
 include/linux/netfilter.h               |   6 +
 include/linux/netfilter_ingress.h       |   1 +
 include/linux/skbuff.h                  |   2 +
 include/net/netfilter/early_ingress.h   |  24 +++
 include/net/netfilter/nf_flow_table.h   |   4 +
 include/uapi/linux/netfilter.h          |   1 +
 net/core/dev.c                          |  50 ++++-
 net/ipv4/af_inet.c                      |   1 +
 net/ipv4/netfilter/Makefile             |   1 +
 net/ipv4/netfilter/early_ingress.c      | 327 +++++++++++++++++++++++++++++
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  12 ++
 net/ipv6/ip6_offload.c                  |   1 +
 net/ipv6/netfilter/Makefile             |   1 +
 net/ipv6/netfilter/early_ingress.c      | 315 ++++++++++++++++++++++++++++
 net/ipv6/netfilter/nf_flow_table_ipv6.c |   1 +
 net/netfilter/Kconfig                   |   8 +
 net/netfilter/Makefile                  |   1 +
 net/netfilter/core.c                    |  35 +++-
 net/netfilter/early_ingress.c           | 361 ++++++++++++++++++++++++++++++++
 net/netfilter/nf_flow_table_inet.c      |   1 +
 net/netfilter/nf_flow_table_ip.c        |  72 +++++++
 net/netfilter/nf_tables_api.c           | 120 ++++++-----
 net/netfilter/nft_chain_filter.c        |   6 +-
 net/netfilter/nft_flow_offload.c        |  13 +-
 net/xfrm/xfrm_output.c                  |   4 +
 27 files changed, 1297 insertions(+), 81 deletions(-)
 create mode 100644 include/net/netfilter/early_ingress.h
 create mode 100644 net/ipv4/netfilter/early_ingress.c
 create mode 100644 net/ipv6/netfilter/early_ingress.c
 create mode 100644 net/netfilter/early_ingress.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 01/13] net: Add a helper to get the packet offload callbacks by priority.
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 02/13] net: Change priority of ipv4 and ipv6 packet offloads Pablo Neira Ayuso
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

With this helper it is possible to request callbacks with
a certain priority. This will be used in the upcoming forward
fastpath to pass packets to the standard GRO path.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec9850c7936..13a56f9b2a32 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2523,6 +2523,7 @@ void dev_remove_pack(struct packet_type *pt);
 void __dev_remove_pack(struct packet_type *pt);
 void dev_add_offload(struct packet_offload *po);
 void dev_remove_offload(struct packet_offload *po);
+struct packet_offload *dev_get_packet_offload(__be16 type, int priority);
 
 int dev_get_iflink(const struct net_device *dev);
 int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 6e18242a1cae..115de8bfcb54 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -468,7 +468,21 @@ void dev_remove_pack(struct packet_type *pt)
 }
 EXPORT_SYMBOL(dev_remove_pack);
 
+struct packet_offload *dev_get_packet_offload(__be16 type, int priority)
+{
+	struct list_head *offload_head = &offload_base;
+	struct packet_offload *ptype;
+
+	list_for_each_entry_rcu(ptype, offload_head, list) {
+		if (ptype->type != type || !ptype->callbacks.gro_receive || !ptype->callbacks.gro_complete || ptype->priority < priority)
+			continue;
 
+		return ptype;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(dev_get_packet_offload);
 /**
  *	dev_add_offload - register offload handlers
  *	@po: protocol offload declaration
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 02/13] net: Change priority of ipv4 and ipv6 packet offloads.
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 01/13] net: Add a helper to get the packet offload callbacks by priority Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 03/13] net: Add a GSO feature bit for the netfilter forward fastpath Pablo Neira Ayuso
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

The forward fastpath needs to insert callbacks with
higher priority than the standard callbacks. So change
the priority of ipv4 and ipv6 packet offloads from zero
to one. With this we are able to insert callbacks with
priotity zero if needed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/af_inet.c     | 1 +
 net/ipv6/ip6_offload.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 15e125558c76..fbb90f7556ea 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1841,6 +1841,7 @@ static int ipv4_proc_init(void);
 
 static struct packet_offload ip_packet_offload __read_mostly = {
 	.type = cpu_to_be16(ETH_P_IP),
+	.priority = 1,
 	.callbacks = {
 		.gso_segment = inet_gso_segment,
 		.gro_receive = inet_gro_receive,
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 5b3f2f89ef41..863913fb690f 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -343,6 +343,7 @@ static int ip4ip6_gro_complete(struct sk_buff *skb, int nhoff)
 
 static struct packet_offload ipv6_packet_offload __read_mostly = {
 	.type = cpu_to_be16(ETH_P_IPV6),
+	.priority = 1,
 	.callbacks = {
 		.gso_segment = ipv6_gso_segment,
 		.gro_receive = ipv6_gro_receive,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 03/13] net: Add a GSO feature bit for the netfilter forward fastpath.
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 01/13] net: Add a helper to get the packet offload callbacks by priority Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 02/13] net: Change priority of ipv4 and ipv6 packet offloads Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 04/13] net: Use one bit of NAPI_GRO_CB for the netfilter fastpath Pablo Neira Ayuso
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

The netfilter forward fastpath has its own logic to create
GSO packets. So add a feature bit that we can catch GSO
packets that are generated by the fastpath GRO handler.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netdev_features.h | 4 +++-
 include/linux/netdevice.h       | 1 +
 include/linux/skbuff.h          | 2 ++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 623bb8ced060..f380a27410ef 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -56,8 +56,9 @@ enum {
 	NETIF_F_GSO_ESP_BIT,		/* ... ESP with TSO */
 	NETIF_F_GSO_UDP_BIT,		/* ... UFO, deprecated except tuntap */
 	NETIF_F_GSO_UDP_L4_BIT,		/* ... UDP payload GSO (not UFO) */
+	NETIF_F_GSO_NFT_BIT,		/* ... NFT generic */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_UDP_L4_BIT,
+		NETIF_F_GSO_NFT_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
@@ -140,6 +141,7 @@ enum {
 #define NETIF_F_GSO_SCTP	__NETIF_F(GSO_SCTP)
 #define NETIF_F_GSO_ESP		__NETIF_F(GSO_ESP)
 #define NETIF_F_GSO_UDP		__NETIF_F(GSO_UDP)
+#define NETIF_F_GSO_NFT		__NETIF_F(GSO_NFT)
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 13a56f9b2a32..d8cadfa3769b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4229,6 +4229,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_NFT != (NETIF_F_GSO_NFT >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c86885954994..4a5cff1ffcaa 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -575,6 +575,8 @@ enum {
 	SKB_GSO_UDP = 1 << 16,
 
 	SKB_GSO_UDP_L4 = 1 << 17,
+
+	SKB_GSO_NFT = 1 << 18,
 };
 
 #if BITS_PER_LONG > 32
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 04/13] net: Use one bit of NAPI_GRO_CB for the netfilter fastpath.
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 03/13] net: Add a GSO feature bit for the netfilter forward fastpath Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 05/13] netfilter: add early ingress hook for IPv4 Pablo Neira Ayuso
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

This patch adds a is_ffwd bit to the NAPI_GRO_CB to indicate
fastpath packtes in the GRO layer. It also implements the
logic we need for this in the generic codepath. The rest
of the needed logic is implemented within netfilter and
introduced with a followup patch.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netdevice.h |  2 +-
 net/core/dev.c            | 36 +++++++++++++++++++++++++++---------
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d8cadfa3769b..62734cf0c43a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2238,7 +2238,7 @@ struct napi_gro_cb {
 	/* Number of gro_receive callbacks this packet already went through */
 	u8 recursion_counter:4;
 
-	/* 1 bit hole */
+	u8	is_ffwd:1;
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
diff --git a/net/core/dev.c b/net/core/dev.c
index 115de8bfcb54..75f530886874 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4864,7 +4864,8 @@ static int napi_gro_complete(struct sk_buff *skb)
 
 	BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
 
-	if (NAPI_GRO_CB(skb)->count == 1) {
+	if (NAPI_GRO_CB(skb)->count == 1 &&
+	    !(NAPI_GRO_CB(skb)->is_ffwd)) {
 		skb_shinfo(skb)->gso_size = 0;
 		goto out;
 	}
@@ -4880,8 +4881,10 @@ static int napi_gro_complete(struct sk_buff *skb)
 	rcu_read_unlock();
 
 	if (err) {
-		WARN_ON(&ptype->list == head);
-		kfree_skb(skb);
+		if (err != -EINPROGRESS) {
+			WARN_ON(&ptype->list == head);
+			kfree_skb(skb);
+		}
 		return NET_RX_SUCCESS;
 	}
 
@@ -4936,8 +4939,10 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 
 		diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
 		diffs |= p->vlan_tci ^ skb->vlan_tci;
-		diffs |= skb_metadata_dst_cmp(p, skb);
-		diffs |= skb_metadata_differs(p, skb);
+		if (!NAPI_GRO_CB(p)->is_ffwd) {
+			diffs |= skb_metadata_dst_cmp(p, skb);
+			diffs |= skb_metadata_differs(p, skb);
+		}
 		if (maclen == ETH_HLEN)
 			diffs |= compare_ether_header(skb_mac_header(p),
 						      skb_mac_header(skb));
@@ -5019,6 +5024,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		NAPI_GRO_CB(skb)->is_fou = 0;
 		NAPI_GRO_CB(skb)->is_atomic = 1;
 		NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
+		NAPI_GRO_CB(skb)->is_ffwd = 0;
 
 		/* Setup for GRO checksum validation */
 		switch (skb->ip_summed) {
@@ -5044,9 +5050,14 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	if (&ptype->list == head)
 		goto normal;
 
-	if (IS_ERR(pp) && PTR_ERR(pp) == -EINPROGRESS) {
-		ret = GRO_CONSUMED;
-		goto ok;
+	if (IS_ERR(pp)) {
+		int err;
+
+		err = PTR_ERR(pp);
+		if (err == -EINPROGRESS || err == -EPERM) {
+			ret = GRO_CONSUMED;
+			goto ok;
+		}
 	}
 
 	same_flow = NAPI_GRO_CB(skb)->same_flow;
@@ -5064,8 +5075,15 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	if (same_flow)
 		goto ok;
 
-	if (NAPI_GRO_CB(skb)->flush)
+	if (NAPI_GRO_CB(skb)->flush) {
+		if (NAPI_GRO_CB(skb)->is_ffwd) {
+			napi_gro_complete(skb);
+			ret = GRO_CONSUMED;
+			goto ok;
+		}
+
 		goto normal;
+	}
 
 	if (unlikely(napi->gro_count >= MAX_GRO_SKBS)) {
 		struct sk_buff *nskb = napi->gro_list;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 05/13] netfilter: add early ingress hook for IPv4
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 04/13] net: Use one bit of NAPI_GRO_CB for the netfilter fastpath Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 06/13] netfilter: add early ingress support for IPv6 Pablo Neira Ayuso
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

Add the new early ingress hook for the netdev family, this new hook is
called from the GRO layer before the standard ipv4 GRO layers.

This hook allows us to perform early packet filtering and to define fast
forwarding path through packet chaining and flowtables using the new GSO
netfilter type. Packet that don't follow the fast path are passed up to
the standard GRO path for aggregation as usual.

This patch adds the GRO and GSO logic for this custom packet chaining.
The chaining uses the frag_list pointer so this means we do not need to
mangle the packets, therefore the aggregation strategy we follow does
not modify the packet as in the standard GRO path - we have no need to
recalculate checksum. This chain of packets is sent from the
.gro_complete callback directly to the neighbour layer. The first packet
in the chain holds a reference to the destination route.

Supported layer 4 protocols for this custom GRO packet chaining include
TCP and UDP.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netdevice.h             |   2 +
 include/linux/netfilter.h             |   6 +
 include/linux/netfilter_ingress.h     |   1 +
 include/net/netfilter/early_ingress.h |  20 +++
 include/uapi/linux/netfilter.h        |   1 +
 net/ipv4/netfilter/Makefile           |   1 +
 net/ipv4/netfilter/early_ingress.c    | 319 +++++++++++++++++++++++++++++++++
 net/netfilter/Kconfig                 |   8 +
 net/netfilter/Makefile                |   1 +
 net/netfilter/core.c                  |  35 +++-
 net/netfilter/early_ingress.c         | 323 ++++++++++++++++++++++++++++++++++
 11 files changed, 716 insertions(+), 1 deletion(-)
 create mode 100644 include/net/netfilter/early_ingress.h
 create mode 100644 net/ipv4/netfilter/early_ingress.c
 create mode 100644 net/netfilter/early_ingress.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 62734cf0c43a..c79922665be5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1880,6 +1880,8 @@ struct net_device {
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
 
+	struct nf_hook_entries __rcu *nf_hooks_early_ingress;
+
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_ingress;
 #endif
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 04551af2ff23..ad3f0b9ae4f1 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -429,4 +429,10 @@ extern struct nfnl_ct_hook __rcu *nfnl_ct_hook;
  */
 DECLARE_PER_CPU(bool, nf_skb_duplicated);
 
+int nf_hook_netdev(struct sk_buff *skb, struct nf_hook_state *state,
+		   const struct nf_hook_entries *e);
+
+void nf_early_ingress_enable(void);
+void nf_early_ingress_disable(void);
+
 #endif /*__LINUX_NETFILTER_H*/
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index 554c920691dd..7b70c9d4c435 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -40,6 +40,7 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 
 static inline void nf_hook_ingress_init(struct net_device *dev)
 {
+	RCU_INIT_POINTER(dev->nf_hooks_early_ingress, NULL);
 	RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
 }
 #else /* CONFIG_NETFILTER_INGRESS */
diff --git a/include/net/netfilter/early_ingress.h b/include/net/netfilter/early_ingress.h
new file mode 100644
index 000000000000..caaef9fe619f
--- /dev/null
+++ b/include/net/netfilter/early_ingress.h
@@ -0,0 +1,20 @@
+#ifndef _NF_EARLY_INGRESS_H_
+#define _NF_EARLY_INGRESS_H_
+
+#include <net/protocol.h>
+
+struct sk_buff *nft_skb_segment(struct sk_buff *head_skb);
+struct sk_buff **nft_udp_gro_receive(struct sk_buff **head,
+				     struct sk_buff *skb);
+struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head,
+				     struct sk_buff *skb);
+
+int nf_hook_early_ingress(struct sk_buff *skb);
+
+void nf_early_ingress_ip_enable(void);
+void nf_early_ingress_ip_disable(void);
+
+void nf_early_ingress_enable(void);
+void nf_early_ingress_disable(void);
+
+#endif
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index cca10e767cd8..55d26b20e09f 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -54,6 +54,7 @@ enum nf_inet_hooks {
 
 enum nf_dev_hooks {
 	NF_NETDEV_INGRESS,
+	NF_NETDEV_EARLY_INGRESS,
 	NF_NETDEV_NUMHOOKS
 };
 
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 8394c17c269f..faf5fab59f0f 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -2,6 +2,7 @@
 #
 # Makefile for the netfilter modules on top of IPv4.
 #
+obj-$(CONFIG_NETFILTER_EARLY_INGRESS) += early_ingress.o
 
 # objects for l3 independent conntrack
 nf_conntrack_ipv4-y	:=  nf_conntrack_l3proto_ipv4.o nf_conntrack_proto_icmp.o
diff --git a/net/ipv4/netfilter/early_ingress.c b/net/ipv4/netfilter/early_ingress.c
new file mode 100644
index 000000000000..6ff6e34e5eff
--- /dev/null
+++ b/net/ipv4/netfilter/early_ingress.c
@@ -0,0 +1,319 @@
+#include <linux/kernel.h>
+#include <linux/netfilter.h>
+#include <linux/types.h>
+#include <net/xfrm.h>
+#include <net/arp.h>
+#include <net/udp.h>
+#include <net/tcp.h>
+#include <net/protocol.h>
+#include <net/netfilter/early_ingress.h>
+
+static const struct net_offload __rcu *nft_ip_offloads[MAX_INET_PROTOS] __read_mostly;
+
+static struct sk_buff *nft_udp4_gso_segment(struct sk_buff *skb,
+					    netdev_features_t features)
+{
+	skb_push(skb, sizeof(struct iphdr));
+	return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_tcp4_gso_segment(struct sk_buff *skb,
+					    netdev_features_t features)
+{
+	skb_push(skb, sizeof(struct iphdr));
+	return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_ipv4_gso_segment(struct sk_buff *skb,
+					    netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	const struct net_offload *ops;
+	struct packet_offload *ptype;
+	struct iphdr *iph;
+	int proto;
+	int ihl;
+
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_NFT)) {
+		ptype = dev_get_packet_offload(skb->protocol, 1);
+		if (ptype)
+			return ptype->callbacks.gso_segment(skb, features);
+
+		return ERR_PTR(-EPROTONOSUPPORT);
+	}
+
+	if (SKB_GSO_CB(skb)->encap_level == 0) {
+		iph = ip_hdr(skb);
+		skb_reset_network_header(skb);
+	} else {
+		iph = (struct iphdr *)skb->data;
+	}
+
+	if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
+		goto out;
+
+	ihl = iph->ihl * 4;
+	if (ihl < sizeof(*iph))
+		goto out;
+
+	SKB_GSO_CB(skb)->encap_level += ihl;
+
+	if (unlikely(!pskb_may_pull(skb, ihl)))
+		goto out;
+
+	__skb_pull(skb, ihl);
+
+	proto = iph->protocol;
+
+	segs = ERR_PTR(-EPROTONOSUPPORT);
+
+	ops = rcu_dereference(nft_ip_offloads[proto]);
+	if (likely(ops && ops->callbacks.gso_segment))
+		segs = ops->callbacks.gso_segment(skb, features);
+
+out:
+	return segs;
+}
+
+static int nft_ipv4_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
+	struct dst_entry *dst = skb_dst(skb);
+	struct rtable *rt = (struct rtable *)dst;
+	const struct net_offload *ops;
+	struct packet_offload *ptype;
+	struct net_device *dev;
+	struct neighbour *neigh;
+	unsigned int hh_len;
+	int err = 0;
+	u32 nexthop;
+	u16 count;
+
+	count = NAPI_GRO_CB(skb)->count;
+
+	if (!NAPI_GRO_CB(skb)->is_ffwd) {
+		ptype = dev_get_packet_offload(skb->protocol, 1);
+		if (ptype)
+			return ptype->callbacks.gro_complete(skb, nhoff);
+
+		return 0;
+	}
+
+	rcu_read_lock();
+	ops = rcu_dereference(nft_ip_offloads[iph->protocol]);
+	if (!ops || !ops->callbacks.gro_complete)
+		goto out_unlock;
+
+	/* Only need to add sizeof(*iph) to get to the next hdr below
+	 * because any hdr with option will have been flushed in
+	 * inet_gro_receive().
+	 */
+	err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
+
+out_unlock:
+	rcu_read_unlock();
+
+	if (err)
+		return err;
+
+	skb_shinfo(skb)->gso_type |= SKB_GSO_NFT;
+	skb_shinfo(skb)->gso_segs = count;
+
+	dev = dst->dev;
+	dev_hold(dev);
+	skb->dev = dev;
+
+	if (skb_dst(skb)->xfrm) {
+		err = dst_output(dev_net(dev), NULL, skb);
+		if (err != -EREMOTE)
+			return -EINPROGRESS;
+	}
+
+	if (count <= 1)
+		skb_gso_reset(skb);
+
+	hh_len = LL_RESERVED_SPACE(dev);
+
+	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+		struct sk_buff *skb2;
+
+		skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
+		if (!skb2) {
+			kfree_skb(skb);
+			return -ENOMEM;
+		}
+		consume_skb(skb);
+		skb = skb2;
+	}
+	rcu_read_lock();
+	nexthop = (__force u32) rt_nexthop(rt, iph->daddr);
+	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
+	if (unlikely(!neigh))
+		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
+	if (!IS_ERR(neigh))
+		neigh_output(neigh, skb);
+	rcu_read_unlock();
+
+	return -EINPROGRESS;
+}
+
+static struct sk_buff **nft_ipv4_gro_receive(struct sk_buff **head,
+					     struct sk_buff *skb)
+{
+	const struct net_offload *ops;
+	struct packet_offload *ptype;
+	struct sk_buff **pp = NULL;
+	struct sk_buff *p;
+	struct iphdr *iph;
+	unsigned int hlen;
+	unsigned int off;
+	int proto, ret;
+
+	off = skb_gro_offset(skb);
+	hlen = off + sizeof(*iph);
+
+	iph = skb_gro_header_slow(skb, hlen, off);
+	if (unlikely(!iph)) {
+		pp = ERR_PTR(-EPERM);
+		goto out;
+	}
+
+	proto = iph->protocol;
+
+	rcu_read_lock();
+
+	if (*(u8 *)iph != 0x45) {
+		kfree_skb(skb);
+		pp = ERR_PTR(-EPERM);
+		goto out_unlock;
+	}
+
+	if (unlikely(ip_fast_csum((u8 *)iph, 5))) {
+		kfree_skb(skb);
+		pp = ERR_PTR(-EPERM);
+		goto out_unlock;
+	}
+
+	if (ip_is_fragment(iph))
+		goto out_unlock;
+
+	ret = nf_hook_early_ingress(skb);
+	switch (ret) {
+	case NF_STOLEN:
+		break;
+	case NF_ACCEPT:
+		ptype = dev_get_packet_offload(skb->protocol, 1);
+		if (ptype)
+			pp = ptype->callbacks.gro_receive(head, skb);
+
+		goto out_unlock;
+	case NF_DROP:
+		pp = ERR_PTR(-EPERM);
+		goto out_unlock;
+	}
+
+	ops = rcu_dereference(nft_ip_offloads[proto]);
+	if (!ops || !ops->callbacks.gro_receive)
+		goto out_unlock;
+
+	if (iph->ttl <= 1) {
+		kfree_skb(skb);
+		pp = ERR_PTR(-EPERM);
+		goto out_unlock;
+	}
+
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+	for (p = *head; p; p = p->next) {
+		struct iphdr *iph2;
+
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		iph2 = ip_hdr(p);
+		/* The above works because, with the exception of the top
+		 * (inner most) layer, we only aggregate pkts with the same
+		 * hdr length so all the hdrs we'll need to verify will start
+		 * at the same offset.
+		 */
+		if ((iph->protocol ^ iph2->protocol) |
+		    ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
+		    ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		if (!NAPI_GRO_CB(p)->is_ffwd)
+			continue;
+
+		if (!skb_dst(p))
+			continue;
+
+		/* All fields must match except length and checksum. */
+		NAPI_GRO_CB(p)->flush |=
+			((iph->ttl - 1) ^ iph2->ttl) |
+			(iph->tos ^ iph2->tos) |
+			((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
+
+		pp = &p;
+
+		break;
+	}
+
+	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
+
+	ip_decrease_ttl(iph);
+	skb->priority = rt_tos2priority(iph->tos);
+
+	skb_pull(skb, off);
+	NAPI_GRO_CB(skb)->data_offset = sizeof(*iph);
+	skb_reset_network_header(skb);
+	skb_set_transport_header(skb, sizeof(*iph));
+
+	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
+out_unlock:
+	rcu_read_unlock();
+
+out:
+	NAPI_GRO_CB(skb)->data_offset = 0;
+	return pp;
+}
+
+static struct packet_offload nft_ipv4_packet_offload __read_mostly = {
+	.type = cpu_to_be16(ETH_P_IP),
+	.priority = 0,
+	.callbacks = {
+		.gro_receive = nft_ipv4_gro_receive,
+		.gro_complete = nft_ipv4_gro_complete,
+		.gso_segment = nft_ipv4_gso_segment,
+	},
+};
+
+static const struct net_offload nft_udp4_offload = {
+	.callbacks = {
+		.gso_segment = nft_udp4_gso_segment,
+		.gro_receive  =	nft_udp_gro_receive,
+	},
+};
+
+static const struct net_offload nft_tcp4_offload = {
+	.callbacks = {
+		.gso_segment = nft_tcp4_gso_segment,
+		.gro_receive  =	nft_tcp_gro_receive,
+	},
+};
+
+static const struct net_offload __rcu *nft_ip_offloads[MAX_INET_PROTOS] __read_mostly = {
+	[IPPROTO_UDP]	= &nft_udp4_offload,
+	[IPPROTO_TCP]	= &nft_tcp4_offload,
+};
+
+void nf_early_ingress_ip_enable(void)
+{
+	dev_add_offload(&nft_ipv4_packet_offload);
+}
+
+void nf_early_ingress_ip_disable(void)
+{
+	dev_remove_offload(&nft_ipv4_packet_offload);
+}
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index dbd7d1fad277..8f803a1fd76e 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -9,6 +9,14 @@ config NETFILTER_INGRESS
 	  This allows you to classify packets from ingress using the Netfilter
 	  infrastructure.
 
+config NETFILTER_EARLY_INGRESS
+	bool "Netfilter early ingress support"
+	default y
+	help
+	  This allows you to perform very early filtering and packet aggregation
+	  for fast forwarding bypass by exercising the GRO engine from the
+	  Netfilter infrastructure.
+
 config NETFILTER_NETLINK
 	tristate
 
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 44449389e527..eebc0e35f9e5 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 netfilter-objs := core.o nf_log.o nf_queue.o nf_sockopt.o utils.o
+netfilter-$(CONFIG_NETFILTER_EARLY_INGRESS) += early_ingress.o
 
 nf_conntrack-y	:= nf_conntrack_core.o nf_conntrack_standalone.o nf_conntrack_expect.o nf_conntrack_helper.o nf_conntrack_proto.o nf_conntrack_l3proto_generic.o nf_conntrack_proto_generic.o nf_conntrack_proto_tcp.o nf_conntrack_proto_udp.o nf_conntrack_extend.o nf_conntrack_acct.o nf_conntrack_seqadj.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 168af54db975..4885365380d3 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -306,6 +306,11 @@ nf_hook_entry_head(struct net *net, int pf, unsigned int hooknum,
 			return &dev->nf_hooks_ingress;
 	}
 #endif
+	if (hooknum == NF_NETDEV_EARLY_INGRESS) {
+		if (dev && dev_net(dev) == net)
+			return &dev->nf_hooks_early_ingress;
+	}
+
 	WARN_ON_ONCE(1);
 	return NULL;
 }
@@ -321,7 +326,8 @@ static int __nf_register_net_hook(struct net *net, int pf,
 		if (reg->hooknum == NF_NETDEV_INGRESS)
 			return -EOPNOTSUPP;
 #endif
-		if (reg->hooknum != NF_NETDEV_INGRESS ||
+		if ((reg->hooknum != NF_NETDEV_INGRESS &&
+		     reg->hooknum != NF_NETDEV_EARLY_INGRESS) ||
 		    !reg->dev || dev_net(reg->dev) != net)
 			return -EINVAL;
 	}
@@ -347,6 +353,9 @@ static int __nf_register_net_hook(struct net *net, int pf,
 	if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
 		net_inc_ingress_queue();
 #endif
+	if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EARLY_INGRESS)
+		nf_early_ingress_enable();
+
 #ifdef HAVE_JUMP_LABEL
 	static_key_slow_inc(&nf_hooks_needed[pf][reg->hooknum]);
 #endif
@@ -404,6 +413,9 @@ static void __nf_unregister_net_hook(struct net *net, int pf,
 #ifdef CONFIG_NETFILTER_INGRESS
 		if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
 			net_dec_ingress_queue();
+
+		if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EARLY_INGRESS)
+			nf_early_ingress_disable();
 #endif
 #ifdef HAVE_JUMP_LABEL
 		static_key_slow_dec(&nf_hooks_needed[pf][reg->hooknum]);
@@ -535,6 +547,27 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 }
 EXPORT_SYMBOL(nf_hook_slow);
 
+int nf_hook_netdev(struct sk_buff *skb, struct nf_hook_state *state,
+		   const struct nf_hook_entries *e)
+{
+	unsigned int verdict, s, v = NF_ACCEPT;
+
+	for (s = 0; s < e->num_hook_entries; s++) {
+		verdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);
+		v = verdict & NF_VERDICT_MASK;
+		switch (v) {
+		case NF_ACCEPT:
+			break;
+		case NF_DROP:
+			kfree_skb(skb);
+			/* Fall through */
+		default:
+			return v;
+		}
+	}
+
+	return v;
+}
 
 int skb_make_writable(struct sk_buff *skb, unsigned int writable_len)
 {
diff --git a/net/netfilter/early_ingress.c b/net/netfilter/early_ingress.c
new file mode 100644
index 000000000000..bf31aa8b3721
--- /dev/null
+++ b/net/netfilter/early_ingress.c
@@ -0,0 +1,323 @@
+#include <linux/kernel.h>
+#include <linux/netfilter.h>
+#include <linux/types.h>
+#include <net/xfrm.h>
+#include <net/arp.h>
+#include <net/udp.h>
+#include <net/tcp.h>
+#include <net/protocol.h>
+#include <crypto/aead.h>
+#include <net/netfilter/early_ingress.h>
+
+/* XXX: Maybe export this from net/core/skbuff.c
+ * instead of holding a local copy */
+static void skb_headers_offset_update(struct sk_buff *skb, int off)
+{
+	/* Only adjust this if it actually is csum_start rather than csum */
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		skb->csum_start += off;
+	/* {transport,network,mac}_header and tail are relative to skb->head */
+	skb->transport_header += off;
+	skb->network_header   += off;
+	if (skb_mac_header_was_set(skb))
+		skb->mac_header += off;
+	skb->inner_transport_header += off;
+	skb->inner_network_header += off;
+	skb->inner_mac_header += off;
+}
+
+struct sk_buff *nft_skb_segment(struct sk_buff *head_skb)
+{
+	unsigned int headroom;
+	struct sk_buff *nskb;
+	struct sk_buff *segs = NULL;
+	struct sk_buff *tail = NULL;
+	unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
+	struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
+	unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
+	unsigned int delta_segs, delta_len, delta_truesize;
+
+	__skb_push(head_skb, doffset);
+
+	headroom = skb_headroom(head_skb);
+
+	delta_segs = delta_len = delta_truesize = 0;
+
+	skb_shinfo(head_skb)->frag_list = NULL;
+
+	segs = skb_clone(head_skb, GFP_ATOMIC);
+	if (unlikely(!segs))
+		return ERR_PTR(-ENOMEM);
+
+	do {
+		nskb = list_skb;
+
+		list_skb = list_skb->next;
+
+		if (!tail)
+			segs->next = nskb;
+		else
+			tail->next = nskb;
+
+		tail = nskb;
+
+		delta_len += nskb->len;
+		delta_truesize += nskb->truesize;
+
+		skb_push(nskb, doffset);
+
+		nskb->dev = head_skb->dev;
+		nskb->queue_mapping = head_skb->queue_mapping;
+		nskb->network_header = head_skb->network_header;
+		nskb->mac_len = head_skb->mac_len;
+		nskb->mac_header = head_skb->mac_header;
+		nskb->transport_header = head_skb->transport_header;
+
+		if (!secpath_exists(nskb))
+			nskb->sp = secpath_get(head_skb->sp);
+
+		skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
+
+		skb_copy_from_linear_data_offset(head_skb, -tnl_hlen,
+						 nskb->data - tnl_hlen,
+						 doffset + tnl_hlen);
+
+	} while (list_skb);
+
+	segs->len = head_skb->len - delta_len;
+	segs->data_len = head_skb->data_len - delta_len;
+	segs->truesize += head_skb->data_len - delta_truesize;
+
+	head_skb->len = segs->len;
+	head_skb->data_len = segs->data_len;
+	head_skb->truesize += segs->truesize;
+
+	skb_shinfo(segs)->gso_size = 0;
+	skb_shinfo(segs)->gso_segs = 0;
+	skb_shinfo(segs)->gso_type = 0;
+
+	segs->prev = tail;
+
+	return segs;
+}
+
+static int nft_skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	struct sk_buff *p = *head;
+
+	if (unlikely((!NAPI_GRO_CB(p)->is_ffwd) || !skb_dst(p)))
+		return -EINVAL;
+
+	if (NAPI_GRO_CB(p)->last == p)
+		skb_shinfo(p)->frag_list = skb;
+	else
+		NAPI_GRO_CB(p)->last->next = skb;
+	NAPI_GRO_CB(p)->last = skb;
+
+	NAPI_GRO_CB(p)->count++;
+	p->data_len += skb->len;
+	p->truesize += skb->truesize;
+	p->len += skb->len;
+
+	NAPI_GRO_CB(skb)->same_flow = 1;
+	return 0;
+}
+
+static struct sk_buff **udp_gro_ffwd_receive(struct sk_buff **head,
+					     struct sk_buff *skb,
+					     struct udphdr *uh)
+{
+	struct sk_buff *p = NULL;
+	struct sk_buff **pp = NULL;
+	struct udphdr *uh2;
+	int flush = 0;
+
+	for (; (p = *head); head = &p->next) {
+
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		uh2 = udp_hdr(p);
+
+		/* Match ports and either checksums are either both zero
+		 * or nonzero.
+		 */
+		if ((*(u32 *)&uh->source != *(u32 *)&uh2->source) ||
+		    (!uh->check ^ !uh2->check)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		goto found;
+	}
+
+	goto out;
+
+found:
+	p = *head;
+
+	if (nft_skb_gro_receive(head, skb))
+		flush = 1;
+
+out:
+	if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
+		pp = head;
+
+	NAPI_GRO_CB(skb)->flush |= flush;
+	return pp;
+}
+
+struct sk_buff **nft_udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	struct udphdr *uh;
+
+	uh = skb_gro_header_slow(skb, skb_transport_offset(skb) + sizeof(struct udphdr),
+				 skb_transport_offset(skb));
+
+	if (unlikely(!uh))
+		goto flush;
+
+	if (NAPI_GRO_CB(skb)->flush)
+		goto flush;
+
+	if (NAPI_GRO_CB(skb)->is_ffwd)
+		return udp_gro_ffwd_receive(head, skb, uh);
+
+flush:
+	NAPI_GRO_CB(skb)->flush = 1;
+	return NULL;
+}
+
+struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	struct sk_buff **pp = NULL;
+	struct sk_buff *p;
+	struct tcphdr *th;
+	struct tcphdr *th2;
+	unsigned int len;
+	unsigned int thlen;
+	__be32 flags;
+	unsigned int mss = 1;
+	unsigned int hlen;
+	int flush = 1;
+	int i;
+
+	th = skb_gro_header_slow(skb, skb_transport_offset(skb) + sizeof(struct tcphdr),
+				 skb_transport_offset(skb));
+	if (unlikely(!th))
+		goto out;
+
+	thlen = th->doff * 4;
+	if (thlen < sizeof(*th))
+		goto out;
+
+	hlen = skb_transport_offset(skb) + thlen;
+
+	th = skb_gro_header_slow(skb, hlen, skb_transport_offset(skb));
+	if (unlikely(!th))
+		goto out;
+
+	skb_gro_pull(skb, thlen);
+	len = skb_gro_len(skb);
+	flags = tcp_flag_word(th);
+
+	for (; (p = *head); head = &p->next) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		th2 = tcp_hdr(p);
+
+		if (*(u32 *)&th->source ^ *(u32 *)&th2->source) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		goto found;
+	}
+
+	goto out_check_final;
+
+found:
+	flush = NAPI_GRO_CB(p)->flush;
+	flush |= (__force int)(flags & TCP_FLAG_CWR);
+	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
+		  ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
+	flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
+	for (i = sizeof(*th); i < thlen; i += 4)
+		flush |= *(u32 *)((u8 *)th + i) ^
+			 *(u32 *)((u8 *)th2 + i);
+
+	mss = skb_shinfo(p)->gso_size;
+
+	flush |= (len - 1) >= mss;
+	flush |= (ntohl(th2->seq) + (skb_gro_len(p) - (hlen * (NAPI_GRO_CB(p)->count - 1)))) ^ ntohl(th->seq);
+
+	if (flush || nft_skb_gro_receive(head, skb)) {
+		mss = 1;
+		goto out_check_final;
+	}
+
+	p = *head;
+
+out_check_final:
+	flush = len < mss;
+	flush |= (__force int)(flags & (TCP_FLAG_URG | TCP_FLAG_PSH |
+					TCP_FLAG_RST | TCP_FLAG_SYN |
+					TCP_FLAG_FIN));
+
+	if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
+		pp = head;
+
+out:
+	NAPI_GRO_CB(skb)->flush |= (flush != 0);
+
+	return pp;
+}
+
+static inline bool nf_hook_early_ingress_active(const struct sk_buff *skb)
+{
+#ifdef HAVE_JUMP_LABEL
+	if (!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_EARLY_INGRESS]))
+		return false;
+#endif
+	return rcu_access_pointer(skb->dev->nf_hooks_early_ingress);
+}
+
+int nf_hook_early_ingress(struct sk_buff *skb)
+{
+	struct nf_hook_entries *e =
+		rcu_dereference(skb->dev->nf_hooks_early_ingress);
+	struct nf_hook_state state;
+	int ret = NF_ACCEPT;
+
+	if (nf_hook_early_ingress_active(skb)) {
+		if (unlikely(!e))
+			return 0;
+
+		nf_hook_state_init(&state, NF_NETDEV_EARLY_INGRESS,
+				   NFPROTO_NETDEV, skb->dev, NULL, NULL,
+				   dev_net(skb->dev), NULL);
+
+		ret = nf_hook_netdev(skb, &state, e);
+	}
+
+	return ret;
+}
+
+/* protected by nf_hook_mutex. */
+static int nf_early_ingress_use;
+
+void nf_early_ingress_enable(void)
+{
+	if (nf_early_ingress_use++ == 0) {
+		nf_early_ingress_use++;
+		nf_early_ingress_ip_enable();
+	}
+}
+
+void nf_early_ingress_disable(void)
+{
+	if (--nf_early_ingress_use == 0) {
+		nf_early_ingress_ip_disable();
+	}
+}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 06/13] netfilter: add early ingress support for IPv6
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 05/13] netfilter: add early ingress hook for IPv4 Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 07/13] netfilter: add ESP support for early ingress Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

This patch adds the custom GSO and GRO logic for the IPv6 early ingress
hook. Layer 4 supports UDP and TCP at this stage.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/early_ingress.h |   2 +
 net/ipv6/netfilter/Makefile           |   1 +
 net/ipv6/netfilter/early_ingress.c    | 307 ++++++++++++++++++++++++++++++++++
 net/netfilter/early_ingress.c         |   2 +
 4 files changed, 312 insertions(+)
 create mode 100644 net/ipv6/netfilter/early_ingress.c

diff --git a/include/net/netfilter/early_ingress.h b/include/net/netfilter/early_ingress.h
index caaef9fe619f..9ba8e2875345 100644
--- a/include/net/netfilter/early_ingress.h
+++ b/include/net/netfilter/early_ingress.h
@@ -13,6 +13,8 @@ int nf_hook_early_ingress(struct sk_buff *skb);
 
 void nf_early_ingress_ip_enable(void);
 void nf_early_ingress_ip_disable(void);
+void nf_early_ingress_ip6_enable(void);
+void nf_early_ingress_ip6_disable(void);
 
 void nf_early_ingress_enable(void);
 void nf_early_ingress_disable(void);
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 10a5a1c87320..445dfcf51ca8 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -2,6 +2,7 @@
 #
 # Makefile for the netfilter modules on top of IPv6.
 #
+obj-$(CONFIG_NETFILTER_EARLY_INGRESS) += early_ingress.o
 
 # Link order matters here.
 obj-$(CONFIG_IP6_NF_IPTABLES) += ip6_tables.o
diff --git a/net/ipv6/netfilter/early_ingress.c b/net/ipv6/netfilter/early_ingress.c
new file mode 100644
index 000000000000..026d2814530a
--- /dev/null
+++ b/net/ipv6/netfilter/early_ingress.c
@@ -0,0 +1,307 @@
+#include <linux/kernel.h>
+#include <linux/netfilter.h>
+#include <linux/types.h>
+#include <net/xfrm.h>
+#include <net/arp.h>
+#include <net/udp.h>
+#include <net/tcp.h>
+#include <net/protocol.h>
+#include <net/netfilter/early_ingress.h>
+#include <net/ip6_route.h>
+
+static const struct net_offload __rcu *nft_ip6_offloads[MAX_INET_PROTOS] __read_mostly;
+
+static struct sk_buff *nft_udp6_gso_segment(struct sk_buff *skb,
+					    netdev_features_t features)
+{
+	skb_push(skb, sizeof(struct ipv6hdr));
+	return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_tcp6_gso_segment(struct sk_buff *skb,
+					    netdev_features_t features)
+{
+	skb_push(skb, sizeof(struct ipv6hdr));
+	return nft_skb_segment(skb);
+}
+
+static struct sk_buff *nft_ipv6_gso_segment(struct sk_buff *skb,
+					    netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	const struct net_offload *ops;
+	struct packet_offload *ptype;
+	struct ipv6hdr *iph;
+	int proto;
+
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_NFT)) {
+		ptype = dev_get_packet_offload(skb->protocol, 1);
+		if (ptype)
+			return ptype->callbacks.gso_segment(skb, features);
+
+		return ERR_PTR(-EPROTONOSUPPORT);
+	}
+
+	if (SKB_GSO_CB(skb)->encap_level == 0) {
+		iph = ipv6_hdr(skb);
+		skb_reset_network_header(skb);
+	} else {
+		iph = (struct ipv6hdr *)skb->data;
+	}
+
+	if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
+		goto out;
+
+	SKB_GSO_CB(skb)->encap_level += sizeof(*iph);
+
+	if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
+		goto out;
+
+	__skb_pull(skb, sizeof(*iph));
+
+	proto = iph->nexthdr;
+
+	segs = ERR_PTR(-EPROTONOSUPPORT);
+
+	ops = rcu_dereference(nft_ip6_offloads[proto]);
+	if (likely(ops && ops->callbacks.gso_segment))
+		segs = ops->callbacks.gso_segment(skb, features);
+
+out:
+	return segs;
+}
+
+static int nft_ipv6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+	struct dst_entry *dst = skb_dst(skb);
+	struct rt6_info *rt = (struct rt6_info *)dst;
+	const struct net_offload *ops;
+	struct packet_offload *ptype;
+	int proto = iph->nexthdr;
+	struct in6_addr *nexthop;
+	struct neighbour *neigh;
+	struct net_device *dev;
+	unsigned int hh_len;
+	int err = 0;
+	u16 count;
+
+	count = NAPI_GRO_CB(skb)->count;
+
+	if (!NAPI_GRO_CB(skb)->is_ffwd) {
+		ptype = dev_get_packet_offload(skb->protocol, 1);
+		if (ptype)
+			return ptype->callbacks.gro_complete(skb, nhoff);
+
+		return 0;
+	}
+
+	rcu_read_lock();
+	ops = rcu_dereference(nft_ip6_offloads[proto]);
+	if (!ops || !ops->callbacks.gro_complete)
+		goto out_unlock;
+
+	/* Only need to add sizeof(*iph) to get to the next hdr below
+	 * because any hdr with option will have been flushed in
+	 * inet_gro_receive().
+	 */
+	err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
+
+out_unlock:
+	rcu_read_unlock();
+
+	if (err)
+		return err;
+
+	skb_shinfo(skb)->gso_type |= SKB_GSO_NFT;
+	skb_shinfo(skb)->gso_segs = count;
+
+	dev = dst->dev;
+	dev_hold(dev);
+	skb->dev = dev;
+
+	if (skb_dst(skb)->xfrm) {
+		err = dst_output(dev_net(dev), NULL, skb);
+		if (err != -EREMOTE)
+			return -EINPROGRESS;
+	}
+
+	if (count <= 1)
+		skb_gso_reset(skb);
+
+	hh_len = LL_RESERVED_SPACE(dev);
+
+	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
+		struct sk_buff *skb2;
+
+		skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
+		if (!skb2) {
+			kfree_skb(skb);
+			return -ENOMEM;
+		}
+		consume_skb(skb);
+		skb = skb2;
+	}
+	rcu_read_lock();
+	nexthop = rt6_nexthop(rt, &iph->daddr);
+	neigh = __ipv6_neigh_lookup_noref(dev, nexthop);
+	if (unlikely(!neigh))
+		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
+	if (!IS_ERR(neigh))
+		neigh_output(neigh, skb);
+	rcu_read_unlock();
+
+	return -EINPROGRESS;
+}
+
+static struct sk_buff **nft_ipv6_gro_receive(struct sk_buff **head,
+					     struct sk_buff *skb)
+{
+	const struct net_offload *ops;
+	struct packet_offload *ptype;
+	struct sk_buff **pp = NULL;
+	struct sk_buff *p;
+	struct ipv6hdr *iph;
+	unsigned int nlen;
+	unsigned int hlen;
+	unsigned int off;
+	int proto, ret;
+
+	off = skb_gro_offset(skb);
+	hlen = off + sizeof(*iph);
+
+	iph = skb_gro_header_slow(skb, hlen, off);
+	if (unlikely(!iph))
+		goto out;
+
+	proto = iph->nexthdr;
+
+	rcu_read_lock();
+
+	if (iph->version != 6)
+		goto out_unlock;
+
+	nlen = skb_network_header_len(skb);
+
+	ret = nf_hook_early_ingress(skb);
+	switch (ret) {
+	case NF_STOLEN:
+		break;
+	case NF_ACCEPT:
+		ptype = dev_get_packet_offload(skb->protocol, 1);
+		if (ptype)
+			pp = ptype->callbacks.gro_receive(head, skb);
+
+		goto out_unlock;
+	case NF_DROP:
+		pp = ERR_PTR(-EPERM);
+		goto out_unlock;
+	}
+
+	ops = rcu_dereference(nft_ip6_offloads[proto]);
+	if (!ops || !ops->callbacks.gro_receive)
+		goto out_unlock;
+
+	if (iph->hop_limit <= 1)
+		goto out_unlock;
+
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+	for (p = *head; p; p = p->next) {
+		struct ipv6hdr *iph2;
+		__be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */
+
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		if (!NAPI_GRO_CB(p)->is_ffwd) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		if (!skb_dst(p)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		iph2 = ipv6_hdr(p);
+		first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
+
+		/* All fields must match except length and Traffic Class.
+		 * XXX skbs on the gro_list have all been parsed and pulled
+		 * already so we don't need to compare nlen
+		 * (nlen != (sizeof(*iph2) + ipv6_exthdrs_len(iph2, &ops)))
+		 * memcmp() alone below is suffcient, right?
+		 */
+		if ((first_word & htonl(0xF00FFFFF)) ||
+		   memcmp(&iph->nexthdr, &iph2->nexthdr,
+			  nlen - offsetof(struct ipv6hdr, nexthdr))) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+		/* flush if Traffic Class fields are different */
+		NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
+
+		NAPI_GRO_CB(skb)->is_ffwd = 1;
+		skb_dst_set_noref(skb, skb_dst(p));
+		pp = &p;
+
+		break;
+	}
+
+	NAPI_GRO_CB(skb)->is_atomic = true;
+
+	iph->hop_limit--;
+
+	skb_pull(skb, off);
+	NAPI_GRO_CB(skb)->data_offset = sizeof(*iph);
+	skb_reset_network_header(skb);
+	skb_set_transport_header(skb, sizeof(*iph));
+
+	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
+out_unlock:
+	rcu_read_unlock();
+
+out:
+	NAPI_GRO_CB(skb)->data_offset = 0;
+	return pp;
+}
+
+static struct packet_offload nft_ip6_packet_offload __read_mostly = {
+	.type = cpu_to_be16(ETH_P_IPV6),
+	.priority = 0,
+	.callbacks = {
+		.gro_receive = nft_ipv6_gro_receive,
+		.gro_complete = nft_ipv6_gro_complete,
+		.gso_segment = nft_ipv6_gso_segment,
+	},
+};
+
+static const struct net_offload nft_udp6_offload = {
+	.callbacks = {
+		.gso_segment = nft_udp6_gso_segment,
+		.gro_receive  =	nft_udp_gro_receive,
+	},
+};
+
+static const struct net_offload nft_tcp6_offload = {
+	.callbacks = {
+		.gso_segment = nft_tcp6_gso_segment,
+		.gro_receive  =	nft_tcp_gro_receive,
+	},
+};
+
+static const struct net_offload __rcu *nft_ip6_offloads[MAX_INET_PROTOS] __read_mostly = {
+	[IPPROTO_UDP]	= &nft_udp6_offload,
+	[IPPROTO_TCP]	= &nft_tcp6_offload,
+};
+
+void nf_early_ingress_ip6_enable(void)
+{
+	dev_add_offload(&nft_ip6_packet_offload);
+}
+
+void nf_early_ingress_ip6_disable(void)
+{
+	dev_remove_offload(&nft_ip6_packet_offload);
+}
diff --git a/net/netfilter/early_ingress.c b/net/netfilter/early_ingress.c
index bf31aa8b3721..4daf6cfea304 100644
--- a/net/netfilter/early_ingress.c
+++ b/net/netfilter/early_ingress.c
@@ -312,6 +312,7 @@ void nf_early_ingress_enable(void)
 	if (nf_early_ingress_use++ == 0) {
 		nf_early_ingress_use++;
 		nf_early_ingress_ip_enable();
+		nf_early_ingress_ip6_enable();
 	}
 }
 
@@ -319,5 +320,6 @@ void nf_early_ingress_disable(void)
 {
 	if (--nf_early_ingress_use == 0) {
 		nf_early_ingress_ip_disable();
+		nf_early_ingress_ip6_disable();
 	}
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 07/13] netfilter: add ESP support for early ingress
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 06/13] netfilter: add early ingress support for IPv6 Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 08/13] netfilter: nft_chain_filter: add " Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

From: Steffen Klassert <steffen.klassert@secunet.com>

This patch adds the GSO logic for ESP and the codepath that allows
the xfrm infrastructure to signal the GRO layer that the packet is
following the fast forwarding path.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/early_ingress.h |  2 ++
 net/ipv4/netfilter/early_ingress.c    |  8 ++++++++
 net/ipv6/netfilter/early_ingress.c    |  8 ++++++++
 net/netfilter/early_ingress.c         | 36 +++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_output.c                |  4 ++++
 5 files changed, 58 insertions(+)

diff --git a/include/net/netfilter/early_ingress.h b/include/net/netfilter/early_ingress.h
index 9ba8e2875345..6653b294f25a 100644
--- a/include/net/netfilter/early_ingress.h
+++ b/include/net/netfilter/early_ingress.h
@@ -8,6 +8,8 @@ struct sk_buff **nft_udp_gro_receive(struct sk_buff **head,
 				     struct sk_buff *skb);
 struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head,
 				     struct sk_buff *skb);
+struct sk_buff *nft_esp_gso_segment(struct sk_buff *skb,
+				    netdev_features_t features);
 
 int nf_hook_early_ingress(struct sk_buff *skb);
 
diff --git a/net/ipv4/netfilter/early_ingress.c b/net/ipv4/netfilter/early_ingress.c
index 6ff6e34e5eff..74f3a7f1273d 100644
--- a/net/ipv4/netfilter/early_ingress.c
+++ b/net/ipv4/netfilter/early_ingress.c
@@ -5,6 +5,7 @@
 #include <net/arp.h>
 #include <net/udp.h>
 #include <net/tcp.h>
+#include <net/esp.h>
 #include <net/protocol.h>
 #include <net/netfilter/early_ingress.h>
 
@@ -303,9 +304,16 @@ static const struct net_offload nft_tcp4_offload = {
 	},
 };
 
+static const struct net_offload nft_esp4_offload = {
+	.callbacks = {
+		.gso_segment = nft_esp_gso_segment,
+	},
+};
+
 static const struct net_offload __rcu *nft_ip_offloads[MAX_INET_PROTOS] __read_mostly = {
 	[IPPROTO_UDP]	= &nft_udp4_offload,
 	[IPPROTO_TCP]	= &nft_tcp4_offload,
+	[IPPROTO_ESP]	= &nft_esp4_offload,
 };
 
 void nf_early_ingress_ip_enable(void)
diff --git a/net/ipv6/netfilter/early_ingress.c b/net/ipv6/netfilter/early_ingress.c
index 026d2814530a..fb00b083593b 100644
--- a/net/ipv6/netfilter/early_ingress.c
+++ b/net/ipv6/netfilter/early_ingress.c
@@ -5,6 +5,7 @@
 #include <net/arp.h>
 #include <net/udp.h>
 #include <net/tcp.h>
+#include <net/esp.h>
 #include <net/protocol.h>
 #include <net/netfilter/early_ingress.h>
 #include <net/ip6_route.h>
@@ -291,9 +292,16 @@ static const struct net_offload nft_tcp6_offload = {
 	},
 };
 
+static const struct net_offload nft_esp6_offload = {
+	.callbacks = {
+		.gso_segment = nft_esp_gso_segment,
+	},
+};
+
 static const struct net_offload __rcu *nft_ip6_offloads[MAX_INET_PROTOS] __read_mostly = {
 	[IPPROTO_UDP]	= &nft_udp6_offload,
 	[IPPROTO_TCP]	= &nft_tcp6_offload,
+	[IPPROTO_ESP]	= &nft_esp6_offload,
 };
 
 void nf_early_ingress_ip6_enable(void)
diff --git a/net/netfilter/early_ingress.c b/net/netfilter/early_ingress.c
index 4daf6cfea304..10d718bbe495 100644
--- a/net/netfilter/early_ingress.c
+++ b/net/netfilter/early_ingress.c
@@ -5,6 +5,7 @@
 #include <net/arp.h>
 #include <net/udp.h>
 #include <net/tcp.h>
+#include <net/esp.h>
 #include <net/protocol.h>
 #include <crypto/aead.h>
 #include <net/netfilter/early_ingress.h>
@@ -274,6 +275,41 @@ struct sk_buff **nft_tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	return pp;
 }
 
+struct sk_buff *nft_esp_gso_segment(struct sk_buff *skb,
+				    netdev_features_t features)
+{
+	struct xfrm_offload *xo = xfrm_offload(skb);
+	netdev_features_t esp_features = features;
+	struct crypto_aead *aead;
+	struct ip_esp_hdr *esph;
+	struct xfrm_state *x;
+
+	if (!xo)
+		return ERR_PTR(-EINVAL);
+
+	x = skb->sp->xvec[skb->sp->len - 1];
+	aead = x->data;
+	esph = ip_esp_hdr(skb);
+
+	if (esph->spi != x->id.spi)
+		return ERR_PTR(-EINVAL);
+
+	if (!pskb_may_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead)))
+		return ERR_PTR(-EINVAL);
+
+	__skb_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead));
+
+	skb->encap_hdr_csum = 1;
+
+	if (!(features & NETIF_F_HW_ESP) || !x->xso.offload_handle ||
+	    (x->xso.dev != skb->dev))
+		esp_features = features & ~(NETIF_F_SG | NETIF_F_CSUM_MASK);
+
+	xo->flags |= XFRM_GSO_SEGMENT;
+
+	return x->outer_mode->gso_segment(x, skb, esp_features);
+}
+
 static inline bool nf_hook_early_ingress_active(const struct sk_buff *skb)
 {
 #ifdef HAVE_JUMP_LABEL
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 89b178a78dc7..c63b157f46ce 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -146,6 +146,10 @@ int xfrm_output_resume(struct sk_buff *skb, int err)
 	while (likely((err = xfrm_output_one(skb, err)) == 0)) {
 		nf_reset(skb);
 
+		if (!skb_dst(skb)->xfrm && skb->sp &&
+		    (skb_shinfo(skb)->gso_type & SKB_GSO_NFT))
+			return -EREMOTE;
+
 		err = skb_dst(skb)->ops->local_out(net, skb->sk, skb);
 		if (unlikely(err != 1))
 			goto out;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 08/13] netfilter: nft_chain_filter: add support for early ingress
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 07/13] netfilter: add ESP support for early ingress Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 09/13] netfilter: nf_flow_table: add hooknum to flowtable type Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

This patch adds the new filter chain at the early ingress hook.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/netfilter/nft_chain_filter.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 84c902477a91..bc7fb2dc0e44 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -277,9 +277,11 @@ static const struct nft_chain_type nft_chain_filter_netdev = {
 	.name		= "filter",
 	.type		= NFT_CHAIN_T_DEFAULT,
 	.family		= NFPROTO_NETDEV,
-	.hook_mask	= (1 << NF_NETDEV_INGRESS),
+	.hook_mask	= (1 << NF_NETDEV_INGRESS) |
+			  (1 << NF_NETDEV_EARLY_INGRESS),
 	.hooks		= {
-		[NF_NETDEV_INGRESS]	= nft_do_chain_netdev,
+		[NF_NETDEV_INGRESS]		= nft_do_chain_netdev,
+		[NF_NETDEV_EARLY_INGRESS]	= nft_do_chain_netdev,
 	},
 };
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 09/13] netfilter: nf_flow_table: add hooknum to flowtable type
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 08/13] netfilter: nft_chain_filter: add " Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 10/13] netfilter: nf_flow_table: add flowtable for early ingress hook Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

This allows us to register different flowtable variants depending on the
hook type, hence we can define flowtable for new hook types.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/netfilter/nf_flow_table.h   |   1 +
 net/ipv4/netfilter/nf_flow_table_ipv4.c |   1 +
 net/ipv6/netfilter/nf_flow_table_ipv6.c |   1 +
 net/netfilter/nf_flow_table_inet.c      |   1 +
 net/netfilter/nf_tables_api.c           | 120 +++++++++++++++++---------------
 5 files changed, 67 insertions(+), 57 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index ba9fa4592f2b..4606bad41155 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -14,6 +14,7 @@ struct nf_flowtable;
 struct nf_flowtable_type {
 	struct list_head		list;
 	int				family;
+	unsigned int			hooknum;
 	int				(*init)(struct nf_flowtable *ft);
 	void				(*free)(struct nf_flowtable *ft);
 	nf_hookfn			*hook;
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index e1e56d7123d2..681c0d5c47d7 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -7,6 +7,7 @@
 
 static struct nf_flowtable_type flowtable_ipv4 = {
 	.family		= NFPROTO_IPV4,
+	.hooknum	= NF_NETDEV_INGRESS,
 	.init		= nf_flow_table_init,
 	.free		= nf_flow_table_free,
 	.hook		= nf_flow_offload_ip_hook,
diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c b/net/ipv6/netfilter/nf_flow_table_ipv6.c
index c511d206bf9b..f1f976bdc151 100644
--- a/net/ipv6/netfilter/nf_flow_table_ipv6.c
+++ b/net/ipv6/netfilter/nf_flow_table_ipv6.c
@@ -8,6 +8,7 @@
 
 static struct nf_flowtable_type flowtable_ipv6 = {
 	.family		= NFPROTO_IPV6,
+	.hooknum	= NF_NETDEV_INGRESS,
 	.init		= nf_flow_table_init,
 	.free		= nf_flow_table_free,
 	.hook		= nf_flow_offload_ipv6_hook,
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 99771aa7e7ea..347a640d9723 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -22,6 +22,7 @@ nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
 
 static struct nf_flowtable_type flowtable_inet = {
 	.family		= NFPROTO_INET,
+	.hooknum	= NF_NETDEV_INGRESS,
 	.init		= nf_flow_table_init,
 	.free		= nf_flow_table_free,
 	.hook		= nf_flow_offload_inet_hook,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ca4c4d994ddb..5d6c3b9eee6b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5266,6 +5266,40 @@ static int nf_tables_parse_devices(const struct nft_ctx *ctx,
 	return err;
 }
 
+static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family,
+								int hooknum)
+{
+	const struct nf_flowtable_type *type;
+
+	list_for_each_entry(type, &nf_tables_flowtables, list) {
+		if (family == type->family &&
+		    hooknum == type->hooknum)
+			return type;
+	}
+	return NULL;
+}
+
+static const struct nf_flowtable_type *nft_flowtable_type_get(u8 family,
+							      int hooknum)
+{
+	const struct nf_flowtable_type *type;
+
+	type = __nft_flowtable_type_get(family, hooknum);
+	if (type != NULL && try_module_get(type->owner))
+		return type;
+
+#ifdef CONFIG_MODULES
+	if (type == NULL) {
+		nfnl_unlock(NFNL_SUBSYS_NFTABLES);
+		request_module("nf-flowtable-%u", family);
+		nfnl_lock(NFNL_SUBSYS_NFTABLES);
+		if (__nft_flowtable_type_get(family, hooknum))
+			return ERR_PTR(-EAGAIN);
+	}
+#endif
+	return ERR_PTR(-ENOENT);
+}
+
 static const struct nla_policy nft_flowtable_hook_policy[NFTA_FLOWTABLE_HOOK_MAX + 1] = {
 	[NFTA_FLOWTABLE_HOOK_NUM]	= { .type = NLA_U32 },
 	[NFTA_FLOWTABLE_HOOK_PRIORITY]	= { .type = NLA_U32 },
@@ -5278,6 +5312,7 @@ static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx,
 {
 	struct net_device *dev_array[NFT_FLOWTABLE_DEVICE_MAX];
 	struct nlattr *tb[NFTA_FLOWTABLE_HOOK_MAX + 1];
+	const struct nf_flowtable_type *type;
 	struct nf_hook_ops *ops;
 	int hooknum, priority;
 	int err, n = 0, i;
@@ -5293,19 +5328,31 @@ static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx,
 		return -EINVAL;
 
 	hooknum = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_NUM]));
-	if (hooknum != NF_NETDEV_INGRESS)
+	if (hooknum != NF_NETDEV_INGRESS &&
+	    hooknum != NF_NETDEV_EARLY_INGRESS)
 		return -EINVAL;
 
+	type = nft_flowtable_type_get(ctx->family, hooknum);
+	if (IS_ERR(type))
+		return PTR_ERR(type);
+
+	flowtable->data.type = type;
+	err = type->init(&flowtable->data);
+	if (err < 0)
+		goto err1;
+
 	priority = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_PRIORITY]));
 
 	err = nf_tables_parse_devices(ctx, tb[NFTA_FLOWTABLE_HOOK_DEVS],
 				      dev_array, &n);
 	if (err < 0)
-		return err;
+		goto err2;
 
 	ops = kzalloc(sizeof(struct nf_hook_ops) * n, GFP_KERNEL);
-	if (!ops)
-		return -ENOMEM;
+	if (!ops) {
+		err = -ENOMEM;
+		goto err2;
+	}
 
 	flowtable->hooknum	= hooknum;
 	flowtable->priority	= priority;
@@ -5323,38 +5370,13 @@ static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx,
 							  GFP_KERNEL);
 	}
 
-	return err;
-}
-
-static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family)
-{
-	const struct nf_flowtable_type *type;
-
-	list_for_each_entry(type, &nf_tables_flowtables, list) {
-		if (family == type->family)
-			return type;
-	}
-	return NULL;
-}
-
-static const struct nf_flowtable_type *nft_flowtable_type_get(u8 family)
-{
-	const struct nf_flowtable_type *type;
-
-	type = __nft_flowtable_type_get(family);
-	if (type != NULL && try_module_get(type->owner))
-		return type;
+	return 0;
+err2:
+	flowtable->data.type->free(&flowtable->data);
+err1:
+	module_put(type->owner);
 
-#ifdef CONFIG_MODULES
-	if (type == NULL) {
-		nfnl_unlock(NFNL_SUBSYS_NFTABLES);
-		request_module("nf-flowtable-%u", family);
-		nfnl_lock(NFNL_SUBSYS_NFTABLES);
-		if (__nft_flowtable_type_get(family))
-			return ERR_PTR(-EAGAIN);
-	}
-#endif
-	return ERR_PTR(-ENOENT);
+	return err;
 }
 
 static void nft_unregister_flowtable_net_hooks(struct net *net,
@@ -5377,7 +5399,6 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
 				  struct netlink_ext_ack *extack)
 {
 	const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
-	const struct nf_flowtable_type *type;
 	struct nft_flowtable *flowtable, *ft;
 	u8 genmask = nft_genmask_next(net);
 	int family = nfmsg->nfgen_family;
@@ -5429,21 +5450,10 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
 		goto err1;
 	}
 
-	type = nft_flowtable_type_get(family);
-	if (IS_ERR(type)) {
-		err = PTR_ERR(type);
-		goto err2;
-	}
-
-	flowtable->data.type = type;
-	err = type->init(&flowtable->data);
-	if (err < 0)
-		goto err3;
-
 	err = nf_tables_flowtable_parse_hook(&ctx, nla[NFTA_FLOWTABLE_HOOK],
 					     flowtable);
 	if (err < 0)
-		goto err4;
+		goto err2;
 
 	for (i = 0; i < flowtable->ops_len; i++) {
 		if (!flowtable->ops[i].dev)
@@ -5457,37 +5467,33 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
 				if (flowtable->ops[i].dev == ft->ops[k].dev &&
 				    flowtable->ops[i].pf == ft->ops[k].pf) {
 					err = -EBUSY;
-					goto err5;
+					goto err3;
 				}
 			}
 		}
 
 		err = nf_register_net_hook(net, &flowtable->ops[i]);
 		if (err < 0)
-			goto err5;
+			goto err3;
 	}
 
 	err = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
 	if (err < 0)
-		goto err6;
+		goto err4;
 
 	list_add_tail_rcu(&flowtable->list, &table->flowtables);
 	table->use++;
 
 	return 0;
-err6:
+err4:
 	i = flowtable->ops_len;
-err5:
+err3:
 	for (k = i - 1; k >= 0; k--) {
 		kfree(flowtable->dev_name[k]);
 		nf_unregister_net_hook(net, &flowtable->ops[k]);
 	}
 
 	kfree(flowtable->ops);
-err4:
-	flowtable->data.type->free(&flowtable->data);
-err3:
-	module_put(type->owner);
 err2:
 	kfree(flowtable->name);
 err1:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 10/13] netfilter: nf_flow_table: add flowtable for early ingress hook
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 09/13] netfilter: nf_flow_table: add hooknum to flowtable type Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 11/13] netfilter: nft_flow_offload: enable offload after second packet is seen Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

Add the new flowtable type for the early ingress hook, this allows
us to combine the custom GRO chaining with the flowtable abstraction
to define fastpaths.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/netfilter/nf_flow_table.h   |  3 ++
 net/ipv4/netfilter/nf_flow_table_ipv4.c | 11 ++++++
 net/netfilter/nf_flow_table_ip.c        | 62 +++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 4606bad41155..e270269dd1e8 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -126,6 +126,9 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 				     const struct nf_hook_state *state);
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 				       const struct nf_hook_state *state);
+unsigned int nf_flow_offload_early_ingress_ip_hook(void *priv,
+						   struct sk_buff *skb,
+						   const struct nf_hook_state *state);
 
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index 681c0d5c47d7..b771000ca894 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -14,15 +14,26 @@ static struct nf_flowtable_type flowtable_ipv4 = {
 	.owner		= THIS_MODULE,
 };
 
+static struct nf_flowtable_type flowtable_ipv4_early = {
+	.family		= NFPROTO_IPV4,
+	.hooknum	= NF_NETDEV_EARLY_INGRESS,
+	.init		= nf_flow_table_init,
+	.free		= nf_flow_table_free,
+	.hook		= nf_flow_offload_early_ingress_ip_hook,
+	.owner		= THIS_MODULE,
+};
+
 static int __init nf_flow_ipv4_module_init(void)
 {
 	nft_register_flowtable_type(&flowtable_ipv4);
+	nft_register_flowtable_type(&flowtable_ipv4_early);
 
 	return 0;
 }
 
 static void __exit nf_flow_ipv4_module_exit(void)
 {
+	nft_unregister_flowtable_type(&flowtable_ipv4_early);
 	nft_unregister_flowtable_type(&flowtable_ipv4);
 }
 
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 15ed91309992..0828e49bd95e 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -11,6 +11,7 @@
 #include <net/ip6_route.h>
 #include <net/neighbour.h>
 #include <net/netfilter/nf_flow_table.h>
+#include <net/xfrm.h>
 /* For layer 4 checksum field offset. */
 #include <linux/tcp.h>
 #include <linux/udp.h>
@@ -487,3 +488,64 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	return NF_STOLEN;
 }
 EXPORT_SYMBOL_GPL(nf_flow_offload_ipv6_hook);
+
+unsigned int
+nf_flow_offload_early_ingress_ip_hook(void *priv, struct sk_buff *skb,
+				      const struct nf_hook_state *state)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *flow_table = priv;
+	struct flow_offload_tuple tuple = {};
+	enum flow_offload_tuple_dir dir;
+	struct flow_offload *flow;
+	struct net_device *outdev;
+	const struct rtable *rt;
+	unsigned int thoff;
+	struct iphdr *iph;
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return NF_ACCEPT;
+
+	if (nf_flow_tuple_ip(skb, state->in, &tuple) < 0)
+		return NF_ACCEPT;
+
+	tuplehash = flow_offload_lookup(flow_table, &tuple);
+	if (tuplehash == NULL)
+		return NF_ACCEPT;
+
+	outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
+	if (!outdev)
+		return NF_ACCEPT;
+
+	dir = tuplehash->tuple.dir;
+	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+	rt = (const struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
+
+	if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)) &&
+	    (ip_hdr(skb)->frag_off & htons(IP_DF)) != 0)
+		return NF_ACCEPT;
+
+	if (skb_try_make_writable(skb, sizeof(*iph)))
+		return NF_DROP;
+
+	thoff = ip_hdr(skb)->ihl * 4;
+	if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
+		return NF_ACCEPT;
+
+	if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
+	    nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
+		return NF_DROP;
+
+	flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
+
+	skb_dst_set_noref(skb, flow->tuplehash[dir].tuple.dst_cache);
+
+	if (skb_dst(skb)->xfrm &&
+	    !xfrm_dev_offload_ok(skb, skb_dst(skb)->xfrm))
+		return NF_ACCEPT;
+
+	NAPI_GRO_CB(skb)->is_ffwd = 1;
+
+	return NF_STOLEN;
+}
+EXPORT_SYMBOL_GPL(nf_flow_offload_early_ingress_ip_hook);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 11/13] netfilter: nft_flow_offload: enable offload after second packet is seen
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 10/13] netfilter: nf_flow_table: add flowtable for early ingress hook Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 12/13] netfilter: nft_flow_offload: remove secpath check Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

Once we have a confirmed conntrack, ie. a packet went through the stack
and a conntrack was added, then allow second packet to configure the
flowtable offload.

This allows UDP media traffic going in only one direction to enable offloads.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/netfilter/nft_flow_offload.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index d6bab8c3cbb0..f2e95edfb4de 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -88,14 +88,9 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 		goto out;
 	}
 
-	if (test_bit(IPS_HELPER_BIT, &ct->status))
-		goto out;
-
-	if (ctinfo == IP_CT_NEW ||
-	    ctinfo == IP_CT_RELATED)
-		goto out;
-
-	if (test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status))
+	if (test_bit(IPS_HELPER_BIT, &ct->status) ||
+	    !test_bit(IPS_CONFIRMED_BIT, &ct->status) ||
+	    test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status))
 		goto out;
 
 	dir = CTINFO2DIR(ctinfo);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 12/13] netfilter: nft_flow_offload: remove secpath check
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 11/13] netfilter: nft_flow_offload: enable offload after second packet is seen Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 14:19 ` [PATCH net-next,RFC 13/13] netfilter: nft_flow_offload: make sure route is not stale Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

It is safe to place a flow that is coming from IPSec into the flowtable.
So decapsulated can benefit from the flowtable fastpath.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/netfilter/nft_flow_offload.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index f2e95edfb4de..a7f529b79bdb 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -54,8 +54,6 @@ static bool nft_flow_offload_skip(struct sk_buff *skb)
 
 	if (unlikely(opt->optlen))
 		return true;
-	if (skb_sec_path(skb))
-		return true;
 
 	return false;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next,RFC 13/13] netfilter: nft_flow_offload: make sure route is not stale
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (11 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 12/13] netfilter: nft_flow_offload: remove secpath check Pablo Neira Ayuso
@ 2018-06-14 14:19 ` Pablo Neira Ayuso
  2018-06-14 15:50 ` [PATCH net-next,RFC 00/13] New fast forwarding path Willem de Bruijn
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2018-06-14 14:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, steffen.klassert

Use dst_check() to validate that route is still valid, otherwise,
tear down the flow entry and pass up packet to the standard forwarding
path so we have a chance to cache the fresh route again.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/netfilter/nf_flow_table_ip.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 0828e49bd95e..2bdf740debac 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -244,6 +244,11 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 	rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
 
+	if (dst_check(&rt->dst, 0)) {
+		flow_offload_teardown(flow);
+		return NF_ACCEPT;
+	}
+
 	if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)) &&
 	    (ip_hdr(skb)->frag_off & htons(IP_DF)) != 0)
 		return NF_ACCEPT;
@@ -462,6 +467,11 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 	rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache;
 
+	if (dst_check(&rt->dst, 0)) {
+		flow_offload_teardown(flow);
+		return NF_ACCEPT;
+	}
+
 	if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)))
 		return NF_ACCEPT;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (12 preceding siblings ...)
  2018-06-14 14:19 ` [PATCH net-next,RFC 13/13] netfilter: nft_flow_offload: make sure route is not stale Pablo Neira Ayuso
@ 2018-06-14 15:50 ` Willem de Bruijn
  2018-06-15  5:23   ` Steffen Klassert
  2018-06-14 15:57 ` Eric Dumazet
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Willem de Bruijn @ 2018-06-14 15:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Network Development, steffen.klassert

> This patchset supports both layer 3 IPv4 and IPv6, and layer 4 TCP and
> UDP protocols. This fastpath also integrates with the IPSec
> infrastructure and the ESP protocol.
>
> We have collected performance numbers:
>
>         TCP TSO         TCP Fast Forward
>         32.5 Gbps       35.6 Gbps
>
>         UDP             UDP Fast Forward
>         17.6 Gbps       35.6 Gbps
>
>         ESP             ESP Fast Forward
>         6 Gbps          7.5 Gbps
>
> For UDP, this is doubling performance, and we almost achieve line rate
> with one single CPU using the Intel i40e NIC. We got similar numbers
> with the Mellanox ConnectX-4. For TCP, this is slightly improving things
> even if TSO is being defeated given that we need to segment the packet
> chain in software.

The difference between TCP and UDP stems from lack of GRO for UDP. We
recently added UDP GSO to allow for batch traversal of the UDP stack on
transmission. Adding a UDP GRO handler can probably extend batching to
the forwarding path in a similar way without the need for a new infrastructure.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (13 preceding siblings ...)
  2018-06-14 15:50 ` [PATCH net-next,RFC 00/13] New fast forwarding path Willem de Bruijn
@ 2018-06-14 15:57 ` Eric Dumazet
  2018-06-15  6:03   ` Steffen Klassert
  2018-06-14 17:18 ` David Miller
  2018-06-14 20:52 ` Tom Herbert
  16 siblings, 1 reply; 33+ messages in thread
From: Eric Dumazet @ 2018-06-14 15:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: netdev, steffen.klassert



On 06/14/2018 07:19 AM, Pablo Neira Ayuso wrote:
> Hi,
> 

> We have collected performance numbers:
> 
>         TCP TSO         TCP Fast Forward
>         32.5 Gbps       35.6 Gbps
> 
>         UDP             UDP Fast Forward
>         17.6 Gbps       35.6 Gbps
> 
>         ESP             ESP Fast Forward
>         6 Gbps          7.5 Gbps
> 
> For UDP, this is doubling performance, and we almost achieve line rate
> with one single CPU using the Intel i40e NIC. We got similar numbers
> with the Mellanox ConnectX-4. For TCP, this is slightly improving things
> even if TSO is being defeated given that we need to segment the packet
> chain in software. We would like to explore HW GRO support with hardware
> vendors with this new mode, we think that should improve the TCP numbers
> we are showing above even more.

Hi Pablo

Not very convincing numbers, because it is unclear what traffic patterns were used.

We normally use packets per second to measure a forwarding workload,
and it is not clear if you tried a DDOS, or/and a mix of packets being locally
delivered and packets being forwarded.

Presumably adding cache line misses (to probe for flows) will slow down the things.

I suspect the NIC you use has some kind of bottleneck on sending TSO packets,
or that you hit the issue that GRO might cook suboptimal packets for forwarding workloads
(eg setting frag_list)

This path series add yet more code to GRO engine which is already very fat
to the point many people advocate to turn it off.
Saving cpu cycles on moderate load is not okay if added complexity
slows down the DDOS (or stress) by 10 % :/

To me, GRO is specialized to optimize the non-forwarding case,
so it is counter-intuitive to base a fast forwarding path on top of it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (14 preceding siblings ...)
  2018-06-14 15:57 ` Eric Dumazet
@ 2018-06-14 17:18 ` David Miller
  2018-06-14 18:14   ` Florian Fainelli
  2018-06-15  6:17   ` Steffen Klassert
  2018-06-14 20:52 ` Tom Herbert
  16 siblings, 2 replies; 33+ messages in thread
From: David Miller @ 2018-06-14 17:18 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, steffen.klassert

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 14 Jun 2018 16:19:34 +0200

> This patchset proposes a new fast forwarding path infrastructure
> that combines the GRO/GSO and the flowtable infrastructures. The
> idea is to add a hook at the GRO layer that is invoked before the
> standard GRO protocol offloads. This allows us to build custom
> packet chains that we can quickly pass in one go to the neighbour
> layer to define fast forwarding path for flows.

We have full, complete, customizability of the packet path via XDP
and eBPF.

XDP and eBPF supports everything necessary to accomplish that,
there are implementations of forwarding implementations in
the tree and elsewhere.

And most importantly, XDP and eBPF are optimized in drivers and
offloaded to hardware.

There really is no need for something like what you are proposing.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 17:18 ` David Miller
@ 2018-06-14 18:14   ` Florian Fainelli
  2018-06-14 23:55     ` David Miller
  2018-06-15  6:17   ` Steffen Klassert
  1 sibling, 1 reply; 33+ messages in thread
From: Florian Fainelli @ 2018-06-14 18:14 UTC (permalink / raw)
  To: David Miller, pablo; +Cc: netfilter-devel, netdev, steffen.klassert



On 06/14/2018 10:18 AM, David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Thu, 14 Jun 2018 16:19:34 +0200
> 
>> This patchset proposes a new fast forwarding path infrastructure
>> that combines the GRO/GSO and the flowtable infrastructures. The
>> idea is to add a hook at the GRO layer that is invoked before the
>> standard GRO protocol offloads. This allows us to build custom
>> packet chains that we can quickly pass in one go to the neighbour
>> layer to define fast forwarding path for flows.
> 
> We have full, complete, customizability of the packet path via XDP
> and eBPF.
> 
> XDP and eBPF supports everything necessary to accomplish that,
> there are implementations of forwarding implementations in
> the tree and elsewhere.
> 
> And most importantly, XDP and eBPF are optimized in drivers and
> offloaded to hardware.
> 
> There really is no need for something like what you are proposing.
> 

I see one possible upside to that approach here which is the low end
MIPS/ARM/PowerPC 32-bit based routers that do not have an eBPF JIT
available (that's only MIPS32 and PowerPC AFAICT), it would be great to
see what happens on those systems and if we do get any performance
improvements for a traditional forwarding/routing workload. On those
platforms there are a number of things that just literally kill the
routing performance: small I and D caches, small or not L2, limited
bandwidth DRAM, huge call depths, big struct sk_buff layout, you name it.
-- 
Florian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
                   ` (15 preceding siblings ...)
  2018-06-14 17:18 ` David Miller
@ 2018-06-14 20:52 ` Tom Herbert
  2018-06-14 23:58   ` David Miller
  2018-06-15  6:27   ` Steffen Klassert
  16 siblings, 2 replies; 33+ messages in thread
From: Tom Herbert @ 2018-06-14 20:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, Linux Kernel Network Developers, Steffen Klassert

On Thu, Jun 14, 2018 at 7:19 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi,
>
> This patchset proposes a new fast forwarding path infrastructure that
> combines the GRO/GSO and the flowtable infrastructures. The idea is to
> add a hook at the GRO layer that is invoked before the standard GRO
> protocol offloads. This allows us to build custom packet chains that we
> can quickly pass in one go to the neighbour layer to define fast
> forwarding path for flows.
>
> For each packet that gets into the GRO layer, we first check if there is
> an entry in the flowtable, if so, the packet is placed in a list until
> the GRO infrastructure decides to send the batch from gro_complete to
> the neighbour layer. The first packet in the list takes the route from
> the flowtable entry, so we avoid reiterative routing lookups.
>
> In case no entry is found in the flowtable, the packet is passed up to
> the classic GRO offload handlers. Thus, this packet follows the standard
> forwarding path. Note that the initial packets of the flow always go
> through the standard IPv4/IPv6 netfilter forward hook, that is used to
> configure what flows are placed in the flowtable. Therefore, only a few
> (initial) packets follow the standard forwarding path while most of the
> follow up packets take this new fast forwarding path.
>

IIRC, there was a similar proposal a while back that want to bundle
packets of the same flow together (without doing GRO) so that they
could be processed by various functions by looking at just one
representative packet in the group. The concept had some promise, but
in the end it created quite a bit of complexity since at some point
the packet bundle needed to be undone to go back to processing the
individual packets.

Tom

> The fast forwarding path is enabled through explicit user policy, so the
> user needs to request this behaviour from control plane, the following
> example shows how to place flows in the new fast forwarding path from
> the netfilter forward chain:
>
>  table x {
>         flowtable f {
>                 hook early_ingress priority 0; devices = { eth0, eth1 }
>         }
>
>         chain y {
>                 type filter hook forward priority 0;
>                 ip protocol tcp flow offload @f
>         }
>  }
>
> The example above defines a fastpath for TCP flows that are placed in
> the flowtable 'f', this flowtable is hooked at the new early_ingress
> hook. The initial TCP packets that match this rule from the standard
> fowarding path create an entry in the flowtable, thus, GRO creates chain
> of packets for those that find an entry in the flowtable and send
> them through the neighbour layer.
>
> This new hook is happening before the ingress taps, therefore, packets
> that follow this new fast forwarding path are not shown by tcpdump.
>
> This patchset supports both layer 3 IPv4 and IPv6, and layer 4 TCP and
> UDP protocols. This fastpath also integrates with the IPSec
> infrastructure and the ESP protocol.
>
> We have collected performance numbers:
>
>         TCP TSO         TCP Fast Forward
>         32.5 Gbps       35.6 Gbps
>
>         UDP             UDP Fast Forward
>         17.6 Gbps       35.6 Gbps
>
>         ESP             ESP Fast Forward
>         6 Gbps          7.5 Gbps
>
> For UDP, this is doubling performance, and we almost achieve line rate
> with one single CPU using the Intel i40e NIC. We got similar numbers
> with the Mellanox ConnectX-4. For TCP, this is slightly improving things
> even if TSO is being defeated given that we need to segment the packet
> chain in software. We would like to explore HW GRO support with hardware
> vendors with this new mode, we think that should improve the TCP numbers
> we are showing above even more. For ESP traffic, performance improvement
> is ~25%, in this case, perf shows the bottleneck becomes the crypto layer.
>
> This patchset is co-authored work with Steffen Klassert.
>
> Comments are welcome, thanks.
>
>
> Pablo Neira Ayuso (6):
>   netfilter: nft_chain_filter: add support for early ingress
>   netfilter: nf_flow_table: add hooknum to flowtable type
>   netfilter: nf_flow_table: add flowtable for early ingress hook
>   netfilter: nft_flow_offload: enable offload after second packet is seen
>   netfilter: nft_flow_offload: remove secpath check
>   netfilter: nft_flow_offload: make sure route is not stale
>
> Steffen Klassert (7):
>   net: Add a helper to get the packet offload callbacks by priority.
>   net: Change priority of ipv4 and ipv6 packet offloads.
>   net: Add a GSO feature bit for the netfilter forward fastpath.
>   net: Use one bit of NAPI_GRO_CB for the netfilter fastpath.
>   netfilter: add early ingress hook for IPv4
>   netfilter: add early ingress support for IPv6
>   netfilter: add ESP support for early ingress
>
>  include/linux/netdev_features.h         |   4 +-
>  include/linux/netdevice.h               |   6 +-
>  include/linux/netfilter.h               |   6 +
>  include/linux/netfilter_ingress.h       |   1 +
>  include/linux/skbuff.h                  |   2 +
>  include/net/netfilter/early_ingress.h   |  24 +++
>  include/net/netfilter/nf_flow_table.h   |   4 +
>  include/uapi/linux/netfilter.h          |   1 +
>  net/core/dev.c                          |  50 ++++-
>  net/ipv4/af_inet.c                      |   1 +
>  net/ipv4/netfilter/Makefile             |   1 +
>  net/ipv4/netfilter/early_ingress.c      | 327 +++++++++++++++++++++++++++++
>  net/ipv4/netfilter/nf_flow_table_ipv4.c |  12 ++
>  net/ipv6/ip6_offload.c                  |   1 +
>  net/ipv6/netfilter/Makefile             |   1 +
>  net/ipv6/netfilter/early_ingress.c      | 315 ++++++++++++++++++++++++++++
>  net/ipv6/netfilter/nf_flow_table_ipv6.c |   1 +
>  net/netfilter/Kconfig                   |   8 +
>  net/netfilter/Makefile                  |   1 +
>  net/netfilter/core.c                    |  35 +++-
>  net/netfilter/early_ingress.c           | 361 ++++++++++++++++++++++++++++++++
>  net/netfilter/nf_flow_table_inet.c      |   1 +
>  net/netfilter/nf_flow_table_ip.c        |  72 +++++++
>  net/netfilter/nf_tables_api.c           | 120 ++++++-----
>  net/netfilter/nft_chain_filter.c        |   6 +-
>  net/netfilter/nft_flow_offload.c        |  13 +-
>  net/xfrm/xfrm_output.c                  |   4 +
>  27 files changed, 1297 insertions(+), 81 deletions(-)
>  create mode 100644 include/net/netfilter/early_ingress.h
>  create mode 100644 net/ipv4/netfilter/early_ingress.c
>  create mode 100644 net/ipv6/netfilter/early_ingress.c
>  create mode 100644 net/netfilter/early_ingress.c
>
> --
> 2.11.0
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 18:14   ` Florian Fainelli
@ 2018-06-14 23:55     ` David Miller
  2018-06-20  0:56       ` Andrew Collins
  0 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2018-06-14 23:55 UTC (permalink / raw)
  To: f.fainelli; +Cc: pablo, netfilter-devel, netdev, steffen.klassert

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 14 Jun 2018 11:14:37 -0700

> On those platforms there are a number of things that just literally
> kill the routing performance: small I and D caches, small or not L2,
> limited bandwidth DRAM, huge call depths, big struct sk_buff layout,
> you name it.

Another reason to work on a 64-bit MIPS eBPF JIT.

We have a model, and game plan for this kind of application.  And it's
XDP and eBPF with JITs.

We are fully commited to this approach, and I see anything else that
tries to slip in and approach some sub-part of the problem as a
complete distraction and a step backwards.

All of the effort on this work could have been spent filling in the
missing pieces you mention.

And guess what?  Then millions of possibilities would have been
openned up, rather than just this one special case.

So, I ask, please see the larger picture.

Thank you.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 20:52 ` Tom Herbert
@ 2018-06-14 23:58   ` David Miller
  2018-06-15  6:34     ` Steffen Klassert
  2018-06-15 20:12     ` Tom Herbert
  2018-06-15  6:27   ` Steffen Klassert
  1 sibling, 2 replies; 33+ messages in thread
From: David Miller @ 2018-06-14 23:58 UTC (permalink / raw)
  To: tom; +Cc: pablo, netfilter-devel, netdev, steffen.klassert

From: Tom Herbert <tom@herbertland.com>
Date: Thu, 14 Jun 2018 13:52:03 -0700

> IIRC, there was a similar proposal a while back that want to bundle
> packets of the same flow together (without doing GRO) so that they
> could be processed by various functions by looking at just one
> representative packet in the group. The concept had some promise, but
> in the end it created quite a bit of complexity since at some point
> the packet bundle needed to be undone to go back to processing the
> individual packets.

You're probably talking about Edward Cree's SKB list stuff, and as
per his presenation at netconf 2 weeks ago he plans to revitalize
it given how Spectre et al. gives cause to reevaluate all bulking
techniques.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 15:50 ` [PATCH net-next,RFC 00/13] New fast forwarding path Willem de Bruijn
@ 2018-06-15  5:23   ` Steffen Klassert
  0 siblings, 0 replies; 33+ messages in thread
From: Steffen Klassert @ 2018-06-15  5:23 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Pablo Neira Ayuso, netfilter-devel, Network Development

On Thu, Jun 14, 2018 at 11:50:49AM -0400, Willem de Bruijn wrote:
> > This patchset supports both layer 3 IPv4 and IPv6, and layer 4 TCP and
> > UDP protocols. This fastpath also integrates with the IPSec
> > infrastructure and the ESP protocol.
> >
> > We have collected performance numbers:
> >
> >         TCP TSO         TCP Fast Forward
> >         32.5 Gbps       35.6 Gbps
> >
> >         UDP             UDP Fast Forward
> >         17.6 Gbps       35.6 Gbps
> >
> >         ESP             ESP Fast Forward
> >         6 Gbps          7.5 Gbps
> >
> > For UDP, this is doubling performance, and we almost achieve line rate
> > with one single CPU using the Intel i40e NIC. We got similar numbers
> > with the Mellanox ConnectX-4. For TCP, this is slightly improving things
> > even if TSO is being defeated given that we need to segment the packet
> > chain in software.
> 
> The difference between TCP and UDP stems from lack of GRO for UDP.

Right.

> We
> recently added UDP GSO to allow for batch traversal of the UDP stack on
> transmission. Adding a UDP GRO handler can probably extend batching to
> the forwarding path in a similar way without the need for a new infrastructure.

That's more or less what we did. The batching method ist just
optimized for the forwarding path. We are generating skb chains
by chaning at the frag_list pointer of the first skb. With that,
we don't need to mange packet. We keep the packets in the native
form, so the 'segmentation' is rather easy.

The rest is just to be able to configure this and to make
sure that we handle only flows that are going to be (fast)
forwarded, as the upper stack can not (yet) handle such
skb chains.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 15:57 ` Eric Dumazet
@ 2018-06-15  6:03   ` Steffen Klassert
  2018-06-15 13:01     ` Eric Dumazet
  0 siblings, 1 reply; 33+ messages in thread
From: Steffen Klassert @ 2018-06-15  6:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Pablo Neira Ayuso, netfilter-devel, netdev

On Thu, Jun 14, 2018 at 08:57:20AM -0700, Eric Dumazet wrote:
> 
> 
> On 06/14/2018 07:19 AM, Pablo Neira Ayuso wrote:
> > Hi,
> > 
> 
> > We have collected performance numbers:
> > 
> >         TCP TSO         TCP Fast Forward
> >         32.5 Gbps       35.6 Gbps
> > 
> >         UDP             UDP Fast Forward
> >         17.6 Gbps       35.6 Gbps
> > 
> >         ESP             ESP Fast Forward
> >         6 Gbps          7.5 Gbps
> > 
> > For UDP, this is doubling performance, and we almost achieve line rate
> > with one single CPU using the Intel i40e NIC. We got similar numbers
> > with the Mellanox ConnectX-4. For TCP, this is slightly improving things
> > even if TSO is being defeated given that we need to segment the packet
> > chain in software. We would like to explore HW GRO support with hardware
> > vendors with this new mode, we think that should improve the TCP numbers
> > we are showing above even more.
> 
> Hi Pablo
> 
> Not very convincing numbers, because it is unclear what traffic patterns were used.
> 
> We normally use packets per second to measure a forwarding workload,
> and it is not clear if you tried a DDOS, or/and a mix of packets being locally
> delivered and packets being forwarded.

Yes, these number need some more explaination. We used my IPsec
forwarding test setup for this. It looks like this:

	   ------------         ------------
	-->| router 1 |-------->| router 2 |--
	|  ------------         ------------  |
	|                                     |
	|       --------------------          |
	--------|Spirent Testcenter|<----------
	        --------------------

The numbers are from single stream forwarding tests, no local delivery.
Packet size in the UDP case was 1460 byte. I used this packet size
because such packets still fit into the mtu when encapsulated by IPsec.

> 
> Presumably adding cache line misses (to probe for flows) will slow down the things.
> 
> I suspect the NIC you use has some kind of bottleneck on sending TSO packets,
> or that you hit the issue that GRO might cook suboptimal packets for forwarding workloads
> (eg setting frag_list)

That might be, I was a bit surprised about the TCP numbers myself.
I was more focused on UDP and IPsec because these don't have
hardware segmentation support. I've just added a TCP handler to
see what happens, the numbers looked ok, so I kept it.

All this is based on the approach I pesented last year at the nefilter
workshop.

> 
> This path series add yet more code to GRO engine which is already very fat
> to the point many people advocate to turn it off.

We tried to stay away from the generic codepath as much as possible.
Currently we need five 'if' statements, two of them are in error
paths (Patch 4).

> Saving cpu cycles on moderate load is not okay if added complexity
> slows down the DDOS (or stress) by 10 % :/

Why 10%?

> 
> To me, GRO is specialized to optimize the non-forwarding case,
> so it is counter-intuitive to base a fast forwarding path on top of it.

It is optimized for the non-forwarding case, but it seems that forwarding
can benefit from that too with very little cost for the non-forwarding case.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 17:18 ` David Miller
  2018-06-14 18:14   ` Florian Fainelli
@ 2018-06-15  6:17   ` Steffen Klassert
  2018-06-15 13:22     ` Daniel Borkmann
  1 sibling, 1 reply; 33+ messages in thread
From: Steffen Klassert @ 2018-06-15  6:17 UTC (permalink / raw)
  To: David Miller; +Cc: pablo, netfilter-devel, netdev

On Thu, Jun 14, 2018 at 10:18:31AM -0700, David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Thu, 14 Jun 2018 16:19:34 +0200
> 
> > This patchset proposes a new fast forwarding path infrastructure
> > that combines the GRO/GSO and the flowtable infrastructures. The
> > idea is to add a hook at the GRO layer that is invoked before the
> > standard GRO protocol offloads. This allows us to build custom
> > packet chains that we can quickly pass in one go to the neighbour
> > layer to define fast forwarding path for flows.
> 
> We have full, complete, customizability of the packet path via XDP
> and eBPF.
> 
> XDP and eBPF supports everything necessary to accomplish that,
> there are implementations of forwarding implementations in
> the tree and elsewhere.
> 
> And most importantly, XDP and eBPF are optimized in drivers and
> offloaded to hardware.
> 
> There really is no need for something like what you are proposing.

I started with this last year because I wanted to improve
the IPsec (and UDP) forwarding path. Batching packets
at layer2  and send them directly to the output path
seemed to be a good method to improve this.

In particular, we need to do only one IPsec lookup
for the whole packet chain. So it relaxes the pain
from reomoving the IPsec flowcache a bit. It can be
only a first step, but we need some improvements here
as people start to complain about that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 20:52 ` Tom Herbert
  2018-06-14 23:58   ` David Miller
@ 2018-06-15  6:27   ` Steffen Klassert
  1 sibling, 0 replies; 33+ messages in thread
From: Steffen Klassert @ 2018-06-15  6:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Pablo Neira Ayuso, netfilter-devel, Linux Kernel Network Developers

On Thu, Jun 14, 2018 at 01:52:03PM -0700, Tom Herbert wrote:
> On Thu, Jun 14, 2018 at 7:19 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi,
> >
> > This patchset proposes a new fast forwarding path infrastructure that
> > combines the GRO/GSO and the flowtable infrastructures. The idea is to
> > add a hook at the GRO layer that is invoked before the standard GRO
> > protocol offloads. This allows us to build custom packet chains that we
> > can quickly pass in one go to the neighbour layer to define fast
> > forwarding path for flows.
> >
> > For each packet that gets into the GRO layer, we first check if there is
> > an entry in the flowtable, if so, the packet is placed in a list until
> > the GRO infrastructure decides to send the batch from gro_complete to
> > the neighbour layer. The first packet in the list takes the route from
> > the flowtable entry, so we avoid reiterative routing lookups.
> >
> > In case no entry is found in the flowtable, the packet is passed up to
> > the classic GRO offload handlers. Thus, this packet follows the standard
> > forwarding path. Note that the initial packets of the flow always go
> > through the standard IPv4/IPv6 netfilter forward hook, that is used to
> > configure what flows are placed in the flowtable. Therefore, only a few
> > (initial) packets follow the standard forwarding path while most of the
> > follow up packets take this new fast forwarding path.
> >
> 
> IIRC, there was a similar proposal a while back that want to bundle
> packets of the same flow together (without doing GRO) so that they
> could be processed by various functions by looking at just one
> representative packet in the group. The concept had some promise, but
> in the end it created quite a bit of complexity since at some point
> the packet bundle needed to be undone to go back to processing the
> individual packets.

With the way we chain the packets it is not too complicated to
undo this chaining (nft_skb_segment in patch 5 implements this).
After that, this looks like a chain of usual segments, so we
trigger xmit_more with every packet chain.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 23:58   ` David Miller
@ 2018-06-15  6:34     ` Steffen Klassert
  2018-06-15 12:18       ` Edward Cree
  2018-06-15 20:12     ` Tom Herbert
  1 sibling, 1 reply; 33+ messages in thread
From: Steffen Klassert @ 2018-06-15  6:34 UTC (permalink / raw)
  To: David Miller; +Cc: tom, pablo, netfilter-devel, netdev

On Thu, Jun 14, 2018 at 04:58:34PM -0700, David Miller wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Thu, 14 Jun 2018 13:52:03 -0700
> 
> > IIRC, there was a similar proposal a while back that want to bundle
> > packets of the same flow together (without doing GRO) so that they
> > could be processed by various functions by looking at just one
> > representative packet in the group. The concept had some promise, but
> > in the end it created quite a bit of complexity since at some point
> > the packet bundle needed to be undone to go back to processing the
> > individual packets.
> 
> You're probably talking about Edward Cree's SKB list stuff, and as
> per his presenation at netconf 2 weeks ago he plans to revitalize
> it given how Spectre et al. gives cause to reevaluate all bulking
> techniques.

Are there patches for the proposal Edward did a while ago,
or was it just a concept?

Maybe we can somehow put things together, I just need some
batching method that works for IPsec and UDP. It does not
need to be exactly the one we proposing here.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-15  6:34     ` Steffen Klassert
@ 2018-06-15 12:18       ` Edward Cree
  0 siblings, 0 replies; 33+ messages in thread
From: Edward Cree @ 2018-06-15 12:18 UTC (permalink / raw)
  To: Steffen Klassert, David Miller; +Cc: tom, pablo, netfilter-devel, netdev

On 15/06/18 07:34, Steffen Klassert wrote:
> On Thu, Jun 14, 2018 at 04:58:34PM -0700, David Miller wrote:
>> You're probably talking about Edward Cree's SKB list stuff, and as
>> per his presenation at netconf 2 weeks ago he plans to revitalize
>> it given how Spectre et al. gives cause to reevaluate all bulking
>> techniques.
> Are there patches for the proposal Edward did a while ago,
> or was it just a concept?
Old patches are at http://lists.openwall.net/netdev/2016/04/19/89
 (note that the absolute numbers I give in the cover letter are wrong;
 I quoted them as Mpps but they're actually Mbps which is 8x higher).
I hope to have a new series ready shortly after net-next reopens.

-Ed

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-15  6:03   ` Steffen Klassert
@ 2018-06-15 13:01     ` Eric Dumazet
  0 siblings, 0 replies; 33+ messages in thread
From: Eric Dumazet @ 2018-06-15 13:01 UTC (permalink / raw)
  To: Steffen Klassert, Eric Dumazet; +Cc: Pablo Neira Ayuso, netfilter-devel, netdev



On 06/14/2018 11:03 PM, Steffen Klassert wrote:
> On Thu, Jun 14, 2018 at 08:57:20AM -0700, Eric Dumazet wrote:
>>

>> Saving cpu cycles on moderate load is not okay if added complexity
>> slows down the DDOS (or stress) by 10 % :/
> 
> Why 10%?
> 

GRO adds a ~6 % cost on UDP receive path at this moment, depending on the state
of GRO engine (number of packets in the napi->gro_list)

Adding yet another conditions and icache pressure might raise the cost to 10%,
but we do not know because the numbers presented in this RFC do not include that.

(Early demux is also adding extra costs for UDP on 'non connected sockets' BTW)

Most linux hosts are not routers, but end hosts, lets not forget this...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-15  6:17   ` Steffen Klassert
@ 2018-06-15 13:22     ` Daniel Borkmann
  2018-06-17  9:23       ` Steffen Klassert
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Borkmann @ 2018-06-15 13:22 UTC (permalink / raw)
  To: Steffen Klassert, David Miller; +Cc: pablo, netfilter-devel, netdev

Hi Steffen,

On 06/15/2018 08:17 AM, Steffen Klassert wrote:
> On Thu, Jun 14, 2018 at 10:18:31AM -0700, David Miller wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Thu, 14 Jun 2018 16:19:34 +0200
>>
>>> This patchset proposes a new fast forwarding path infrastructure
>>> that combines the GRO/GSO and the flowtable infrastructures. The
>>> idea is to add a hook at the GRO layer that is invoked before the
>>> standard GRO protocol offloads. This allows us to build custom
>>> packet chains that we can quickly pass in one go to the neighbour
>>> layer to define fast forwarding path for flows.
>>
>> We have full, complete, customizability of the packet path via XDP
>> and eBPF.
>>
>> XDP and eBPF supports everything necessary to accomplish that,
>> there are implementations of forwarding implementations in
>> the tree and elsewhere.
>>
>> And most importantly, XDP and eBPF are optimized in drivers and
>> offloaded to hardware.
>>
>> There really is no need for something like what you are proposing.
> 
> I started with this last year because I wanted to improve
> the IPsec (and UDP) forwarding path. Batching packets
> at layer2  and send them directly to the output path
> seemed to be a good method to improve this.
> 
> In particular, we need to do only one IPsec lookup
> for the whole packet chain. So it relaxes the pain
> from reomoving the IPsec flowcache a bit. It can be
> only a first step, but we need some improvements here
> as people start to complain about that.

But did you also experiment with XDP on this? Would be curious about
the numbers. You'd get implicit batching for the forwarding via devmap
as well if you're required to flush it out via different device with
XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
integrated helpers for XDP to do a FIB and neighbor lookup from the
kernel tables, where it's thus shared and integrated with the rest of
the stack and tooling, it would be awesome to get to the same point
with xfrm as well. Eyal recently did a start on that for xfrm for tc
progs; would be nice to have integration on XDP as well, potentially
it might also result in a bigger plus on the forwarding numbers.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 23:58   ` David Miller
  2018-06-15  6:34     ` Steffen Klassert
@ 2018-06-15 20:12     ` Tom Herbert
  1 sibling, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2018-06-15 20:12 UTC (permalink / raw)
  To: David Miller
  Cc: Pablo Neira Ayuso, netfilter-devel,
	Linux Kernel Network Developers, Steffen Klassert

On Thu, Jun 14, 2018 at 4:58 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Thu, 14 Jun 2018 13:52:03 -0700
>
>> IIRC, there was a similar proposal a while back that want to bundle
>> packets of the same flow together (without doing GRO) so that they
>> could be processed by various functions by looking at just one
>> representative packet in the group. The concept had some promise, but
>> in the end it created quite a bit of complexity since at some point
>> the packet bundle needed to be undone to go back to processing the
>> individual packets.
>
> You're probably talking about Edward Cree's SKB list stuff, and as
> per his presenation at netconf 2 weeks ago he plans to revitalize
> it given how Spectre et al. gives cause to reevaluate all bulking
> techniques.nearly

The use case for that will be an interesting question. GSO/GRO solves
the problem for TCP and this extends to nearly all cases where TCP is
in an encapsulated packet. Super efficient forwarding can be done in
XDP/BPF (without needing overhead of GSO/GRO). That pretty much leaves
UDP as non-encapsulation end protocol, which I guess these days pretty
much means QUIC :-) I am still interested to see if we can implement
GSO/GRO for QUIC (via a generic GSO/GRO BPF function so we don't
hardcode any QUIC protocol or other application protocols in kernel).

Tom

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-15 13:22     ` Daniel Borkmann
@ 2018-06-17  9:23       ` Steffen Klassert
  2018-06-19 22:22         ` Daniel Borkmann
  0 siblings, 1 reply; 33+ messages in thread
From: Steffen Klassert @ 2018-06-17  9:23 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, pablo, netfilter-devel, netdev

Hi Daniel,

On Fri, Jun 15, 2018 at 03:22:24PM +0200, Daniel Borkmann wrote:
> Hi Steffen,
> 
> On 06/15/2018 08:17 AM, Steffen Klassert wrote:
> > 
> > I started with this last year because I wanted to improve
> > the IPsec (and UDP) forwarding path. Batching packets
> > at layer2  and send them directly to the output path
> > seemed to be a good method to improve this.
> > 
> > In particular, we need to do only one IPsec lookup
> > for the whole packet chain. So it relaxes the pain
> > from reomoving the IPsec flowcache a bit. It can be
> > only a first step, but we need some improvements here
> > as people start to complain about that.
> 
> But did you also experiment with XDP on this? 

I've already tried to figure out what I have to to
do to get XDP with forwarding, but still don't realy
know how to set this up.

Maybe it is time to have a deeper look into BPF/XDP,
but for now I feel a bit lost with this.

> Would be curious about
> the numbers. You'd get implicit batching for the forwarding via devmap
> as well if you're required to flush it out via different device with
> XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
> integrated helpers for XDP to do a FIB and neighbor lookup from the
> kernel tables, where it's thus shared and integrated with the rest of
> the stack and tooling, it would be awesome to get to the same point
> with xfrm as well. Eyal recently did a start on that for xfrm for tc
> progs; would be nice to have integration on XDP as well, potentially
> it might also result in a bigger plus on the forwarding numbers.

It might make sense to intrgrate XDP with xfrm to
be able to compare numbers etc. But I need a working
XDP setup and some understanding about it first, what
could take some time.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-17  9:23       ` Steffen Klassert
@ 2018-06-19 22:22         ` Daniel Borkmann
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Borkmann @ 2018-06-19 22:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, pablo, netfilter-devel, netdev

On 06/17/2018 11:23 AM, Steffen Klassert wrote:
[...]
>> Would be curious about
>> the numbers. You'd get implicit batching for the forwarding via devmap
>> as well if you're required to flush it out via different device with
>> XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
>> integrated helpers for XDP to do a FIB and neighbor lookup from the
>> kernel tables, where it's thus shared and integrated with the rest of
>> the stack and tooling, it would be awesome to get to the same point
>> with xfrm as well. Eyal recently did a start on that for xfrm for tc
>> progs; would be nice to have integration on XDP as well, potentially
>> it might also result in a bigger plus on the forwarding numbers.
> 
> It might make sense to intrgrate XDP with xfrm to
> be able to compare numbers etc. But I need a working
> XDP setup and some understanding about it first, what
> could take some time.

Okay, no prob. If you have any questions feel free to shoot an email.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
  2018-06-14 23:55     ` David Miller
@ 2018-06-20  0:56       ` Andrew Collins
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Collins @ 2018-06-20  0:56 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, pablo, netfilter-devel, netdev, Steffen Klassert

On Thu, Jun 14, 2018 at 5:55 PM, David Miller <davem@davemloft.net> wrote:
> And guess what?  Then millions of possibilities would have been
> openned up, rather than just this one special case.
>
> So, I ask, please see the larger picture.

+cc netdev/etc

This is perhaps unrelated to the topic at hand, but as someone who's shipped
a bunch of devices over the years using the linux kernel forwarding path and
needs performance but wants to avoid moving to out of tree userspace offload
for all the reasons that you and many others have stated, is the long
term vision that the existing kernel forwarding path will transparently take
advantage of eBPF (ala bpfilter), or that users will write custom/individualized
eBPF forwarding paths for their usecases as necessary?

I (and I suspect many others) will start on the latter anyways, I'm just curious
whether it's desired/expected that such custom fastpath users will eventually
be rolled back into/replaced by a transparent upstream in-kernel equivalent.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-06-20  0:56 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-14 14:19 [PATCH net-next,RFC 00/13] New fast forwarding path Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 01/13] net: Add a helper to get the packet offload callbacks by priority Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 02/13] net: Change priority of ipv4 and ipv6 packet offloads Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 03/13] net: Add a GSO feature bit for the netfilter forward fastpath Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 04/13] net: Use one bit of NAPI_GRO_CB for the netfilter fastpath Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 05/13] netfilter: add early ingress hook for IPv4 Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 06/13] netfilter: add early ingress support for IPv6 Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 07/13] netfilter: add ESP support for early ingress Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 08/13] netfilter: nft_chain_filter: add " Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 09/13] netfilter: nf_flow_table: add hooknum to flowtable type Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 10/13] netfilter: nf_flow_table: add flowtable for early ingress hook Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 11/13] netfilter: nft_flow_offload: enable offload after second packet is seen Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 12/13] netfilter: nft_flow_offload: remove secpath check Pablo Neira Ayuso
2018-06-14 14:19 ` [PATCH net-next,RFC 13/13] netfilter: nft_flow_offload: make sure route is not stale Pablo Neira Ayuso
2018-06-14 15:50 ` [PATCH net-next,RFC 00/13] New fast forwarding path Willem de Bruijn
2018-06-15  5:23   ` Steffen Klassert
2018-06-14 15:57 ` Eric Dumazet
2018-06-15  6:03   ` Steffen Klassert
2018-06-15 13:01     ` Eric Dumazet
2018-06-14 17:18 ` David Miller
2018-06-14 18:14   ` Florian Fainelli
2018-06-14 23:55     ` David Miller
2018-06-20  0:56       ` Andrew Collins
2018-06-15  6:17   ` Steffen Klassert
2018-06-15 13:22     ` Daniel Borkmann
2018-06-17  9:23       ` Steffen Klassert
2018-06-19 22:22         ` Daniel Borkmann
2018-06-14 20:52 ` Tom Herbert
2018-06-14 23:58   ` David Miller
2018-06-15  6:34     ` Steffen Klassert
2018-06-15 12:18       ` Edward Cree
2018-06-15 20:12     ` Tom Herbert
2018-06-15  6:27   ` Steffen Klassert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.