netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH nf-next 0/3] Netfilter egress hook
@ 2020-03-11 11:59 Lukas Wunner
  2020-03-11 11:59 ` [PATCH nf-next 1/3] netfilter: Rename ingress hook include file Lukas Wunner
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Lukas Wunner @ 2020-03-11 11:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, Martin Mares, Daniel Borkmann,
	Dmitry Safonov, Thomas Graf, Alexei Starovoitov

Introduce a netfilter egress hook to complement the existing ingress hook.

User space support for nft will be submitted separately in a minute.

I'm re-submitting this as non-RFC per Pablo's request.  Compared to the
RFC, I've changed the order in patch [3/3] to perform netfilter first,
then tc (instead of the other way round).  The rationale is provided in
the commit message.  I've also extended the commit message with performance
measurements.

To reproduce the performance measurements in patch [3/3], you'll need
net-next commit 1e09e5818b3a ("pktgen: Allow on loopback device").

Link to the RFC version:
https://lore.kernel.org/netdev/cover.1572528496.git.lukas@wunner.de/

Thanks!

Lukas Wunner (3):
  netfilter: Rename ingress hook include file
  netfilter: Generalize ingress hook
  netfilter: Introduce egress hook

 include/linux/netdevice.h         |   4 ++
 include/linux/netfilter_ingress.h |  58 -----------------
 include/linux/netfilter_netdev.h  | 102 ++++++++++++++++++++++++++++++
 include/uapi/linux/netfilter.h    |   1 +
 net/core/dev.c                    |  27 ++++++--
 net/netfilter/Kconfig             |   8 +++
 net/netfilter/core.c              |  24 +++++--
 net/netfilter/nft_chain_filter.c  |   4 +-
 8 files changed, 160 insertions(+), 68 deletions(-)
 delete mode 100644 include/linux/netfilter_ingress.h
 create mode 100644 include/linux/netfilter_netdev.h

-- 
2.25.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH nf-next 1/3] netfilter: Rename ingress hook include file
  2020-03-11 11:59 [PATCH nf-next 0/3] Netfilter egress hook Lukas Wunner
@ 2020-03-11 11:59 ` Lukas Wunner
  2020-03-11 11:59 ` [PATCH nf-next 2/3] netfilter: Generalize ingress hook Lukas Wunner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Lukas Wunner @ 2020-03-11 11:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, Martin Mares, Daniel Borkmann,
	Dmitry Safonov, Thomas Graf, Alexei Starovoitov

Prepare for addition of a netfilter egress hook by renaming
<linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>.

The egress hook also necessitates a refactoring of the include file,
but that is done in a separate commit to ease reviewing.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/{netfilter_ingress.h => netfilter_netdev.h} | 0
 net/core/dev.c                                            | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename include/linux/{netfilter_ingress.h => netfilter_netdev.h} (100%)

diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_netdev.h
similarity index 100%
rename from include/linux/netfilter_ingress.h
rename to include/linux/netfilter_netdev.h
diff --git a/net/core/dev.c b/net/core/dev.c
index e10bd680dc03..b378b669a8ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -135,7 +135,7 @@
 #include <linux/if_macvlan.h>
 #include <linux/errqueue.h>
 #include <linux/hrtimer.h>
-#include <linux/netfilter_ingress.h>
+#include <linux/netfilter_netdev.h>
 #include <linux/crash_dump.h>
 #include <linux/sctp.h>
 #include <net/udp_tunnel.h>
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH nf-next 2/3] netfilter: Generalize ingress hook
  2020-03-11 11:59 [PATCH nf-next 0/3] Netfilter egress hook Lukas Wunner
  2020-03-11 11:59 ` [PATCH nf-next 1/3] netfilter: Rename ingress hook include file Lukas Wunner
@ 2020-03-11 11:59 ` Lukas Wunner
  2020-03-11 11:59 ` [PATCH nf-next 3/3] netfilter: Introduce egress hook Lukas Wunner
  2020-03-18  0:21 ` [PATCH nf-next 0/3] Netfilter " Pablo Neira Ayuso
  3 siblings, 0 replies; 17+ messages in thread
From: Lukas Wunner @ 2020-03-11 11:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, Martin Mares, Daniel Borkmann,
	Dmitry Safonov, Thomas Graf, Alexei Starovoitov

Prepare for addition of a netfilter egress hook by generalizing the
ingress hook introduced by commit e687ad60af09 ("netfilter: add
netfilter ingress hook after handle_ing() under unique static key").

In particular, rename and refactor the ingress hook's static inlines
such that they can be reused for an egress hook.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/netfilter_netdev.h | 45 ++++++++++++++++++++++----------
 net/core/dev.c                   |  2 +-
 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/include/linux/netfilter_netdev.h b/include/linux/netfilter_netdev.h
index a13774be2eb5..49e26479642e 100644
--- a/include/linux/netfilter_netdev.h
+++ b/include/linux/netfilter_netdev.h
@@ -1,34 +1,37 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _NETFILTER_INGRESS_H_
-#define _NETFILTER_INGRESS_H_
+#ifndef _NETFILTER_NETDEV_H_
+#define _NETFILTER_NETDEV_H_
 
 #include <linux/netfilter.h>
 #include <linux/netdevice.h>
 
-#ifdef CONFIG_NETFILTER_INGRESS
-static inline bool nf_hook_ingress_active(const struct sk_buff *skb)
+#ifdef CONFIG_NETFILTER
+static __always_inline bool nf_hook_netdev_active(enum nf_dev_hooks hooknum,
+					  struct nf_hook_entries __rcu *hooks)
 {
 #ifdef CONFIG_JUMP_LABEL
-	if (!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_INGRESS]))
+	if (!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][hooknum]))
 		return false;
 #endif
-	return rcu_access_pointer(skb->dev->nf_hooks_ingress);
+	return rcu_access_pointer(hooks);
 }
 
 /* caller must hold rcu_read_lock */
-static inline int nf_hook_ingress(struct sk_buff *skb)
+static __always_inline int nf_hook_netdev(struct sk_buff *skb,
+					  enum nf_dev_hooks hooknum,
+					  struct nf_hook_entries __rcu *hooks)
 {
-	struct nf_hook_entries *e = rcu_dereference(skb->dev->nf_hooks_ingress);
+	struct nf_hook_entries *e = rcu_dereference(hooks);
 	struct nf_hook_state state;
 	int ret;
 
-	/* Must recheck the ingress hook head, in the event it became NULL
-	 * after the check in nf_hook_ingress_active evaluated to true.
+	/* Must recheck the hook head, in the event it became NULL
+	 * after the check in nf_hook_netdev_active evaluated to true.
 	 */
 	if (unlikely(!e))
 		return 0;
 
-	nf_hook_state_init(&state, NF_NETDEV_INGRESS,
+	nf_hook_state_init(&state, hooknum,
 			   NFPROTO_NETDEV, skb->dev, NULL, NULL,
 			   dev_net(skb->dev), NULL);
 	ret = nf_hook_slow(skb, &state, e, 0);
@@ -37,10 +40,26 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 
 	return ret;
 }
+#endif /* CONFIG_NETFILTER */
 
-static inline void nf_hook_ingress_init(struct net_device *dev)
+static inline void nf_hook_netdev_init(struct net_device *dev)
 {
+#ifdef CONFIG_NETFILTER_INGRESS
 	RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
+#endif
+}
+
+#ifdef CONFIG_NETFILTER_INGRESS
+static inline bool nf_hook_ingress_active(const struct sk_buff *skb)
+{
+	return nf_hook_netdev_active(NF_NETDEV_INGRESS,
+				     skb->dev->nf_hooks_ingress);
+}
+
+static inline int nf_hook_ingress(struct sk_buff *skb)
+{
+	return nf_hook_netdev(skb, NF_NETDEV_INGRESS,
+			      skb->dev->nf_hooks_ingress);
 }
 #else /* CONFIG_NETFILTER_INGRESS */
 static inline int nf_hook_ingress_active(struct sk_buff *skb)
@@ -52,7 +71,5 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 {
 	return 0;
 }
-
-static inline void nf_hook_ingress_init(struct net_device *dev) {}
 #endif /* CONFIG_NETFILTER_INGRESS */
 #endif /* _NETFILTER_INGRESS_H_ */
diff --git a/net/core/dev.c b/net/core/dev.c
index b378b669a8ab..2e55d4e41517 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9844,7 +9844,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
 
-	nf_hook_ingress_init(dev);
+	nf_hook_netdev_init(dev);
 
 	return dev;
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-11 11:59 [PATCH nf-next 0/3] Netfilter egress hook Lukas Wunner
  2020-03-11 11:59 ` [PATCH nf-next 1/3] netfilter: Rename ingress hook include file Lukas Wunner
  2020-03-11 11:59 ` [PATCH nf-next 2/3] netfilter: Generalize ingress hook Lukas Wunner
@ 2020-03-11 11:59 ` Lukas Wunner
  2020-03-11 14:05   ` Daniel Borkmann
  2020-03-18  0:21 ` [PATCH nf-next 0/3] Netfilter " Pablo Neira Ayuso
  3 siblings, 1 reply; 17+ messages in thread
From: Lukas Wunner @ 2020-03-11 11:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, Martin Mares, Daniel Borkmann,
	Dmitry Safonov, Thomas Graf, Alexei Starovoitov

Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.

Allow the same on egress.  Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order.  This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc.  Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.

Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing.  However for more
exotic protocols, there is currently no provision to apply netfilter on
egress.  A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc.  But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/

There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html

Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:

* Without this commit:
  Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
  2920481pps 1401Mb/sec (1401830880bps) errors: 0

* With this commit:
  Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
  2941410pps 1411Mb/sec (1411876800bps) errors: 0

* Without this commit + tc egress:
  Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
  2562631pps 1230Mb/sec (1230062880bps) errors: 0

* With this commit + tc egress:
  Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
  2659259pps 1276Mb/sec (1276444320bps) errors: 0

* With this commit + nft egress:
  Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
  2413320pps 1158Mb/sec (1158393600bps) errors: 0

Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/netdevice.h        |  4 ++++
 include/linux/netfilter_netdev.h | 27 +++++++++++++++++++++++++++
 include/uapi/linux/netfilter.h   |  1 +
 net/core/dev.c                   | 23 ++++++++++++++++++++---
 net/netfilter/Kconfig            |  8 ++++++++
 net/netfilter/core.c             | 24 ++++++++++++++++++++----
 net/netfilter/nft_chain_filter.c |  4 +++-
 7 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6c3f7032e8d9..2d2606d1b1b3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1750,6 +1750,7 @@ enum netdev_priv_flags {
  *	@xps_maps:	XXX: need comments on this one
  *	@miniq_egress:		clsact qdisc specific data for
  *				egress processing
+ *	@nf_hooks_egress:	netfilter hooks executed for egress packets
  *	@qdisc_hash:		qdisc hash table
  *	@watchdog_timeo:	Represents the timeout that is used by
  *				the watchdog (see dev_watchdog())
@@ -2025,6 +2026,9 @@ struct net_device {
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
 #endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	struct nf_hook_entries __rcu *nf_hooks_egress;
+#endif
 
 #ifdef CONFIG_NET_SCHED
 	DECLARE_HASHTABLE	(qdisc_hash, 4);
diff --git a/include/linux/netfilter_netdev.h b/include/linux/netfilter_netdev.h
index 49e26479642e..92d3611a782e 100644
--- a/include/linux/netfilter_netdev.h
+++ b/include/linux/netfilter_netdev.h
@@ -47,6 +47,9 @@ static inline void nf_hook_netdev_init(struct net_device *dev)
 #ifdef CONFIG_NETFILTER_INGRESS
 	RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
 #endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	RCU_INIT_POINTER(dev->nf_hooks_egress, NULL);
+#endif
 }
 
 #ifdef CONFIG_NETFILTER_INGRESS
@@ -72,4 +75,28 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 	return 0;
 }
 #endif /* CONFIG_NETFILTER_INGRESS */
+
+#ifdef CONFIG_NETFILTER_EGRESS
+static inline bool nf_hook_egress_active(const struct sk_buff *skb)
+{
+	return nf_hook_netdev_active(NF_NETDEV_EGRESS,
+				     skb->dev->nf_hooks_egress);
+}
+
+static inline int nf_hook_egress(struct sk_buff *skb)
+{
+	return nf_hook_netdev(skb, NF_NETDEV_EGRESS,
+			      skb->dev->nf_hooks_egress);
+}
+#else /* CONFIG_NETFILTER_EGRESS */
+static inline int nf_hook_egress_active(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int nf_hook_egress(struct sk_buff *skb)
+{
+	return 0;
+}
+#endif /* CONFIG_NETFILTER_EGRESS */
 #endif /* _NETFILTER_INGRESS_H_ */
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index ca9e63d6e0e4..d1616574c54f 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -50,6 +50,7 @@ enum nf_inet_hooks {
 
 enum nf_dev_hooks {
 	NF_NETDEV_INGRESS,
+	NF_NETDEV_EGRESS,
 	NF_NETDEV_NUMHOOKS
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 2e55d4e41517..89724a9f1bbb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3771,6 +3771,7 @@ EXPORT_SYMBOL(dev_loopback_xmit);
 static struct sk_buff *
 sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
+#ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
 	struct tcf_result cl_res;
 
@@ -3804,11 +3805,24 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	default:
 		break;
 	}
-
+#endif /* CONFIG_NET_CLS_ACT */
 	return skb;
 }
 #endif /* CONFIG_NET_EGRESS */
 
+static inline int nf_egress(struct sk_buff *skb)
+{
+	if (nf_hook_egress_active(skb)) {
+		int ret;
+
+		rcu_read_lock();
+		ret = nf_hook_egress(skb);
+		rcu_read_unlock();
+		return ret;
+	}
+	return 0;
+}
+
 #ifdef CONFIG_XPS
 static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
 			       struct xps_dev_maps *dev_maps, unsigned int tci)
@@ -3995,13 +4009,16 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_at_ingress = 0;
-# ifdef CONFIG_NET_EGRESS
+#endif
+#ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
+		if (nf_egress(skb) < 0)
+			goto out;
+
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
 	}
-# endif
 #endif
 	/* If device/qdisc don't need skb->dst, release it right now while
 	 * its hot in this cpu cache.
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 468fea1aebba..f4c68f60f241 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -10,6 +10,14 @@ config NETFILTER_INGRESS
 	  This allows you to classify packets from ingress using the Netfilter
 	  infrastructure.
 
+config NETFILTER_EGRESS
+	bool "Netfilter egress support"
+	default y
+	select NET_EGRESS
+	help
+	  This allows you to classify packets before transmission using the
+	  Netfilter infrastructure.
+
 config NETFILTER_NETLINK
 	tristate
 
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 78f046ec506f..85e9c959aba7 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -306,6 +306,12 @@ nf_hook_entry_head(struct net *net, int pf, unsigned int hooknum,
 		if (dev && dev_net(dev) == net)
 			return &dev->nf_hooks_ingress;
 	}
+#endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	if (hooknum == NF_NETDEV_EGRESS) {
+		if (dev && dev_net(dev) == net)
+			return &dev->nf_hooks_egress;
+	}
 #endif
 	WARN_ON_ONCE(1);
 	return NULL;
@@ -318,11 +324,13 @@ static int __nf_register_net_hook(struct net *net, int pf,
 	struct nf_hook_entries __rcu **pp;
 
 	if (pf == NFPROTO_NETDEV) {
-#ifndef CONFIG_NETFILTER_INGRESS
-		if (reg->hooknum == NF_NETDEV_INGRESS)
+		if ((!IS_ENABLED(CONFIG_NETFILTER_INGRESS) &&
+		     reg->hooknum == NF_NETDEV_INGRESS) ||
+		    (!IS_ENABLED(CONFIG_NETFILTER_EGRESS) &&
+		     reg->hooknum == NF_NETDEV_EGRESS))
 			return -EOPNOTSUPP;
-#endif
-		if (reg->hooknum != NF_NETDEV_INGRESS ||
+		if ((reg->hooknum != NF_NETDEV_INGRESS &&
+		     reg->hooknum != NF_NETDEV_EGRESS) ||
 		    !reg->dev || dev_net(reg->dev) != net)
 			return -EINVAL;
 	}
@@ -348,6 +356,10 @@ static int __nf_register_net_hook(struct net *net, int pf,
 	if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
 		net_inc_ingress_queue();
 #endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EGRESS)
+		net_inc_egress_queue();
+#endif
 #ifdef CONFIG_JUMP_LABEL
 	static_key_slow_inc(&nf_hooks_needed[pf][reg->hooknum]);
 #endif
@@ -406,6 +418,10 @@ static void __nf_unregister_net_hook(struct net *net, int pf,
 		if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
 			net_dec_ingress_queue();
 #endif
+#ifdef CONFIG_NETFILTER_EGRESS
+		if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EGRESS)
+			net_dec_egress_queue();
+#endif
 #ifdef CONFIG_JUMP_LABEL
 		static_key_slow_dec(&nf_hooks_needed[pf][reg->hooknum]);
 #endif
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index c78d01bc02e9..67ce6dbb5496 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -277,9 +277,11 @@ static const struct nft_chain_type nft_chain_filter_netdev = {
 	.name		= "filter",
 	.type		= NFT_CHAIN_T_DEFAULT,
 	.family		= NFPROTO_NETDEV,
-	.hook_mask	= (1 << NF_NETDEV_INGRESS),
+	.hook_mask	= (1 << NF_NETDEV_INGRESS) |
+			  (1 << NF_NETDEV_EGRESS),
 	.hooks		= {
 		[NF_NETDEV_INGRESS]	= nft_do_chain_netdev,
+		[NF_NETDEV_EGRESS]	= nft_do_chain_netdev,
 	},
 };
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-11 11:59 ` [PATCH nf-next 3/3] netfilter: Introduce egress hook Lukas Wunner
@ 2020-03-11 14:05   ` Daniel Borkmann
  2020-03-11 15:54     ` Lukas Wunner
  2020-03-13 14:55     ` Pablo Neira Ayuso
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Borkmann @ 2020-03-11 14:05 UTC (permalink / raw)
  To: Lukas Wunner, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, Martin Mares, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov, David Miller

On 3/11/20 12:59 PM, Lukas Wunner wrote:
> Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
> handle_ing() under unique static key") introduced the ability to
> classify packets on ingress.
> 
> Allow the same on egress.  Position the hook immediately before a packet
> is handed to tc and then sent out on an interface, thereby mirroring the
> ingress order.  This order allows marking packets in the netfilter
> egress hook and subsequently using the mark in tc.  Another benefit of
> this order is consistency with a lot of existing documentation which
> says that egress tc is performed after netfilter hooks.
> 
> Egress hooks already exist for the most common protocols, such as
> NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
> they are executed earlier during packet processing.  However for more
> exotic protocols, there is currently no provision to apply netfilter on
> egress.  A common workaround is to enslave the interface to a bridge and

Sorry for late reply, but still NAK. The primary use-case for this patch set
is out of tree module which you mentioned last time [0] ...

   There's another reason I chose netfilter over tc:  I need to activate the
   filter from a kernel module, hence need an in-kernel (rather than user space)
   API.

   netfilter provides that via nf_register_net_hook(), I couldn't find
   anything similar for tc.  And an egress netfilter hook seemed like
   an obvious missing feature given the presence of an ingress hook.

   The module I need this for is out-of-tree:
   https://github.com/RevolutionPi/piControl/commit/da199ccd2099

   In my experience the argument that a feature is needed for an out-of-tree
   module holds zero value upstream.  If there's no in-tree user, the feature
   isn't merged, I've seen this more than enough.  Which is why I didn't mention
   it in the first place.

   For our use case I wouldn't even need the nft user space support which I
   posted separately, I just implemented it for completeness and to increase
   acceptability of the present series.

... and as you mentioned there's local out and post routing already where you
can mark packets, so no need to make the fast-path slower for exotic protocols
which can be solved through other means.

   [0] https://lore.kernel.org/netdev/20191123142305.g2kkaudhhyui22fq@wunner.de/

> use ebtables, or to resort to tc.  But when the ingress hook was
> introduced, consensus was that users should be given the choice to use
> netfilter or tc, whichever tool suits their needs best:
> https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
> 
> There have also been occasional user requests for a netfilter egress
> hook in the past, e.g.:
> https://www.spinics.net/lists/netfilter/msg50038.html
> 
> Performance measurements with pktgen surprisingly show a speedup rather
> than a slowdown with this commit:
> 
> * Without this commit:
>    Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
>    2920481pps 1401Mb/sec (1401830880bps) errors: 0
> 
> * With this commit:
>    Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
>    2941410pps 1411Mb/sec (1411876800bps) errors: 0

So you are suggesting that we've just optimized the stack by adding more
hooks to it ...?

> * Without this commit + tc egress:
>    Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
>    2562631pps 1230Mb/sec (1230062880bps) errors: 0
> 
> * With this commit + tc egress:
>    Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
>    2659259pps 1276Mb/sec (1276444320bps) errors: 0
> 
> * With this commit + nft egress:
>    Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
>    2413320pps 1158Mb/sec (1158393600bps) errors: 0
> 
> Tested on a bare-metal Core i7-3615QM, each measurement was performed
> three times to verify that the numbers are stable.
> 
> Commands to perform a measurement:
> modprobe pktgen
> echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
> samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000
> 
> Commands for testing tc egress:
> tc qdisc add dev lo clsact
> tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32
> 
> Commands for testing nft egress:
> nft add table netdev t
> nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
> nft add rule netdev t co ip daddr 4.3.2.1/32 drop
> 
> All testing was performed on the loopback interface to avoid distorting
> measurements by the packet handling in the low-level Ethernet driver.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-11 14:05   ` Daniel Borkmann
@ 2020-03-11 15:54     ` Lukas Wunner
  2020-03-12 22:40       ` Daniel Borkmann
  2020-03-13 14:55     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 17+ messages in thread
From: Lukas Wunner @ 2020-03-11 15:54 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, Martin Mares, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov, David Miller

On Wed, Mar 11, 2020 at 03:05:16PM +0100, Daniel Borkmann wrote:
> no need to make the fast-path slower for exotic protocols
> which can be solved through other means.

As said the fast-path gets faster, not slower.

> > * Without this commit:
> >    Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
> >    2920481pps 1401Mb/sec (1401830880bps) errors: 0
> > 
> > * With this commit:
> >    Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
> >    2941410pps 1411Mb/sec (1411876800bps) errors: 0
> 
> So you are suggesting that we've just optimized the stack by adding more
> hooks to it ...?

Since I've provided numbers to disprove your allegation, I think the
onus is now on you to prove that your allegation holds any water.
Please reproduce the measurements and let's go from there.

This isn't much work, I've made it really easy by providing all the
steps necessary in the commit message.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-11 15:54     ` Lukas Wunner
@ 2020-03-12 22:40       ` Daniel Borkmann
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Borkmann @ 2020-03-12 22:40 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, Martin Mares, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov, David Miller

On 3/11/20 4:54 PM, Lukas Wunner wrote:
> On Wed, Mar 11, 2020 at 03:05:16PM +0100, Daniel Borkmann wrote:
>> no need to make the fast-path slower for exotic protocols
>> which can be solved through other means.
> 
> As said the fast-path gets faster, not slower.
> 
>>> * Without this commit:
>>>     Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
>>>     2920481pps 1401Mb/sec (1401830880bps) errors: 0
>>>
>>> * With this commit:
>>>     Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
>>>     2941410pps 1411Mb/sec (1411876800bps) errors: 0
>>
>> So you are suggesting that we've just optimized the stack by adding more
>> hooks to it ...?
> 
> Since I've provided numbers to disprove your allegation, I think the
> onus is now on you to prove that your allegation holds any water.
> Please reproduce the measurements and let's go from there.
> 
> This isn't much work, I've made it really easy by providing all the
> steps necessary in the commit message.

So in terms of micro-benchmarking with pktgen, if I understand you correctly,
you are basically measuring loopback device by pushing packets through the
__dev_queue_xmit() -> loopback_xmit() -> netif_rx() till the stack drops them
in IP layer on ingress side? I wonder how your perf profile looks like ...
Setting a drop point in tc layer and then measuring the effect before/after
this change with CONFIG_NETFILTER_EGRESS enabled, I'm getting a stable degration
from ~4.123Mpps to ~4.082Mpps with pktgen, definitely not seeing a speedup.

# ip link add dev foo type dummy
# ip link set dev foo up
# tc qdisc add dev foo clsact
# tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
# modprobe pktgen
# echo "add_device foo" > /proc/net/pktgen/kpktgend_3
# samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'foo' -n 400000000 -m '11:11:11:11:11:11' -d '1.1.1.1'

Also to let pktgen count the skb:

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index acc849df60b5..8920da7a7a67 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3372,11 +3372,11 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
                 ret = dev_queue_xmit(pkt_dev->skb);
                 switch (ret) {
                 case NET_XMIT_SUCCESS:
+               case NET_XMIT_DROP:
                         pkt_dev->sofar++;
                         pkt_dev->seq_num++;
                         pkt_dev->tx_bytes += pkt_dev->last_pkt_size;
                         break;
-               case NET_XMIT_DROP:
                 case NET_XMIT_CN:
                 /* These are all valid return codes for a qdisc but
                  * indicate packets are being dropped or will likely

(In any case, this whole discussion is moot for out of tree code.)

Thanks,
Daniel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-11 14:05   ` Daniel Borkmann
  2020-03-11 15:54     ` Lukas Wunner
@ 2020-03-13 14:55     ` Pablo Neira Ayuso
  2020-03-14  0:12       ` Daniel Borkmann
  1 sibling, 1 reply; 17+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-13 14:55 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Lukas Wunner, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, Martin Mares, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov, David Miller

On Wed, Mar 11, 2020 at 03:05:16PM +0100, Daniel Borkmann wrote:
> On 3/11/20 12:59 PM, Lukas Wunner wrote:
> > Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
> > handle_ing() under unique static key") introduced the ability to
> > classify packets on ingress.
> > 
> > Allow the same on egress.  Position the hook immediately before a packet
> > is handed to tc and then sent out on an interface, thereby mirroring the
> > ingress order.  This order allows marking packets in the netfilter
> > egress hook and subsequently using the mark in tc.  Another benefit of
> > this order is consistency with a lot of existing documentation which
> > says that egress tc is performed after netfilter hooks.
> > 
> > Egress hooks already exist for the most common protocols, such as
> > NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
> > they are executed earlier during packet processing.  However for more
> > exotic protocols, there is currently no provision to apply netfilter on
> > egress.  A common workaround is to enslave the interface to a bridge and
> 
> Sorry for late reply, but still NAK.

I agree Lukas use-case is very specific.

However, this is useful.

We have plans to support for NAT64 and NAT46, this is the right spot
to do this mangling. There is already support for the tunneling
infrastructure in netfilter from ingress, this spot from egress will
allow us to perform the tunneling from here. There is also no way to
drop traffic generated by dhclient, this also allow for filtering such
locally generated traffic. And many more.

Performance impact is negligible, Lukas already provided what you
asked for.

And more importantly:

I really think this patchset is _not_ interfering in your goals at all.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-13 14:55     ` Pablo Neira Ayuso
@ 2020-03-14  0:12       ` Daniel Borkmann
  2020-03-15 13:28         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Borkmann @ 2020-03-14  0:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Lukas Wunner, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, Martin Mares, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov, David Miller

On 3/13/20 3:55 PM, Pablo Neira Ayuso wrote:
> On Wed, Mar 11, 2020 at 03:05:16PM +0100, Daniel Borkmann wrote:
>> On 3/11/20 12:59 PM, Lukas Wunner wrote:
>>> Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
>>> handle_ing() under unique static key") introduced the ability to
>>> classify packets on ingress.
>>>
>>> Allow the same on egress.  Position the hook immediately before a packet
>>> is handed to tc and then sent out on an interface, thereby mirroring the
>>> ingress order.  This order allows marking packets in the netfilter
>>> egress hook and subsequently using the mark in tc.  Another benefit of
>>> this order is consistency with a lot of existing documentation which
>>> says that egress tc is performed after netfilter hooks.
>>>
>>> Egress hooks already exist for the most common protocols, such as
>>> NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
>>> they are executed earlier during packet processing.  However for more
>>> exotic protocols, there is currently no provision to apply netfilter on
>>> egress.  A common workaround is to enslave the interface to a bridge and
>>
>> Sorry for late reply, but still NAK.
> 
> I agree Lukas use-case is very specific.
> 
> However, this is useful.
> 
> We have plans to support for NAT64 and NAT46, this is the right spot
> to do this mangling. There is already support for the tunneling

But why is existing local-out or post-routing hook _not_ sufficient for
NAT64 given it being IP based?

> infrastructure in netfilter from ingress, this spot from egress will
> allow us to perform the tunneling from here. There is also no way to
> drop traffic generated by dhclient, this also allow for filtering such
> locally generated traffic. And many more.

This is a known fact for ~17 years [0] or probably more by now and noone
from netfilter folks cared to address it in all the years, so I presume
it cannot be important enough, and these days it can be filtered through
other means already. Tbh, it's a bit laughable that you bring this up as
an argument ...

   [0] https://www.spinics.net/lists/netfilter/msg19488.html

> Performance impact is negligible, Lukas already provided what you
> asked for.

Sure, and the claimed result was "as said the fast-path gets faster, not
slower" without any explanation or digging into details on why this might
be, especially since it appears counter-intuitive as was stated by the
author ... and later demonstrated w/ measurements that show the opposite.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-14  0:12       ` Daniel Borkmann
@ 2020-03-15 13:28         ` Pablo Neira Ayuso
  2020-04-23 14:44           ` Laura Garcia
  0 siblings, 1 reply; 17+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-15 13:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Lukas Wunner, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, Martin Mares, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov, David Miller

Hello Daniel,

On Sat, Mar 14, 2020 at 01:12:02AM +0100, Daniel Borkmann wrote:
> On 3/13/20 3:55 PM, Pablo Neira Ayuso wrote:
[...]
> > We have plans to support for NAT64 and NAT46, this is the right spot
> > to do this mangling. There is already support for the tunneling
> 
> But why is existing local-out or post-routing hook _not_ sufficient for
> NAT64 given it being IP based?

Those hooks are not coming at the end of the IP processing. There is
very relevant IP code after those hooks that cannot be bypassed such
as fragmentation, tunneling and neighbour output. Such transformation
needs to happen after the IP processing, exactly from where Lukas is
proposing.

[...]
> > infrastructure in netfilter from ingress, this spot from egress will
> > allow us to perform the tunneling from here. There is also no way to
> > drop traffic generated by dhclient, this also allow for filtering such
> > locally generated traffic. And many more.
> 
> This is a known fact for ~17 years [0] or probably more by now and noone
> from netfilter folks cared to address it in all the years, so I presume
> it cannot be important enough, and these days it can be filtered through
> other means already. Tbh, it's a bit laughable that you bring this up as
> an argument ...
> 
>   [0] https://www.spinics.net/lists/netfilter/msg19488.html

Look: ip6tables, arptables and ebtables are a copy and paste from the
original iptables.

At that time, the only way one way to add support for ingress/egress
classification in netfilter: add "devtables", yet another copy and
past from iptables, that was a no-go.

This is not a problem anymore since there is a consolidated netfilter
framework to achieve ingress/egress classification.

> > Performance impact is negligible, Lukas already provided what you
> > asked for.
> 
> Sure, and the claimed result was "as said the fast-path gets faster, not
> slower" without any explanation or digging into details on why this might
> be, especially since it appears counter-intuitive as was stated by the
> author ... and later demonstrated w/ measurements that show the opposite.

I remember one of your collegues used this same argument against new
hooks back in 2015 [0], and the introduction of this hook was proven
to be negligible. This patchset introduces code that looks very much
the same.

I can make a list of recent updates to the output path, several of
them very are targeted to very specific usecases. You did not care at
all about performance impact of those at all, however, you care about
netfilter for some unknown reason.

In my opinion, your original feedback has been addressed, it's time to
move on.

Thank you.

[0] https://lwn.net/Articles/642414/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 0/3] Netfilter egress hook
  2020-03-11 11:59 [PATCH nf-next 0/3] Netfilter egress hook Lukas Wunner
                   ` (2 preceding siblings ...)
  2020-03-11 11:59 ` [PATCH nf-next 3/3] netfilter: Introduce egress hook Lukas Wunner
@ 2020-03-18  0:21 ` Pablo Neira Ayuso
  3 siblings, 0 replies; 17+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-18  0:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jozsef Kadlecsik, Florian Westphal, netfilter-devel, coreteam,
	netdev, Martin Mares, Daniel Borkmann, Dmitry Safonov,
	Thomas Graf, Alexei Starovoitov

On Wed, Mar 11, 2020 at 12:59:00PM +0100, Lukas Wunner wrote:
> Introduce a netfilter egress hook to complement the existing ingress hook.
> 
> User space support for nft will be submitted separately in a minute.
> 
> I'm re-submitting this as non-RFC per Pablo's request.  Compared to the
> RFC, I've changed the order in patch [3/3] to perform netfilter first,
> then tc (instead of the other way round).  The rationale is provided in
> the commit message.  I've also extended the commit message with performance
> measurements.
> 
> To reproduce the performance measurements in patch [3/3], you'll need
> net-next commit 1e09e5818b3a ("pktgen: Allow on loopback device").

Series applied to nf-next, thank you.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-03-15 13:28         ` Pablo Neira Ayuso
@ 2020-04-23 14:44           ` Laura Garcia
  2020-04-23 16:05             ` Lukas Wunner
  0 siblings, 1 reply; 17+ messages in thread
From: Laura Garcia @ 2020-04-23 14:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Lukas Wunner
  Cc: Daniel Borkmann, Jozsef Kadlecsik, Florian Westphal,
	Netfilter Development Mailing list, coreteam, netdev,
	Martin Mares, Dmitry Safonov, Thomas Graf, Alexei Starovoitov,
	David Miller

On Sun, Mar 15, 2020 at 2:29 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> Hello Daniel,
>
> On Sat, Mar 14, 2020 at 01:12:02AM +0100, Daniel Borkmann wrote:
> > On 3/13/20 3:55 PM, Pablo Neira Ayuso wrote:
> [...]
> > > We have plans to support for NAT64 and NAT46, this is the right spot
> > > to do this mangling. There is already support for the tunneling
> >
> > But why is existing local-out or post-routing hook _not_ sufficient for
> > NAT64 given it being IP based?
>
> Those hooks are not coming at the end of the IP processing. There is
> very relevant IP code after those hooks that cannot be bypassed such
> as fragmentation, tunneling and neighbour output. Such transformation
> needs to happen after the IP processing, exactly from where Lukas is
> proposing.
>
> [...]
> > > infrastructure in netfilter from ingress, this spot from egress will
> > > allow us to perform the tunneling from here. There is also no way to
> > > drop traffic generated by dhclient, this also allow for filtering such
> > > locally generated traffic. And many more.

Hi,

Any chance to continue with this approach? I'm afraid outbound
af_packets also could not be filtered without this hook.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-04-23 14:44           ` Laura Garcia
@ 2020-04-23 16:05             ` Lukas Wunner
  2020-04-27 23:44               ` Pablo Neira Ayuso
  2020-04-28 20:11               ` Daniel Borkmann
  0 siblings, 2 replies; 17+ messages in thread
From: Lukas Wunner @ 2020-04-23 16:05 UTC (permalink / raw)
  To: Laura Garcia
  Cc: Pablo Neira Ayuso, Daniel Borkmann, Jozsef Kadlecsik,
	Florian Westphal, Netfilter Development Mailing list, coreteam,
	netdev, Martin Mares, Dmitry Safonov, Thomas Graf,
	Alexei Starovoitov, David Miller

On Thu, Apr 23, 2020 at 04:44:44PM +0200, Laura Garcia wrote:
> On Sun, Mar 15, 2020 at 2:29 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Sat, Mar 14, 2020 at 01:12:02AM +0100, Daniel Borkmann wrote:
> > > On 3/13/20 3:55 PM, Pablo Neira Ayuso wrote:
> > > > We have plans to support for NAT64 and NAT46, this is the right spot
> > > > to do this mangling. There is already support for the tunneling
> > >
> > > But why is existing local-out or post-routing hook _not_ sufficient for
> > > NAT64 given it being IP based?
> >
> > Those hooks are not coming at the end of the IP processing. There is
> > very relevant IP code after those hooks that cannot be bypassed such
> > as fragmentation, tunneling and neighbour output. Such transformation
> > needs to happen after the IP processing, exactly from where Lukas is
> > proposing.
> >
> > [...]
> > > > infrastructure in netfilter from ingress, this spot from egress will
> > > > allow us to perform the tunneling from here. There is also no way to
> > > > drop traffic generated by dhclient, this also allow for filtering such
> > > > locally generated traffic. And many more.
> 
> Any chance to continue with this approach? I'm afraid outbound
> af_packets also could not be filtered without this hook.

Thanks Laura, good to hear there's interest in the functionality.

Daniel submitted a revert of this series but didn't cc me:

https://lore.kernel.org/netdev/bbdee6355234e730ef686f9321bd072bcf4bb232.1584523237.git.daniel@iogearbox.net/

In the ensuing discussion it turned out that the performance argument
may be addressed by a rearrangement of sch_handle_egress() and
nf_egress() invocations.  I could look into amending the series
accordingly and resubmitting, though I'm currently swamped with
other work.

The question is whether that's going to be sufficient because Daniel
mentioned having an in-tree user as a prerequisite for accepting this
feature, to which Pablo responded with NAT64/NAT46.  I don't have
intentions of implementing those, but maybe someone else has.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-04-23 16:05             ` Lukas Wunner
@ 2020-04-27 23:44               ` Pablo Neira Ayuso
  2020-04-28 20:11               ` Daniel Borkmann
  1 sibling, 0 replies; 17+ messages in thread
From: Pablo Neira Ayuso @ 2020-04-27 23:44 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Laura Garcia, Daniel Borkmann, Jozsef Kadlecsik,
	Florian Westphal, Netfilter Development Mailing list, coreteam,
	netdev, Martin Mares, Dmitry Safonov, Thomas Graf,
	Alexei Starovoitov, David Miller

Hi Lukas,

On Thu, Apr 23, 2020 at 06:05:42PM +0200, Lukas Wunner wrote:
[...]
> Daniel submitted a revert of this series but didn't cc me:
> 
> https://lore.kernel.org/netdev/bbdee6355234e730ef686f9321bd072bcf4bb232.1584523237.git.daniel@iogearbox.net/
> 
> In the ensuing discussion it turned out that the performance argument
> may be addressed by a rearrangement of sch_handle_egress() and
> nf_egress() invocations.  I could look into amending the series
> accordingly and resubmitting, though I'm currently swamped with
> other work.
> 
> The question is whether that's going to be sufficient because Daniel
> mentioned having an in-tree user as a prerequisite for accepting this
> feature, to which Pablo responded with NAT64/NAT46.  I don't have
> intentions of implementing those, but maybe someone else has.

I'd love to work on integrating that feature, there are a few
implementations outthere that might be useful for this.

However, I'm terribly biased, I'm the Netfilter maintainer.

Even though, I really think this hook is going to be very useful for
the Linux community from day one.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-04-23 16:05             ` Lukas Wunner
  2020-04-27 23:44               ` Pablo Neira Ayuso
@ 2020-04-28 20:11               ` Daniel Borkmann
  2020-08-20 10:37                 ` Lukas Wunner
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Borkmann @ 2020-04-28 20:11 UTC (permalink / raw)
  To: Lukas Wunner, Laura Garcia
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	Netfilter Development Mailing list, coreteam, netdev,
	Martin Mares, Dmitry Safonov, Thomas Graf, Alexei Starovoitov,
	David Miller

Hey Lukas,

On 4/23/20 6:05 PM, Lukas Wunner wrote:
> On Thu, Apr 23, 2020 at 04:44:44PM +0200, Laura Garcia wrote:
>> On Sun, Mar 15, 2020 at 2:29 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>>> On Sat, Mar 14, 2020 at 01:12:02AM +0100, Daniel Borkmann wrote:
>>>> On 3/13/20 3:55 PM, Pablo Neira Ayuso wrote:
>>>>> We have plans to support for NAT64 and NAT46, this is the right spot
>>>>> to do this mangling. There is already support for the tunneling
>>>>
>>>> But why is existing local-out or post-routing hook _not_ sufficient for
>>>> NAT64 given it being IP based?
>>>
>>> Those hooks are not coming at the end of the IP processing. There is
>>> very relevant IP code after those hooks that cannot be bypassed such
>>> as fragmentation, tunneling and neighbour output. Such transformation
>>> needs to happen after the IP processing, exactly from where Lukas is
>>> proposing.
>>>
>>> [...]
>>>>> infrastructure in netfilter from ingress, this spot from egress will
>>>>> allow us to perform the tunneling from here. There is also no way to
>>>>> drop traffic generated by dhclient, this also allow for filtering such
>>>>> locally generated traffic. And many more.
>>
>> Any chance to continue with this approach? I'm afraid outbound
>> af_packets also could not be filtered without this hook.
> 
> Thanks Laura, good to hear there's interest in the functionality.
> 
> Daniel submitted a revert of this series but didn't cc me:
> 
> https://lore.kernel.org/netdev/bbdee6355234e730ef686f9321bd072bcf4bb232.1584523237.git.daniel@iogearbox.net/
> 
> In the ensuing discussion it turned out that the performance argument
> may be addressed by a rearrangement of sch_handle_egress() and
> nf_egress() invocations.  I could look into amending the series
> accordingly and resubmitting, though I'm currently swamped with
> other work.

The rework of these hooks is still on my todo list, too swamped with
other stuff as well right now, but I'll see to have a prototype this
net-next development cycle.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-04-28 20:11               ` Daniel Borkmann
@ 2020-08-20 10:37                 ` Lukas Wunner
  2020-08-20 16:35                   ` Lukas Wunner
  0 siblings, 1 reply; 17+ messages in thread
From: Lukas Wunner @ 2020-08-20 10:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Laura Garcia, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, Netfilter Development Mailing list, coreteam,
	netdev, Martin Mares, Dmitry Safonov, Thomas Graf,
	Alexei Starovoitov, David Miller

On Tue, Apr 28, 2020 at 10:11:19PM +0200, Daniel Borkmann wrote:
> > https://lore.kernel.org/netdev/bbdee6355234e730ef686f9321bd072bcf4bb232.1584523237.git.daniel@iogearbox.net/
> > 
> > In the ensuing discussion it turned out that the performance argument
> > may be addressed by a rearrangement of sch_handle_egress() and
> > nf_egress() invocations.  I could look into amending the series
> > accordingly and resubmitting, though I'm currently swamped with
> > other work.
> 
> The rework of these hooks is still on my todo list, too swamped with
> other stuff as well right now, but I'll see to have a prototype this
> net-next development cycle.

Daniel, I have it running now the way you want it (I think) and am
benchmarking several variants.  I'm now faced with the following
choice:

* One variant speeds up the default case with neither tc nor nft in use
  (2241 -> 2285 Mb/sec), but tc becomes a little slower (1929 -> 1927
  Mb/sec).

* Another variant still speeds up the default case but not by as much
  (2241 -> 2274 Mb/sec) and speeds up tc as well (1929 -> 1931 Mb/sec).

The difference is that the first variant uses an outer static_key which
patches in a function containing two inner static_keys for nft + tc.
The second variant uses an outer static_key and a single inner static_key
for nft (but no static_key for tc).

nft is slower in the second variant (2231 -> 2225 Mb/sec).

I'm leaning towards the first variant, but because it incurs a small
performance degradation if tc is used, I wanted to give you a heads-up.
If this is totally unacceptable for you, I'll post the second variant
instead.

I need a few more days to update the commit messages, perform further
testing and apply polish, so I plan to dump the patches to the list
next week.  Just thought I'd ask for your opinion, I'm aware this is
difficult to judge without seeing the code.

I'm using static_keys instead of fmodify_return (which you've suggested)
because I would like to avoid a dependency on BPF as it might not be an
option for space-constrained embedded machines and a BPF JIT isn't
available on as many arches as CONFIG_JUMP_LABEL.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook
  2020-08-20 10:37                 ` Lukas Wunner
@ 2020-08-20 16:35                   ` Lukas Wunner
  0 siblings, 0 replies; 17+ messages in thread
From: Lukas Wunner @ 2020-08-20 16:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Laura Garcia, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, Netfilter Development Mailing list, coreteam,
	netdev, Martin Mares, Dmitry Safonov, Thomas Graf,
	Alexei Starovoitov, David Miller

On Thu, Aug 20, 2020 at 12:37:01PM +0200, Lukas Wunner wrote:
> I need a few more days to update the commit messages, perform further
> testing and apply polish, so I plan to dump the patches to the list
> next week.  Just thought I'd ask for your opinion, I'm aware this is
> difficult to judge without seeing the code.

FWIW, the code for the first variant is in the top-most commit on the
following branch:

https://github.com/l1k/linux/commits/nft_egress_v3

Again, it gives the best performance if neither nft nor tc classifying
is enabled on egress, but incurs a small performance degradation for
the tc-only case.  Ignore the commit message, I haven't updated it yet.

And to get the second variant, the following patch needs to be applied
with "patch -R -p1".  This variant speeds up the normal case but not by
as much, also speeds up tc, has worse performance for nft.  It doesn't
use a static_key for tc, save for the outer one (shared with nft),
hence no performance degradation for the tc-only case.

-- >8 --
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 93e63e756771..ef2cc21a0f11 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -785,6 +785,10 @@ struct xps_dev_maps {
 
 #endif /* CONFIG_XPS */
 
+#ifdef CONFIG_NET_EGRESS
+extern struct static_key_false egress_needed_key;
+#endif
+
 #define TC_MAX_QUEUE	16
 #define TC_BITMASK	15
 /* HW offloaded queuing disciplines txq count and offset maps */
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index bb9cb84114c1..49fe4f3b8fbd 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -97,7 +97,7 @@ void net_inc_ingress_queue(void);
 void net_dec_ingress_queue(void);
 #endif
 
-#ifdef CONFIG_NET_EGRESS
+#ifdef CONFIG_NET_SCH_EGRESS
 void net_inc_egress_queue(void);
 void net_dec_egress_queue(void);
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index a7e2ff191481..f1ac84beef76 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1815,20 +1815,30 @@ EXPORT_SYMBOL_GPL(net_dec_ingress_queue);
 #endif
 
 #ifdef CONFIG_NET_EGRESS
-static DEFINE_STATIC_KEY_FALSE(egress_needed_key);
+DEFINE_STATIC_KEY_FALSE(egress_needed_key);
+
+#ifdef CONFIG_NET_SCH_EGRESS
+static DEFINE_STATIC_KEY_FALSE(sch_egress_needed_key);
 
 void net_inc_egress_queue(void)
 {
 	static_branch_inc(&egress_needed_key);
+	static_branch_inc(&sch_egress_needed_key);
 }
 EXPORT_SYMBOL_GPL(net_inc_egress_queue);
 
 void net_dec_egress_queue(void)
 {
+	static_branch_dec(&sch_egress_needed_key);
 	static_branch_dec(&egress_needed_key);
 }
 EXPORT_SYMBOL_GPL(net_dec_egress_queue);
-#endif
+
+#define sch_egress_needed static_branch_likely(&sch_egress_needed_key)
+#else
+#define sch_egress_needed false
+#endif /* CONFIG_NET_SCH_EGRESS */
+#endif /* CONFIG_NET_EGRESS */
 
 static DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
 #ifdef CONFIG_JUMP_LABEL
@@ -3604,7 +3614,7 @@ static int nf_egress(struct sk_buff *skb)
 static inline struct sk_buff *
 sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_SCH_EGRESS
 	struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
 	struct tcf_result cl_res;
 
@@ -3638,7 +3648,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	default:
 		break;
 	}
-#endif /* CONFIG_NET_CLS_ACT */
+#endif /* CONFIG_NET_SCH_EGRESS */
 
 	return skb;
 }
@@ -3655,7 +3665,10 @@ nf_sch_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 			return NULL;
 	}
 
-	return sch_handle_egress(skb, ret, dev);
+	if (sch_egress_needed)
+		return sch_handle_egress(skb, ret, dev);
+
+	return skb;
 }
 #endif /* CONFIG_NET_EGRESS */
 
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index a78a439e4fdd..8e7d5eed663c 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -358,7 +358,7 @@ static int __nf_register_net_hook(struct net *net, int pf,
 #endif
 #ifdef CONFIG_NETFILTER_EGRESS
 	if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EGRESS)
-		net_inc_egress_queue();
+		static_branch_inc(&egress_needed_key);
 #endif
 #ifdef CONFIG_JUMP_LABEL
 	static_key_slow_inc(&nf_hooks_needed[pf][reg->hooknum]);
@@ -420,7 +420,7 @@ static void __nf_unregister_net_hook(struct net *net, int pf,
 #endif
 #ifdef CONFIG_NETFILTER_EGRESS
 		if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EGRESS)
-			net_dec_egress_queue();
+			static_branch_dec(&egress_needed_key);
 #endif
 #ifdef CONFIG_JUMP_LABEL
 		static_key_slow_dec(&nf_hooks_needed[pf][reg->hooknum]);
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index afd2ba157a13..806d5d60fc9a 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -383,6 +383,9 @@ config NET_SCH_INGRESS
 	  To compile this code as a module, choose M here: the module will be
 	  called sch_ingress with alias of sch_clsact.
 
+config NET_SCH_EGRESS
+	def_bool NET_SCH_INGRESS
+
 config NET_SCH_PLUG
 	tristate "Plug network traffic until release (PLUG)"
 	---help---

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-08-20 16:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-11 11:59 [PATCH nf-next 0/3] Netfilter egress hook Lukas Wunner
2020-03-11 11:59 ` [PATCH nf-next 1/3] netfilter: Rename ingress hook include file Lukas Wunner
2020-03-11 11:59 ` [PATCH nf-next 2/3] netfilter: Generalize ingress hook Lukas Wunner
2020-03-11 11:59 ` [PATCH nf-next 3/3] netfilter: Introduce egress hook Lukas Wunner
2020-03-11 14:05   ` Daniel Borkmann
2020-03-11 15:54     ` Lukas Wunner
2020-03-12 22:40       ` Daniel Borkmann
2020-03-13 14:55     ` Pablo Neira Ayuso
2020-03-14  0:12       ` Daniel Borkmann
2020-03-15 13:28         ` Pablo Neira Ayuso
2020-04-23 14:44           ` Laura Garcia
2020-04-23 16:05             ` Lukas Wunner
2020-04-27 23:44               ` Pablo Neira Ayuso
2020-04-28 20:11               ` Daniel Borkmann
2020-08-20 10:37                 ` Lukas Wunner
2020-08-20 16:35                   ` Lukas Wunner
2020-03-18  0:21 ` [PATCH nf-next 0/3] Netfilter " Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).