All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf-next v5 0/6] Netfilter egress hook
@ 2021-09-28  9:55 Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 1/6] netfilter: Rename ingress hook include file Pablo Neira Ayuso
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

Hi,

This patchset v5 that re-adds the Netfilter egress:

1) Rename linux/netfilter_ingress.h to linux/netfilter_netdev.h
   from Lukas Wunner.

2) Generalize ingress hook file to accomodate egress support,
   from Lukas Wunner.

3) Modularize Netfilter ingress hook into nf_tables_netdev: Daniel
   Borkmann is requesting for a mechanism to allow to blacklist
   Netfilter, this allows users to blacklist this new module that
   includes ingress chain and the new egress chain for the netdev
   family. There is no other in-tree user of the ingress and egress
   hooks than this which might interfer with his matter.

4) Place the egress hook again before the tc egress hook as requested
   by Daniel Borkmann. Patch to add egress hook from Lukas Wunner.
   The Netfilter egress hook remains behind the static key, if unused
   performance degradation is negligible.

5) Add netfilter egress handling to af_packet.

Arguably, distributors might decide to compile nf_tables_netdev
built-in. Traditionally, distributors have compiled their kernels using
the default configuration that Netfilter Kconfig provides (ie. use
modules whenever possible). In any case, I consider that distributor
policy is out of scope in this discussion, providing a mechanism to
allow Daniel to prevent Netfilter ingress and egress chains to be loaded
should be sufficient IMHO.

Joint work with Lukas Wunner.

Please review, thanks.

Lukas Wunner (3):
  netfilter: Rename ingress hook include file
  netfilter: Generalize ingress hook include file
  netfilter: Introduce egress hook

Pablo Neira Ayuso (3):
  netfilter: nf_tables: move netdev ingress filter chain to nf_tables_netdev.c
  af_packet: Introduce egress hook
  netfilter: nf_tables: add egress support

 include/linux/netdevice.h         |   4 +
 include/linux/netfilter_ingress.h |  58 ------------
 include/linux/netfilter_netdev.h  | 112 ++++++++++++++++++++++
 include/uapi/linux/netfilter.h    |   1 +
 net/core/dev.c                    |  15 ++-
 net/netfilter/Kconfig             |  10 +-
 net/netfilter/Makefile            |   1 +
 net/netfilter/core.c              |  34 ++++++-
 net/netfilter/nf_tables_api.c     |   7 +-
 net/netfilter/nf_tables_netdev.c  | 150 ++++++++++++++++++++++++++++++
 net/netfilter/nft_chain_filter.c  | 143 ----------------------------
 net/packet/af_packet.c            |  35 +++++++
 12 files changed, 358 insertions(+), 212 deletions(-)
 delete mode 100644 include/linux/netfilter_ingress.h
 create mode 100644 include/linux/netfilter_netdev.h
 create mode 100644 net/netfilter/nf_tables_netdev.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH nf-next v5 1/6] netfilter: Rename ingress hook include file
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
@ 2021-09-28  9:55 ` Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 2/6] netfilter: Generalize " Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

From: Lukas Wunner <lukas@wunner.de>

Prepare for addition of a netfilter egress hook by renaming
<linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>.

The egress hook also necessitates a refactoring of the include file,
but that is done in a separate commit to ease reviewing.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/{netfilter_ingress.h => netfilter_netdev.h} | 0
 net/core/dev.c                                            | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename include/linux/{netfilter_ingress.h => netfilter_netdev.h} (100%)

diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_netdev.h
similarity index 100%
rename from include/linux/netfilter_ingress.h
rename to include/linux/netfilter_netdev.h
diff --git a/net/core/dev.c b/net/core/dev.c
index 7ee9fecd3aff..a92823710a25 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -140,7 +140,7 @@
 #include <linux/if_macvlan.h>
 #include <linux/errqueue.h>
 #include <linux/hrtimer.h>
-#include <linux/netfilter_ingress.h>
+#include <linux/netfilter_netdev.h>
 #include <linux/crash_dump.h>
 #include <linux/sctp.h>
 #include <net/udp_tunnel.h>
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH nf-next v5 2/6] netfilter: Generalize ingress hook include file
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 1/6] netfilter: Rename ingress hook include file Pablo Neira Ayuso
@ 2021-09-28  9:55 ` Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 3/6] netfilter: nf_tables: move netdev ingress filter chain to nf_tables_netdev.c Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

From: Lukas Wunner <lukas@wunner.de>

Prepare for addition of a netfilter egress hook by generalizing the
ingress hook include file.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter_netdev.h | 20 +++++++++++---------
 net/core/dev.c                   |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/netfilter_netdev.h b/include/linux/netfilter_netdev.h
index a13774be2eb5..5812b0fb0278 100644
--- a/include/linux/netfilter_netdev.h
+++ b/include/linux/netfilter_netdev.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _NETFILTER_INGRESS_H_
-#define _NETFILTER_INGRESS_H_
+#ifndef _NETFILTER_NETDEV_H_
+#define _NETFILTER_NETDEV_H_
 
 #include <linux/netfilter.h>
 #include <linux/netdevice.h>
@@ -38,10 +38,6 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 	return ret;
 }
 
-static inline void nf_hook_ingress_init(struct net_device *dev)
-{
-	RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
-}
 #else /* CONFIG_NETFILTER_INGRESS */
 static inline int nf_hook_ingress_active(struct sk_buff *skb)
 {
@@ -52,7 +48,13 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 {
 	return 0;
 }
-
-static inline void nf_hook_ingress_init(struct net_device *dev) {}
 #endif /* CONFIG_NETFILTER_INGRESS */
-#endif /* _NETFILTER_INGRESS_H_ */
+
+static inline void nf_hook_netdev_init(struct net_device *dev)
+{
+#ifdef CONFIG_NETFILTER_INGRESS
+	RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
+#endif
+}
+
+#endif /* _NETFILTER_NETDEV_H_ */
diff --git a/net/core/dev.c b/net/core/dev.c
index a92823710a25..ba0700dea307 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10860,7 +10860,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
 
-	nf_hook_ingress_init(dev);
+	nf_hook_netdev_init(dev);
 
 	return dev;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH nf-next v5 3/6] netfilter: nf_tables: move netdev ingress filter chain to nf_tables_netdev.c
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 1/6] netfilter: Rename ingress hook include file Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 2/6] netfilter: Generalize " Pablo Neira Ayuso
@ 2021-09-28  9:55 ` Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 4/6] netfilter: Introduce egress hook Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

Add a tristate Kconfig toggle whose default is to compile support for
the netdev family as a module, this allows to blacklist Netfilter as
Daniel Borkmann requests.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/Kconfig            |   2 +-
 net/netfilter/Makefile           |   1 +
 net/netfilter/nf_tables_api.c    |   7 +-
 net/netfilter/nf_tables_netdev.c | 148 +++++++++++++++++++++++++++++++
 net/netfilter/nft_chain_filter.c | 143 -----------------------------
 5 files changed, 154 insertions(+), 147 deletions(-)
 create mode 100644 net/netfilter/nf_tables_netdev.c

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 54395266339d..b45fb3de8209 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -480,7 +480,7 @@ config NF_TABLES_INET
 	  This option enables support for a mixed IPv4/IPv6 "inet" table.
 
 config NF_TABLES_NETDEV
-	bool "Netfilter nf_tables netdev tables support"
+	tristate "Netfilter nf_tables netdev tables support"
 	help
 	  This option enables support for the "netdev" table.
 
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index aab20e575ecd..21c23ff8630d 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -86,6 +86,7 @@ endif
 endif
 
 obj-$(CONFIG_NF_TABLES)		+= nf_tables.o
+obj-$(CONFIG_NF_TABLES_NETDEV)	+= nf_tables_netdev.o
 obj-$(CONFIG_NFT_COMPAT)	+= nft_compat.o
 obj-$(CONFIG_NFT_CONNLIMIT)	+= nft_connlimit.o
 obj-$(CONFIG_NFT_NUMGEN)	+= nft_numgen.o
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c0851fec11d4..200c5af3c427 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -26,6 +26,7 @@
 #define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-"))
 
 unsigned int nf_tables_net_id __read_mostly;
+EXPORT_SYMBOL_GPL(nf_tables_net_id);
 
 static LIST_HEAD(nf_tables_expressions);
 static LIST_HEAD(nf_tables_objects);
@@ -1948,8 +1949,6 @@ static int nft_chain_parse_hook(struct net *net,
 	hook->priority = ntohl(nla_get_be32(ha[NFTA_HOOK_PRIORITY]));
 
 	type = __nft_chain_type_get(family, NFT_CHAIN_T_DEFAULT);
-	if (!type)
-		return -EOPNOTSUPP;
 
 	if (nla[NFTA_CHAIN_TYPE]) {
 		type = nf_tables_chain_type_lookup(net, nla[NFTA_CHAIN_TYPE],
@@ -1958,7 +1957,9 @@ static int nft_chain_parse_hook(struct net *net,
 			NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_TYPE]);
 			return PTR_ERR(type);
 		}
-	}
+	} else if (!type)
+		return -EOPNOTSUPP;
+
 	if (hook->num >= NFT_MAX_HOOKS || !(type->hook_mask & (1 << hook->num)))
 		return -EOPNOTSUPP;
 
diff --git a/net/netfilter/nf_tables_netdev.c b/net/netfilter/nf_tables_netdev.c
new file mode 100644
index 000000000000..8c42ea7d1be9
--- /dev/null
+++ b/net/netfilter/nf_tables_netdev.c
@@ -0,0 +1,148 @@
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+#include <net/netfilter/nf_tables.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_bridge.h>
+#include <linux/netfilter_arp.h>
+#include <net/netfilter/nf_tables_ipv4.h>
+#include <net/netfilter/nf_tables_ipv6.h>
+
+static unsigned int nft_do_chain_netdev(void *priv, struct sk_buff *skb,
+					const struct nf_hook_state *state)
+{
+	struct nft_pktinfo pkt;
+
+	nft_set_pktinfo(&pkt, skb, state);
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		nft_set_pktinfo_ipv4_validate(&pkt);
+		break;
+	case htons(ETH_P_IPV6):
+		nft_set_pktinfo_ipv6_validate(&pkt);
+		break;
+	default:
+		nft_set_pktinfo_unspec(&pkt);
+		break;
+	}
+
+	return nft_do_chain(&pkt, priv);
+}
+
+static const struct nft_chain_type nft_chain_filter_netdev = {
+	.name		= "filter",
+	.type		= NFT_CHAIN_T_DEFAULT,
+	.family		= NFPROTO_NETDEV,
+	.hook_mask	= (1 << NF_NETDEV_INGRESS),
+	.hooks		= {
+		[NF_NETDEV_INGRESS]	= nft_do_chain_netdev,
+	},
+};
+
+static void nft_netdev_event(unsigned long event, struct net_device *dev,
+			     struct nft_ctx *ctx)
+{
+	struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
+	struct nft_hook *hook, *found = NULL;
+	int n = 0;
+
+	if (event != NETDEV_UNREGISTER)
+		return;
+
+	list_for_each_entry(hook, &basechain->hook_list, list) {
+		if (hook->ops.dev == dev)
+			found = hook;
+
+		n++;
+	}
+	if (!found)
+		return;
+
+	if (n > 1) {
+		nf_unregister_net_hook(ctx->net, &found->ops);
+		list_del_rcu(&found->list);
+		kfree_rcu(found, rcu);
+		return;
+	}
+
+	/* UNREGISTER events are also happening on netns exit.
+	 *
+	 * Although nf_tables core releases all tables/chains, only this event
+	 * handler provides guarantee that hook->ops.dev is still accessible,
+	 * so we cannot skip exiting net namespaces.
+	 */
+	__nft_release_basechain(ctx);
+}
+
+static int nf_tables_netdev_event(struct notifier_block *this,
+				  unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct nftables_pernet *nft_net;
+	struct nft_table *table;
+	struct nft_chain *chain, *nr;
+	struct nft_ctx ctx = {
+		.net	= dev_net(dev),
+	};
+
+	if (event != NETDEV_UNREGISTER &&
+	    event != NETDEV_CHANGENAME)
+		return NOTIFY_DONE;
+
+	nft_net = nft_pernet(ctx.net);
+	mutex_lock(&nft_net->commit_mutex);
+	list_for_each_entry(table, &nft_net->tables, list) {
+		if (table->family != NFPROTO_NETDEV)
+			continue;
+
+		ctx.family = table->family;
+		ctx.table = table;
+		list_for_each_entry_safe(chain, nr, &table->chains, list) {
+			if (!nft_is_base_chain(chain))
+				continue;
+
+			ctx.chain = chain;
+			nft_netdev_event(event, dev, &ctx);
+		}
+	}
+	mutex_unlock(&nft_net->commit_mutex);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block nf_tables_netdev_notifier = {
+	.notifier_call	= nf_tables_netdev_event,
+};
+
+static int nft_chain_filter_netdev_init(void)
+{
+	int err;
+
+	nft_register_chain_type(&nft_chain_filter_netdev);
+
+	err = register_netdevice_notifier(&nf_tables_netdev_notifier);
+	if (err)
+		goto err_register_netdevice_notifier;
+
+	return 0;
+
+err_register_netdevice_notifier:
+	nft_unregister_chain_type(&nft_chain_filter_netdev);
+
+	return err;
+}
+
+static void nft_chain_filter_netdev_fini(void)
+{
+	nft_unregister_chain_type(&nft_chain_filter_netdev);
+	unregister_netdevice_notifier(&nf_tables_netdev_notifier);
+}
+
+module_init(nft_chain_filter_netdev_init);
+module_exit(nft_chain_filter_netdev_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NFT_CHAIN(5, "filter");	/* NFPROTO_NETDEV */
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 5b02408a920b..1ce2ffb71981 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -283,150 +283,8 @@ static inline void nft_chain_filter_bridge_init(void) {}
 static inline void nft_chain_filter_bridge_fini(void) {}
 #endif /* CONFIG_NF_TABLES_BRIDGE */
 
-#ifdef CONFIG_NF_TABLES_NETDEV
-static unsigned int nft_do_chain_netdev(void *priv, struct sk_buff *skb,
-					const struct nf_hook_state *state)
-{
-	struct nft_pktinfo pkt;
-
-	nft_set_pktinfo(&pkt, skb, state);
-
-	switch (skb->protocol) {
-	case htons(ETH_P_IP):
-		nft_set_pktinfo_ipv4_validate(&pkt);
-		break;
-	case htons(ETH_P_IPV6):
-		nft_set_pktinfo_ipv6_validate(&pkt);
-		break;
-	default:
-		nft_set_pktinfo_unspec(&pkt);
-		break;
-	}
-
-	return nft_do_chain(&pkt, priv);
-}
-
-static const struct nft_chain_type nft_chain_filter_netdev = {
-	.name		= "filter",
-	.type		= NFT_CHAIN_T_DEFAULT,
-	.family		= NFPROTO_NETDEV,
-	.hook_mask	= (1 << NF_NETDEV_INGRESS),
-	.hooks		= {
-		[NF_NETDEV_INGRESS]	= nft_do_chain_netdev,
-	},
-};
-
-static void nft_netdev_event(unsigned long event, struct net_device *dev,
-			     struct nft_ctx *ctx)
-{
-	struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
-	struct nft_hook *hook, *found = NULL;
-	int n = 0;
-
-	if (event != NETDEV_UNREGISTER)
-		return;
-
-	list_for_each_entry(hook, &basechain->hook_list, list) {
-		if (hook->ops.dev == dev)
-			found = hook;
-
-		n++;
-	}
-	if (!found)
-		return;
-
-	if (n > 1) {
-		nf_unregister_net_hook(ctx->net, &found->ops);
-		list_del_rcu(&found->list);
-		kfree_rcu(found, rcu);
-		return;
-	}
-
-	/* UNREGISTER events are also happening on netns exit.
-	 *
-	 * Although nf_tables core releases all tables/chains, only this event
-	 * handler provides guarantee that hook->ops.dev is still accessible,
-	 * so we cannot skip exiting net namespaces.
-	 */
-	__nft_release_basechain(ctx);
-}
-
-static int nf_tables_netdev_event(struct notifier_block *this,
-				  unsigned long event, void *ptr)
-{
-	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-	struct nftables_pernet *nft_net;
-	struct nft_table *table;
-	struct nft_chain *chain, *nr;
-	struct nft_ctx ctx = {
-		.net	= dev_net(dev),
-	};
-
-	if (event != NETDEV_UNREGISTER &&
-	    event != NETDEV_CHANGENAME)
-		return NOTIFY_DONE;
-
-	nft_net = nft_pernet(ctx.net);
-	mutex_lock(&nft_net->commit_mutex);
-	list_for_each_entry(table, &nft_net->tables, list) {
-		if (table->family != NFPROTO_NETDEV)
-			continue;
-
-		ctx.family = table->family;
-		ctx.table = table;
-		list_for_each_entry_safe(chain, nr, &table->chains, list) {
-			if (!nft_is_base_chain(chain))
-				continue;
-
-			ctx.chain = chain;
-			nft_netdev_event(event, dev, &ctx);
-		}
-	}
-	mutex_unlock(&nft_net->commit_mutex);
-
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block nf_tables_netdev_notifier = {
-	.notifier_call	= nf_tables_netdev_event,
-};
-
-static int nft_chain_filter_netdev_init(void)
-{
-	int err;
-
-	nft_register_chain_type(&nft_chain_filter_netdev);
-
-	err = register_netdevice_notifier(&nf_tables_netdev_notifier);
-	if (err)
-		goto err_register_netdevice_notifier;
-
-	return 0;
-
-err_register_netdevice_notifier:
-	nft_unregister_chain_type(&nft_chain_filter_netdev);
-
-	return err;
-}
-
-static void nft_chain_filter_netdev_fini(void)
-{
-	nft_unregister_chain_type(&nft_chain_filter_netdev);
-	unregister_netdevice_notifier(&nf_tables_netdev_notifier);
-}
-#else
-static inline int nft_chain_filter_netdev_init(void) { return 0; }
-static inline void nft_chain_filter_netdev_fini(void) {}
-#endif /* CONFIG_NF_TABLES_NETDEV */
-
 int __init nft_chain_filter_init(void)
 {
-	int err;
-
-	err = nft_chain_filter_netdev_init();
-	if (err < 0)
-		return err;
-
 	nft_chain_filter_ipv4_init();
 	nft_chain_filter_ipv6_init();
 	nft_chain_filter_arp_init();
@@ -443,5 +301,4 @@ void nft_chain_filter_fini(void)
 	nft_chain_filter_arp_fini();
 	nft_chain_filter_ipv6_fini();
 	nft_chain_filter_ipv4_fini();
-	nft_chain_filter_netdev_fini();
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH nf-next v5 4/6] netfilter: Introduce egress hook
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2021-09-28  9:55 ` [PATCH nf-next v5 3/6] netfilter: nf_tables: move netdev ingress filter chain to nf_tables_netdev.c Pablo Neira Ayuso
@ 2021-09-28  9:55 ` Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 5/6] af_packet: " Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

From: Lukas Wunner <lukas@wunner.de>

Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets with netfilter on ingress.

Support the same on egress to satisfy user requirements such as:
* outbound security policies for containers (Laura)
* filtering and mangling intra-node Direct Server Return (DSR) traffic
  on a load balancer (Laura)
* filtering locally generated traffic coming in through AF_PACKET,
  such as local ARP traffic generated for clustering purposes or DHCP
  (Laura; the AF_PACKET plumbing is contained in a separate commit)
* L2 filtering from ingress and egress for AVB (Audio Video Bridging)
  and gPTP with nftables (Pablo)
* in the future: in-kernel NAT64/NAT46 (Pablo)

As well as to allow to use the existing nftables features from the
egress.

A patch for nftables to hook up egress rules from user space has been
submitted separately, so users may immediately take advantage of the
feature.

The hook is positioned before packet handling by traffic control as
requested by Daniel Borkmann.

The only in-tree user for this hook is the nf_tables_netdev module
that can be blacklist if users do not want to allow to register
nf_tables ingress and egress hooks.

If egress netfilter handling is not enabled on any interface, it is
patched out of the data path by way of a static_key and doesn't make a
performance difference that is discernible from noise.

These are the performance results from the previous patchset round:

Before:             2076 2076 2076 2077 2077 2074 Mb/sec
After:              2080 2078 2078 2079 2079 2077 Mb/sec

Measurements were performed on a Core i7-3615QM.  Commands to reproduce:
ip link add dev foo type dummy
ip link set dev foo up
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Laura García Liébana <nevola@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/linux/netdevice.h        |  4 +++
 include/linux/netfilter_netdev.h | 52 ++++++++++++++++++++++++++++++++
 include/uapi/linux/netfilter.h   |  1 +
 net/core/dev.c                   | 11 +++++--
 net/netfilter/Kconfig            |  8 +++++
 net/netfilter/core.c             | 34 +++++++++++++++++++--
 6 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d79163208dfd..e9a48068f306 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1861,6 +1861,7 @@ enum netdev_ml_priv_type {
  *	@xps_maps:	XXX: need comments on this one
  *	@miniq_egress:		clsact qdisc specific data for
  *				egress processing
+ *	@nf_hooks_egress:	netfilter hooks executed for egress packets
  *	@qdisc_hash:		qdisc hash table
  *	@watchdog_timeo:	Represents the timeout that is used by
  *				the watchdog (see dev_watchdog())
@@ -2161,6 +2162,9 @@ struct net_device {
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
 #endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	struct nf_hook_entries __rcu *nf_hooks_egress;
+#endif
 
 #ifdef CONFIG_NET_SCHED
 	DECLARE_HASHTABLE	(qdisc_hash, 4);
diff --git a/include/linux/netfilter_netdev.h b/include/linux/netfilter_netdev.h
index 5812b0fb0278..5ed6e90d46f6 100644
--- a/include/linux/netfilter_netdev.h
+++ b/include/linux/netfilter_netdev.h
@@ -50,11 +50,63 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 }
 #endif /* CONFIG_NETFILTER_INGRESS */
 
+#ifdef CONFIG_NETFILTER_EGRESS
+static inline bool nf_hook_egress_active(void)
+{
+#ifdef CONFIG_JUMP_LABEL
+	if (!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_EGRESS]))
+		return false;
+#endif
+	return true;
+}
+
+/* caller must hold rcu_read_lock */
+static inline struct sk_buff *nf_hook_egress(struct sk_buff *skb, int *rc,
+					     struct net_device *dev)
+{
+	struct nf_hook_entries *e = rcu_dereference(dev->nf_hooks_egress);
+	struct nf_hook_state state;
+	int ret;
+
+	if (!e)
+		return skb;
+
+	nf_hook_state_init(&state, NF_NETDEV_EGRESS,
+			   NFPROTO_NETDEV, dev, NULL, NULL,
+			   dev_net(dev), NULL);
+	ret = nf_hook_slow(skb, &state, e, 0);
+
+	if (ret == 1) {
+		return skb;
+	} else if (ret < 0) {
+		*rc = NET_XMIT_DROP;
+		return NULL;
+	} else { /* ret == 0 */
+		*rc = NET_XMIT_SUCCESS;
+		return NULL;
+	}
+}
+#else /* CONFIG_NETFILTER_EGRESS */
+static inline bool nf_hook_egress_active(void)
+{
+	return false;
+}
+
+static inline struct sk_buff *nf_hook_egress(struct sk_buff *skb, int *rc,
+					     struct net_device *dev)
+{
+	return skb;
+}
+#endif /* CONFIG_NETFILTER_EGRESS */
+
 static inline void nf_hook_netdev_init(struct net_device *dev)
 {
 #ifdef CONFIG_NETFILTER_INGRESS
 	RCU_INIT_POINTER(dev->nf_hooks_ingress, NULL);
 #endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	RCU_INIT_POINTER(dev->nf_hooks_egress, NULL);
+#endif
 }
 
 #endif /* _NETFILTER_NETDEV_H_ */
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index ef9a44286e23..53411ccc69db 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -51,6 +51,7 @@ enum nf_inet_hooks {
 
 enum nf_dev_hooks {
 	NF_NETDEV_INGRESS,
+	NF_NETDEV_EGRESS,
 	NF_NETDEV_NUMHOOKS
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index ba0700dea307..ed847b43ba6c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3918,6 +3918,7 @@ EXPORT_SYMBOL(dev_loopback_xmit);
 static struct sk_buff *
 sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
+#ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
 	struct tcf_result cl_res;
 
@@ -3953,6 +3954,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	default:
 		break;
 	}
+#endif /* CONFIG_NET_CLS_ACT */
 
 	return skb;
 }
@@ -4146,13 +4148,18 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_at_ingress = 0;
-# ifdef CONFIG_NET_EGRESS
+#endif
+#ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
+		if (nf_hook_egress_active()) {
+			skb = nf_hook_egress(skb, &rc, dev);
+			if (!skb)
+				goto out;
+		}
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
 	}
-# endif
 #endif
 	/* If device/qdisc don't need skb->dst, release it right now while
 	 * its hot in this cpu cache.
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index b45fb3de8209..265d432a6dba 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -10,6 +10,14 @@ config NETFILTER_INGRESS
 	  This allows you to classify packets from ingress using the Netfilter
 	  infrastructure.
 
+config NETFILTER_EGRESS
+	bool "Netfilter egress support"
+	default y
+	select NET_EGRESS
+	help
+	  This allows you to classify packets before transmission using the
+	  Netfilter infrastructure.
+
 config NETFILTER_NETLINK
 	tristate
 
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 63d032191e62..3a32a813fcde 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -316,6 +316,12 @@ nf_hook_entry_head(struct net *net, int pf, unsigned int hooknum,
 		if (dev && dev_net(dev) == net)
 			return &dev->nf_hooks_ingress;
 	}
+#endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	if (hooknum == NF_NETDEV_EGRESS) {
+		if (dev && dev_net(dev) == net)
+			return &dev->nf_hooks_egress;
+	}
 #endif
 	WARN_ON_ONCE(1);
 	return NULL;
@@ -344,6 +350,11 @@ static inline bool nf_ingress_hook(const struct nf_hook_ops *reg, int pf)
 	return false;
 }
 
+static inline bool nf_egress_hook(const struct nf_hook_ops *reg, int pf)
+{
+	return pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_EGRESS;
+}
+
 static void nf_static_key_inc(const struct nf_hook_ops *reg, int pf)
 {
 #ifdef CONFIG_JUMP_LABEL
@@ -383,9 +394,18 @@ static int __nf_register_net_hook(struct net *net, int pf,
 
 	switch (pf) {
 	case NFPROTO_NETDEV:
-		err = nf_ingress_check(net, reg, NF_NETDEV_INGRESS);
-		if (err < 0)
-			return err;
+#ifndef CONFIG_NETFILTER_INGRESS
+		if (reg->hooknum == NF_NETDEV_INGRESS)
+			return -EOPNOTSUPP;
+#endif
+#ifndef CONFIG_NETFILTER_EGRESS
+		if (reg->hooknum == NF_NETDEV_EGRESS)
+			return -EOPNOTSUPP;
+#endif
+		if ((reg->hooknum != NF_NETDEV_INGRESS &&
+		     reg->hooknum != NF_NETDEV_EGRESS) ||
+		    !reg->dev || dev_net(reg->dev) != net)
+			return -EINVAL;
 		break;
 	case NFPROTO_INET:
 		if (reg->hooknum != NF_INET_INGRESS)
@@ -417,6 +437,10 @@ static int __nf_register_net_hook(struct net *net, int pf,
 #ifdef CONFIG_NETFILTER_INGRESS
 	if (nf_ingress_hook(reg, pf))
 		net_inc_ingress_queue();
+#endif
+#ifdef CONFIG_NETFILTER_EGRESS
+	if (nf_egress_hook(reg, pf))
+		net_inc_egress_queue();
 #endif
 	nf_static_key_inc(reg, pf);
 
@@ -474,6 +498,10 @@ static void __nf_unregister_net_hook(struct net *net, int pf,
 #ifdef CONFIG_NETFILTER_INGRESS
 		if (nf_ingress_hook(reg, pf))
 			net_dec_ingress_queue();
+#endif
+#ifdef CONFIG_NETFILTER_EGRESS
+		if (nf_egress_hook(reg, pf))
+			net_dec_egress_queue();
 #endif
 		nf_static_key_dec(reg, pf);
 	} else {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH nf-next v5 5/6] af_packet: Introduce egress hook
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2021-09-28  9:55 ` [PATCH nf-next v5 4/6] netfilter: Introduce egress hook Pablo Neira Ayuso
@ 2021-09-28  9:55 ` Pablo Neira Ayuso
  2021-09-28  9:55 ` [PATCH nf-next v5 6/6] netfilter: nf_tables: add egress support Pablo Neira Ayuso
  2021-09-30  6:08 ` [PATCH nf-next v5 0/6] Netfilter egress hook Daniel Borkmann
  6 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

Add egress hook for AF_PACKET sockets that have the PACKET_QDISC_BYPASS
socket option set to on, which allows packets to escape without being
filtered in the egress path.

This patch only updates the AF_PACKET path, it does not update
dev_direct_xmit() so the XDP infrastructure has a chance to bypass
Netfilter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[lukas: acquire rcu_read_lock, fix typos, rebase]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 net/packet/af_packet.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2a2bc64f75cf..46943a18a10d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -91,6 +91,7 @@
 #endif
 #include <linux/bpf.h>
 #include <net/compat.h>
+#include <linux/netfilter_netdev.h>
 
 #include "internal.h"
 
@@ -241,8 +242,42 @@ struct packet_skb_cb {
 static void __fanout_unlink(struct sock *sk, struct packet_sock *po);
 static void __fanout_link(struct sock *sk, struct packet_sock *po);
 
+#ifdef CONFIG_NETFILTER_EGRESS
+static noinline struct sk_buff *nf_hook_direct_egress(struct sk_buff *skb)
+{
+	struct sk_buff *next, *head = NULL, *tail;
+	int rc;
+
+	rcu_read_lock();
+	for (; skb != NULL; skb = next) {
+		next = skb->next;
+		skb_mark_not_on_list(skb);
+
+		if (!nf_hook_egress(skb, &rc, skb->dev))
+			continue;
+
+		if (!head)
+			head = skb;
+		else
+			tail->next = skb;
+
+		tail = skb;
+	}
+	rcu_read_unlock();
+
+	return head;
+}
+#endif
+
 static int packet_direct_xmit(struct sk_buff *skb)
 {
+#ifdef CONFIG_NETFILTER_EGRESS
+	if (nf_hook_egress_active()) {
+		skb = nf_hook_direct_egress(skb);
+		if (!skb)
+			return NET_XMIT_DROP;
+	}
+#endif
 	return dev_direct_xmit(skb, packet_pick_tx_queue(skb));
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH nf-next v5 6/6] netfilter: nf_tables: add egress support
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2021-09-28  9:55 ` [PATCH nf-next v5 5/6] af_packet: " Pablo Neira Ayuso
@ 2021-09-28  9:55 ` Pablo Neira Ayuso
  2021-09-30  6:08 ` [PATCH nf-next v5 0/6] Netfilter egress hook Daniel Borkmann
  6 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-28  9:55 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, kuba, lukas, daniel, kadlec, fw, ast, edumazet,
	tgraf, nevola, john.fastabend, willemb

Add egress chain type for the netdev family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_netdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_netdev.c b/net/netfilter/nf_tables_netdev.c
index 8c42ea7d1be9..6abe4b59ce7d 100644
--- a/net/netfilter/nf_tables_netdev.c
+++ b/net/netfilter/nf_tables_netdev.c
@@ -36,9 +36,11 @@ static const struct nft_chain_type nft_chain_filter_netdev = {
 	.name		= "filter",
 	.type		= NFT_CHAIN_T_DEFAULT,
 	.family		= NFPROTO_NETDEV,
-	.hook_mask	= (1 << NF_NETDEV_INGRESS),
+	.hook_mask	= (1 << NF_NETDEV_INGRESS) |
+			  (1 << NF_NETDEV_EGRESS),
 	.hooks		= {
 		[NF_NETDEV_INGRESS]	= nft_do_chain_netdev,
+		[NF_NETDEV_EGRESS]	= nft_do_chain_netdev,
 	},
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2021-09-28  9:55 ` [PATCH nf-next v5 6/6] netfilter: nf_tables: add egress support Pablo Neira Ayuso
@ 2021-09-30  6:08 ` Daniel Borkmann
  2021-09-30  6:52   ` Lukas Wunner
  2021-09-30  7:19   ` Pablo Neira Ayuso
  6 siblings, 2 replies; 22+ messages in thread
From: Daniel Borkmann @ 2021-09-30  6:08 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel
  Cc: davem, netdev, kuba, lukas, kadlec, fw, ast, edumazet, tgraf,
	nevola, john.fastabend, willemb

On 9/28/21 11:55 AM, Pablo Neira Ayuso wrote:
> Hi,
> 
> This patchset v5 that re-adds the Netfilter egress:
> 
> 1) Rename linux/netfilter_ingress.h to linux/netfilter_netdev.h
>     from Lukas Wunner.
> 
> 2) Generalize ingress hook file to accomodate egress support,
>     from Lukas Wunner.
> 
> 3) Modularize Netfilter ingress hook into nf_tables_netdev: Daniel
>     Borkmann is requesting for a mechanism to allow to blacklist
>     Netfilter, this allows users to blacklist this new module that
>     includes ingress chain and the new egress chain for the netdev
>     family. There is no other in-tree user of the ingress and egress
>     hooks than this which might interfer with his matter.
> 
> 4) Place the egress hook again before the tc egress hook as requested
>     by Daniel Borkmann. Patch to add egress hook from Lukas Wunner.
>     The Netfilter egress hook remains behind the static key, if unused
>     performance degradation is negligible.
> 
> 5) Add netfilter egress handling to af_packet.
> 
> Arguably, distributors might decide to compile nf_tables_netdev
> built-in. Traditionally, distributors have compiled their kernels using
> the default configuration that Netfilter Kconfig provides (ie. use
> modules whenever possible). In any case, I consider that distributor
> policy is out of scope in this discussion, providing a mechanism to
> allow Daniel to prevent Netfilter ingress and egress chains to be loaded
> should be sufficient IMHO.

Hm, so in the case of SRv6 users were running into a similar issue and commit
7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") [0] added
a new hook along with a sysctl which defaults the new hook to off.

The rationale for it was given as "the hooks are enabled via nf_hooks_lwtunnel
sysctl to make sure existing netfilter rulesets do not break." [0,1]

If the suggestion to flag the skb [2] one way or another from the tc forwarding
path (e.g. skb bit or per-cpu marker) is not technically feasible, then why not
do a sysctl toggle like in the SRv6 case?

   [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a3f5b0de3647c854e34269c3332d7a1e902901a
   [1] https://lore.kernel.org/netdev/20210830093852.21654-1-pablo@netfilter.org/
   [2] https://lore.kernel.org/netdev/11584665-e3b5-3afe-d72c-ef92ad0b9313@iogearbox.net/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  6:08 ` [PATCH nf-next v5 0/6] Netfilter egress hook Daniel Borkmann
@ 2021-09-30  6:52   ` Lukas Wunner
  2021-09-30  7:10     ` Daniel Borkmann
  2021-09-30  7:21     ` Pablo Neira Ayuso
  2021-09-30  7:19   ` Pablo Neira Ayuso
  1 sibling, 2 replies; 22+ messages in thread
From: Lukas Wunner @ 2021-09-30  6:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, kuba, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, Sep 30, 2021 at 08:08:53AM +0200, Daniel Borkmann wrote:
> Hm, so in the case of SRv6 users were running into a similar issue
> and commit 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6
> data plane") [0] added a new hook along with a sysctl which defaults
> the new hook to off.
> 
> The rationale for it was given as "the hooks are enabled via
> nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets
>  do not break." [0,1]
> 
> If the suggestion to flag the skb [2] one way or another from the
> tc forwarding path (e.g. skb bit or per-cpu marker) is not
> technically feasible, then why not do a sysctl toggle like in the
> SRv6 case?

The skb flag *is* technically feasible.  I amended the patches with
the flag and was going to post them this week, but Pablo beat me to
the punch and posted his alternative version, which lacks the flag
but modularizes netfilter ingress/egress processing instead.

Honestly I think a hodge-podge of config options and sysctl toggles
is awful and I would prefer the skb flag you suggested.  I kind of
like your idea of considering tc and netfilter as layers.

FWIW the finished patches *with* the flag are on this branch:
https://github.com/l1k/linux/commits/nft_egress_v5

Below is the "git range-diff" between Pablo's patches and mine
(just the hunks which pertain to the skb flag, plus excerpts
from the commit message).

Would you find the patch set acceptable with this skb flag?

-- >8 --

    +    Alternatively or in addition to netfilter, packets can be classified
    +    with traffic control (tc).  On ingress, packets are classified first by
    +    tc, then by netfilter.  On egress, the order is reversed for symmetry.
    +    Conceptually, tc and netfilter can be thought of as layers, with
    +    netfilter layered above tc.
     
    +    Traffic control is capable of redirecting packets to another interface
    +    (man 8 tc-mirred).  E.g., an ingress packet may be redirected from the
    +    host namespace to a container via a veth connection:
    +    tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
     
    +    In this case, netfilter egress classifying is not performed when leaving
    +    the host namespace!  That's because the packet is still on the tc layer.
    +    If tc redirects the packet to a physical interface in the host namespace
    +    such that it leaves the system, the packet is never subjected to
    +    netfilter egress classifying.  That is only logical since it hasn't
    +    passed through netfilter ingress classifying either.
     
    +    Packets can alternatively be redirected at the netfilter layer using
    +    nft fwd.  Such a packet *is* subjected to netfilter egress classifying.
    +    Internally, the skb->nf_skip_egress flag controls whether netfilter is
    +    invoked on egress by __dev_queue_xmit().
    +
    +    Interaction between tc and netfilter is possible by setting and querying
    +    skb->mark.
     
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
     +static inline struct sk_buff *nf_hook_egress(struct sk_buff *skb, int *rc,
     +					     struct net_device *dev)
     +{
    -+	struct nf_hook_entries *e = rcu_dereference(dev->nf_hooks_egress);
    ++	struct nf_hook_entries *e;
     +	struct nf_hook_state state;
     +	int ret;
     +
    ++	if (skb->nf_skip_egress)
    ++		return skb;
    ++
    ++	e = rcu_dereference(dev->nf_hooks_egress);
     +	if (!e)
     +		return skb;
     +
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
     +		return NULL;
     +	}
     +}
    ++
    ++static inline void nf_skip_egress(struct sk_buff *skb, bool skip)
    ++{
    ++	skb->nf_skip_egress = skip;
    ++}
     +#else /* CONFIG_NETFILTER_EGRESS */
     +static inline bool nf_hook_egress_active(void)
     +{
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
     +{
     +	return skb;
     +}
    ++
    ++static inline void nf_skip_egress(struct sk_buff *skb, bool skip)
    ++{
    ++}
     +#endif /* CONFIG_NETFILTER_EGRESS */
     +
      static inline void nf_hook_netdev_init(struct net_device *dev)
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
      
      #endif /* _NETFILTER_NETDEV_H_ */
     
    + ## include/linux/skbuff.h ##
    +@@ include/linux/skbuff.h: typedef unsigned char *sk_buff_data_t;
    +  *	@tc_at_ingress: used within tc_classify to distinguish in/egress
    +  *	@redirected: packet was redirected by packet classifier
    +  *	@from_ingress: packet was redirected from the ingress path
    ++ *	@nf_skip_egress: packet shall skip netfilter egress processing
    +  *	@peeked: this packet has been seen already, so stats have been
    +  *		done for it, don't do them again
    +  *	@nf_trace: netfilter packet trace flag
    +@@ include/linux/skbuff.h: struct sk_buff {
    + #ifdef CONFIG_NET_REDIRECT
    + 	__u8			from_ingress:1;
    + #endif
    ++#ifdef CONFIG_NETFILTER_EGRESS
    ++	__u8			nf_skip_egress:1;
    ++#endif
    + #ifdef CONFIG_TLS_DEVICE
    + 	__u8			decrypted:1;
    + #endif
    +
      ## include/uapi/linux/netfilter.h ##
     @@ include/uapi/linux/netfilter.h: enum nf_inet_hooks {
      
    @@ net/core/dev.c: static int __dev_queue_xmit(struct sk_buff *skb, struct net_devi
     +			if (!skb)
     +				goto out;
     +		}
    ++		nf_skip_egress(skb, true);
      		skb = sch_handle_egress(skb, &rc, dev);
      		if (!skb)
      			goto out;
    @@ net/core/dev.c: static int __dev_queue_xmit(struct sk_buff *skb, struct net_devi
      #endif
      	/* If device/qdisc don't need skb->dst, release it right now while
      	 * its hot in this cpu cache.
    +@@ net/core/dev.c: static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
    + 	if (static_branch_unlikely(&ingress_needed_key)) {
    + 		bool another = false;
    + 
    ++		nf_skip_egress(skb, true);
    + 		skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev,
    + 					 &another);
    + 		if (another)
    +@@ net/core/dev.c: static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
    + 		if (!skb)
    + 			goto out;
    + 
    ++		nf_skip_egress(skb, false);
    + 		if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0)
    + 			goto out;
    + 	}


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  6:52   ` Lukas Wunner
@ 2021-09-30  7:10     ` Daniel Borkmann
  2021-09-30  7:21     ` Pablo Neira Ayuso
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2021-09-30  7:10 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, kuba, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On 9/30/21 8:52 AM, Lukas Wunner wrote:
> On Thu, Sep 30, 2021 at 08:08:53AM +0200, Daniel Borkmann wrote:
>> Hm, so in the case of SRv6 users were running into a similar issue
>> and commit 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6
>> data plane") [0] added a new hook along with a sysctl which defaults
>> the new hook to off.
>>
>> The rationale for it was given as "the hooks are enabled via
>> nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets
>>   do not break." [0,1]
>>
>> If the suggestion to flag the skb [2] one way or another from the
>> tc forwarding path (e.g. skb bit or per-cpu marker) is not
>> technically feasible, then why not do a sysctl toggle like in the
>> SRv6 case?
> 
> The skb flag *is* technically feasible.  I amended the patches with
> the flag and was going to post them this week, but Pablo beat me to
> the punch and posted his alternative version, which lacks the flag
> but modularizes netfilter ingress/egress processing instead.
> 
> Honestly I think a hodge-podge of config options and sysctl toggles
> is awful and I would prefer the skb flag you suggested.  I kind of
> like your idea of considering tc and netfilter as layers.
> 
> FWIW the finished patches *with* the flag are on this branch:
> https://github.com/l1k/linux/commits/nft_egress_v5
> 
> Below is the "git range-diff" between Pablo's patches and mine
> (just the hunks which pertain to the skb flag, plus excerpts
> from the commit message).
> 
> Would you find the patch set acceptable with this skb flag?

Yes, that looks good to me, thanks Lukas!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  6:08 ` [PATCH nf-next v5 0/6] Netfilter egress hook Daniel Borkmann
  2021-09-30  6:52   ` Lukas Wunner
@ 2021-09-30  7:19   ` Pablo Neira Ayuso
  2021-09-30  7:33     ` Daniel Borkmann
  1 sibling, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-30  7:19 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netfilter-devel, davem, netdev, kuba, lukas, kadlec, fw, ast,
	edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, Sep 30, 2021 at 08:08:53AM +0200, Daniel Borkmann wrote:
> On 9/28/21 11:55 AM, Pablo Neira Ayuso wrote:
> > Hi,
> > 
> > This patchset v5 that re-adds the Netfilter egress:
> > 
> > 1) Rename linux/netfilter_ingress.h to linux/netfilter_netdev.h
> >     from Lukas Wunner.
> > 
> > 2) Generalize ingress hook file to accomodate egress support,
> >     from Lukas Wunner.
> > 
> > 3) Modularize Netfilter ingress hook into nf_tables_netdev: Daniel
> >     Borkmann is requesting for a mechanism to allow to blacklist
> >     Netfilter, this allows users to blacklist this new module that
> >     includes ingress chain and the new egress chain for the netdev
> >     family. There is no other in-tree user of the ingress and egress
> >     hooks than this which might interfer with his matter.
> > 
> > 4) Place the egress hook again before the tc egress hook as requested
> >     by Daniel Borkmann. Patch to add egress hook from Lukas Wunner.
> >     The Netfilter egress hook remains behind the static key, if unused
> >     performance degradation is negligible.
> > 
> > 5) Add netfilter egress handling to af_packet.
> > 
> > Arguably, distributors might decide to compile nf_tables_netdev
> > built-in. Traditionally, distributors have compiled their kernels using
> > the default configuration that Netfilter Kconfig provides (ie. use
> > modules whenever possible). In any case, I consider that distributor
> > policy is out of scope in this discussion, providing a mechanism to
> > allow Daniel to prevent Netfilter ingress and egress chains to be loaded
> > should be sufficient IMHO.
> 
> Hm, so in the case of SRv6 users were running into a similar issue and commit
> 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") [0] added
> a new hook along with a sysctl which defaults the new hook to off.
> 
> The rationale for it was given as "the hooks are enabled via nf_hooks_lwtunnel
> sysctl to make sure existing netfilter rulesets do not break." [0,1]
> 
> If the suggestion to flag the skb [2] one way or another from the tc forwarding
> path (e.g. skb bit or per-cpu marker) is not technically feasible, then why not
> do a sysctl toggle like in the SRv6 case?

I am already providing a global toggle to disable netdev
ingress/egress hooks?

In the SRv6 case that is not possible.

Why do you need you need a sysctl knob when my proposal is already
addressing your needs?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  6:52   ` Lukas Wunner
  2021-09-30  7:10     ` Daniel Borkmann
@ 2021-09-30  7:21     ` Pablo Neira Ayuso
  1 sibling, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-30  7:21 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Daniel Borkmann, netfilter-devel, davem, netdev, kuba, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, Sep 30, 2021 at 08:52:38AM +0200, Lukas Wunner wrote:
> On Thu, Sep 30, 2021 at 08:08:53AM +0200, Daniel Borkmann wrote:
> > Hm, so in the case of SRv6 users were running into a similar issue
> > and commit 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6
> > data plane") [0] added a new hook along with a sysctl which defaults
> > the new hook to off.
> > 
> > The rationale for it was given as "the hooks are enabled via
> > nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets
> >  do not break." [0,1]
> > 
> > If the suggestion to flag the skb [2] one way or another from the
> > tc forwarding path (e.g. skb bit or per-cpu marker) is not
> > technically feasible, then why not do a sysctl toggle like in the
> > SRv6 case?
> 
> The skb flag *is* technically feasible.  I amended the patches with
> the flag and was going to post them this week, but Pablo beat me to
> the punch and posted his alternative version, which lacks the flag
> but modularizes netfilter ingress/egress processing instead.
> 
> Honestly I think a hodge-podge of config options and sysctl toggles
> is awful and I would prefer the skb flag you suggested.  I kind of
> like your idea of considering tc and netfilter as layers.
> 
> FWIW the finished patches *with* the flag are on this branch:
> https://github.com/l1k/linux/commits/nft_egress_v5
> 
> Below is the "git range-diff" between Pablo's patches and mine
> (just the hunks which pertain to the skb flag, plus excerpts
> from the commit message).
> 
> Would you find the patch set acceptable with this skb flag?

Why do you need a programmatic skb flag?

Are you planning to do:

        skb->skip_nf_egress = random();

from the packet path?

Seriously, Daniel is asking for a global toggle to disable Netfilter.

What is wrong with the Netfilter blacklisting approach?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  7:19   ` Pablo Neira Ayuso
@ 2021-09-30  7:33     ` Daniel Borkmann
  2021-09-30  9:21       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2021-09-30  7:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, lukas, kadlec, fw, ast,
	edumazet, tgraf, nevola, john.fastabend, willemb

On 9/30/21 9:19 AM, Pablo Neira Ayuso wrote:
> On Thu, Sep 30, 2021 at 08:08:53AM +0200, Daniel Borkmann wrote:
>> On 9/28/21 11:55 AM, Pablo Neira Ayuso wrote:
>>> Hi,
>>>
>>> This patchset v5 that re-adds the Netfilter egress:
>>>
>>> 1) Rename linux/netfilter_ingress.h to linux/netfilter_netdev.h
>>>      from Lukas Wunner.
>>>
>>> 2) Generalize ingress hook file to accomodate egress support,
>>>      from Lukas Wunner.
>>>
>>> 3) Modularize Netfilter ingress hook into nf_tables_netdev: Daniel
>>>      Borkmann is requesting for a mechanism to allow to blacklist
>>>      Netfilter, this allows users to blacklist this new module that
>>>      includes ingress chain and the new egress chain for the netdev
>>>      family. There is no other in-tree user of the ingress and egress
>>>      hooks than this which might interfer with his matter.
>>>
>>> 4) Place the egress hook again before the tc egress hook as requested
>>>      by Daniel Borkmann. Patch to add egress hook from Lukas Wunner.
>>>      The Netfilter egress hook remains behind the static key, if unused
>>>      performance degradation is negligible.
>>>
>>> 5) Add netfilter egress handling to af_packet.
>>>
>>> Arguably, distributors might decide to compile nf_tables_netdev
>>> built-in. Traditionally, distributors have compiled their kernels using
>>> the default configuration that Netfilter Kconfig provides (ie. use
>>> modules whenever possible). In any case, I consider that distributor
>>> policy is out of scope in this discussion, providing a mechanism to
>>> allow Daniel to prevent Netfilter ingress and egress chains to be loaded
>>> should be sufficient IMHO.
>>
>> Hm, so in the case of SRv6 users were running into a similar issue and commit
>> 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") [0] added
>> a new hook along with a sysctl which defaults the new hook to off.
>>
>> The rationale for it was given as "the hooks are enabled via nf_hooks_lwtunnel
>> sysctl to make sure existing netfilter rulesets do not break." [0,1]
>>
>> If the suggestion to flag the skb [2] one way or another from the tc forwarding
>> path (e.g. skb bit or per-cpu marker) is not technically feasible, then why not
>> do a sysctl toggle like in the SRv6 case?
> 
> I am already providing a global toggle to disable netdev
> ingress/egress hooks?
> 
> In the SRv6 case that is not possible.
> 
> Why do you need you need a sysctl knob when my proposal is already
> addressing your needs?

Well, it's not addressing anything ... you even mention it yourself "arguably,
distributors might decide to compile nf_tables_netdev built-in".

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  7:33     ` Daniel Borkmann
@ 2021-09-30  9:21       ` Pablo Neira Ayuso
  2021-09-30 14:28         ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-30  9:21 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netfilter-devel, davem, netdev, kuba, lukas, kadlec, fw, ast,
	edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, Sep 30, 2021 at 09:33:23AM +0200, Daniel Borkmann wrote:
> On 9/30/21 9:19 AM, Pablo Neira Ayuso wrote:
[...]
> > Why do you need you need a sysctl knob when my proposal is already
> > addressing your needs?
> 
> Well, it's not addressing anything ... you even mention it yourself "arguably,
> distributors might decide to compile nf_tables_netdev built-in".

I said distributors traditionally select the option that we signal to
them, which is to enable this as module. We can document this in
Kconfig. I think distributors should select whatever is better for
their needs.

Anyway, I'll tell you why module blacklisting is bad: It is a hammer,
it is a band aid to a problem. Blacklisting is just making things
worst because it makes some people believe that something is
unfixable. Yes, it took me a while to figure out.

We already entered the let's bloat the skbuff for many years already,
this is stuffing one more bit into the skbuff just because maybe users
might break an existing setup when they load new rules to the new
netfilter egress hook.

Probably the sysctl for this new egress hook is the way to go as you
suggest.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30  9:21       ` Pablo Neira Ayuso
@ 2021-09-30 14:28         ` Jakub Kicinski
  2021-09-30 15:13           ` Pablo Neira Ayuso
  2021-09-30 17:12           ` Lukas Wunner
  0 siblings, 2 replies; 22+ messages in thread
From: Jakub Kicinski @ 2021-09-30 14:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Borkmann, netfilter-devel, davem, netdev, lukas, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, 30 Sep 2021 11:21:42 +0200 Pablo Neira Ayuso wrote:
> On Thu, Sep 30, 2021 at 09:33:23AM +0200, Daniel Borkmann wrote:
> > On 9/30/21 9:19 AM, Pablo Neira Ayuso wrote:  
> > > Why do you need you need a sysctl knob when my proposal is already
> > > addressing your needs?  
> > 
> > Well, it's not addressing anything ... you even mention it yourself "arguably,
> > distributors might decide to compile nf_tables_netdev built-in".  
> 
> I said distributors traditionally select the option that we signal to
> them, which is to enable this as module. We can document this in
> Kconfig. I think distributors should select whatever is better for
> their needs.
> 
> Anyway, I'll tell you why module blacklisting is bad: It is a hammer,
> it is a band aid to a problem. Blacklisting is just making things
> worst because it makes some people believe that something is
> unfixable. Yes, it took me a while to figure out.
> 
> We already entered the let's bloat the skbuff for many years already,
> this is stuffing one more bit into the skbuff just because maybe users
> might break an existing setup when they load new rules to the new
> netfilter egress hook.

The lifetime of this information is constrained, can't it be a percpu
flag, like xmit_more?

> Probably the sysctl for this new egress hook is the way to go as you
> suggest.

Knobs is making users pay, let's do our best to avoid that.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 14:28         ` Jakub Kicinski
@ 2021-09-30 15:13           ` Pablo Neira Ayuso
  2021-09-30 16:06             ` Jakub Kicinski
  2021-09-30 17:12           ` Lukas Wunner
  1 sibling, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-30 15:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Borkmann, netfilter-devel, davem, netdev, lukas, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, Sep 30, 2021 at 07:28:35AM -0700, Jakub Kicinski wrote:
> On Thu, 30 Sep 2021 11:21:42 +0200 Pablo Neira Ayuso wrote:
[...]
> The lifetime of this information is constrained, can't it be a percpu
> flag, like xmit_more?

It's just one single bit in this case after all.

> > Probably the sysctl for this new egress hook is the way to go as you
> > suggest.
> 
> Knobs is making users pay, let's do our best to avoid that.

Could you elaborate?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 15:13           ` Pablo Neira Ayuso
@ 2021-09-30 16:06             ` Jakub Kicinski
  2021-09-30 18:00               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2021-09-30 16:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Borkmann, netfilter-devel, davem, netdev, lukas, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, 30 Sep 2021 17:13:37 +0200 Pablo Neira Ayuso wrote:
> On Thu, Sep 30, 2021 at 07:28:35AM -0700, Jakub Kicinski wrote:
> > The lifetime of this information is constrained, can't it be a percpu
> > flag, like xmit_more?  
> 
> It's just one single bit in this case after all.

??

> > > Probably the sysctl for this new egress hook is the way to go as you
> > > suggest.  
> > 
> > Knobs is making users pay, let's do our best to avoid that.  
> 
> Could you elaborate?

My reading of Daniel's objections was that the layering is incorrect
because tc is not exclusively "under" nf. That problem is not solved 
by adding a knob. The only thing the knob achieves is let someone
deploying tc/bpf based solution protect themselves from accidental
nf deployment.

That's just background / level set. IDK what requires explanation 
in my statement itself. I thought "admin knobs are bad" is as
universally agreed on as, say, "testing is good".

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 14:28         ` Jakub Kicinski
  2021-09-30 15:13           ` Pablo Neira Ayuso
@ 2021-09-30 17:12           ` Lukas Wunner
  2021-09-30 17:19             ` Jakub Kicinski
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2021-09-30 17:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pablo Neira Ayuso, Daniel Borkmann, netfilter-devel, davem,
	netdev, kadlec, fw, ast, edumazet, tgraf, nevola, john.fastabend,
	willemb

On Thu, Sep 30, 2021 at 07:28:35AM -0700, Jakub Kicinski wrote:
> On Thu, 30 Sep 2021 11:21:42 +0200 Pablo Neira Ayuso wrote:
> > this is stuffing one more bit into the skbuff
> 
> The lifetime of this information is constrained, can't it be a percpu
> flag, like xmit_more?

Hm, can't an skb be queued and processed later on a different cpu?
E.g. what about fragments?

That would rule out a percpu flag, leaving a flag in struct sk_buff
as the only option.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 17:12           ` Lukas Wunner
@ 2021-09-30 17:19             ` Jakub Kicinski
  2021-09-30 17:36               ` Lukas Wunner
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2021-09-30 17:19 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Pablo Neira Ayuso, Daniel Borkmann, netfilter-devel, davem,
	netdev, kadlec, fw, ast, edumazet, tgraf, nevola, john.fastabend,
	willemb

On Thu, 30 Sep 2021 19:12:53 +0200 Lukas Wunner wrote:
> On Thu, Sep 30, 2021 at 07:28:35AM -0700, Jakub Kicinski wrote:
> > On Thu, 30 Sep 2021 11:21:42 +0200 Pablo Neira Ayuso wrote:  
> > > this is stuffing one more bit into the skbuff  
> > 
> > The lifetime of this information is constrained, can't it be a percpu
> > flag, like xmit_more?  
> 
> Hm, can't an skb be queued and processed later on a different cpu?
> E.g. what about fragments?
> 
> That would rule out a percpu flag, leaving a flag in struct sk_buff
> as the only option.

What queuing do you have in mind? Qdisc is after the egress hook.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 17:19             ` Jakub Kicinski
@ 2021-09-30 17:36               ` Lukas Wunner
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Wunner @ 2021-09-30 17:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pablo Neira Ayuso, Daniel Borkmann, netfilter-devel, davem,
	netdev, kadlec, fw, ast, edumazet, tgraf, nevola, john.fastabend,
	willemb

On Thu, Sep 30, 2021 at 10:19:20AM -0700, Jakub Kicinski wrote:
> On Thu, 30 Sep 2021 19:12:53 +0200 Lukas Wunner wrote:
> > On Thu, Sep 30, 2021 at 07:28:35AM -0700, Jakub Kicinski wrote:
> > > On Thu, 30 Sep 2021 11:21:42 +0200 Pablo Neira Ayuso wrote:  
> > > > this is stuffing one more bit into the skbuff  
> > > 
> > > The lifetime of this information is constrained, can't it be a percpu
> > > flag, like xmit_more?  
> > 
> > Hm, can't an skb be queued and processed later on a different cpu?
> > E.g. what about fragments?
> > 
> > That would rule out a percpu flag, leaving a flag in struct sk_buff
> > as the only option.
> 
> What queuing do you have in mind? Qdisc is after the egress hook.

Ingress queueing.  E.g. a packet may be redirected or mirrored by tc
on ingress to another interface, resulting in a recursive call to
netif_receive_skb() or dev_queue_xmit().  The packet may be bounced
around an arbitrary number of times this way.  Forwarding like that
can happen both at the tc and the netfilter "layer".

I'm concerned that a packet may be handled by different cpus along the way
and queueing might be one possibility how this could happen.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 16:06             ` Jakub Kicinski
@ 2021-09-30 18:00               ` Pablo Neira Ayuso
  2021-09-30 19:17                 ` Jakub Kicinski
  0 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-30 18:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Borkmann, netfilter-devel, davem, netdev, lukas, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, Sep 30, 2021 at 09:06:52AM -0700, Jakub Kicinski wrote:
> On Thu, 30 Sep 2021 17:13:37 +0200 Pablo Neira Ayuso wrote:
> > On Thu, Sep 30, 2021 at 07:28:35AM -0700, Jakub Kicinski wrote:
> > > The lifetime of this information is constrained, can't it be a percpu
> > > flag, like xmit_more?  
> > 
> > It's just one single bit in this case after all.
> 
> ??

There are "escape" points such ifb from ingress, where the packets gets
enqueued and then percpu might not help, it might be fragile to use
percpu in this case.

> > > > Probably the sysctl for this new egress hook is the way to go as you
> > > > suggest.  
> > > 
> > > Knobs is making users pay, let's do our best to avoid that.  
> > 
> > Could you elaborate?
> 
> My reading of Daniel's objections was that the layering is incorrect
> because tc is not exclusively "under" nf. That problem is not solved 
> by adding a knob. The only thing the knob achieves is let someone
> deploying tc/bpf based solution protect themselves from accidental
> nf deployment.
> 
> That's just background / level set. IDK what requires explanation 
> in my statement itself. I thought "admin knobs are bad" is as
> universally agreed on as, say, "testing is good".

Yes, knobs are not ideal but Daniel mentioned it as a posibility, the
skbuff bit might not ideal either because it might be not easy to
debug the behaviour that it turns on to the user, but it could only be
set on from act_mirred, Daniel mentioned to cover only the
skb->redirect case.

Thanks

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH nf-next v5 0/6] Netfilter egress hook
  2021-09-30 18:00               ` Pablo Neira Ayuso
@ 2021-09-30 19:17                 ` Jakub Kicinski
  0 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2021-09-30 19:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Borkmann, netfilter-devel, davem, netdev, lukas, kadlec,
	fw, ast, edumazet, tgraf, nevola, john.fastabend, willemb

On Thu, 30 Sep 2021 20:00:56 +0200 Pablo Neira Ayuso wrote:
> On Thu, Sep 30, 2021 at 09:06:52AM -0700, Jakub Kicinski wrote:
> > On Thu, 30 Sep 2021 17:13:37 +0200 Pablo Neira Ayuso wrote:  
> > > It's just one single bit in this case after all.  
> > 
> > ??  
> 
> There are "escape" points such ifb from ingress, where the packets gets
> enqueued and then percpu might not help, it might be fragile to use
> percpu in this case.

You still have to scrub the skb mark at the correct points, otherwise
the ignoring egress may propagate beyond the "paired hook". I don't see
much difference in fragility TBH.

Speaking of ifb, doesn't it have an egress hook? And ingress on the way
out? IMHO the "ignore egress" mark should not survive going thru ifb.

Anyway, that's just my preference. Whatever you, Daniel and Lukas
decide together in the end is fine by me.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-09-30 19:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-28  9:55 [PATCH nf-next v5 0/6] Netfilter egress hook Pablo Neira Ayuso
2021-09-28  9:55 ` [PATCH nf-next v5 1/6] netfilter: Rename ingress hook include file Pablo Neira Ayuso
2021-09-28  9:55 ` [PATCH nf-next v5 2/6] netfilter: Generalize " Pablo Neira Ayuso
2021-09-28  9:55 ` [PATCH nf-next v5 3/6] netfilter: nf_tables: move netdev ingress filter chain to nf_tables_netdev.c Pablo Neira Ayuso
2021-09-28  9:55 ` [PATCH nf-next v5 4/6] netfilter: Introduce egress hook Pablo Neira Ayuso
2021-09-28  9:55 ` [PATCH nf-next v5 5/6] af_packet: " Pablo Neira Ayuso
2021-09-28  9:55 ` [PATCH nf-next v5 6/6] netfilter: nf_tables: add egress support Pablo Neira Ayuso
2021-09-30  6:08 ` [PATCH nf-next v5 0/6] Netfilter egress hook Daniel Borkmann
2021-09-30  6:52   ` Lukas Wunner
2021-09-30  7:10     ` Daniel Borkmann
2021-09-30  7:21     ` Pablo Neira Ayuso
2021-09-30  7:19   ` Pablo Neira Ayuso
2021-09-30  7:33     ` Daniel Borkmann
2021-09-30  9:21       ` Pablo Neira Ayuso
2021-09-30 14:28         ` Jakub Kicinski
2021-09-30 15:13           ` Pablo Neira Ayuso
2021-09-30 16:06             ` Jakub Kicinski
2021-09-30 18:00               ` Pablo Neira Ayuso
2021-09-30 19:17                 ` Jakub Kicinski
2021-09-30 17:12           ` Lukas Wunner
2021-09-30 17:19             ` Jakub Kicinski
2021-09-30 17:36               ` Lukas Wunner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.