All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression
@ 2016-10-19 18:34 Anders K. Pedersen | Cohaesio
  2016-10-19 18:35 ` [PATCH v2 nf-next 1/5] netfilter: nft: UAPI headers for " Anders K. Pedersen | Cohaesio
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-19 18:34 UTC (permalink / raw)
  To: netfilter-devel, pablo

Version 2 changes:
- Fix comments for enum nft_rt_keys
- Remove nft_rt*_select_ops
- Add new nft_rt* modules to kernel_cleanup() in tests/shell/run-tests.sh

This patch series introduces an nftables rt expression for routing related
data with support for nexthop (i.e. the directly connected IP address that
an outgoing packet is sent to), which can be used either for matching or
accounting, eg.

 # nft add rule filter postrouting \
        ip daddr 192.168.1.0/24 rt ip nexthop != 192.168.0.1 drop

This will drop any traffic to 192.168.1.0/24 that is not routed via
192.168.0.1.

 # nft add rule filter postrouting \
        flow table acct { rt ip nexthop timeout 600s counter }
 # nft add rule ip6 filter postrouting \
        flow table acct { rt ip6 nexthop6 timeout 600s counter }

These rules count outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10 minutes.

 # nft add rule inet filter postrouting \
        ether type ip \
        flow table acct { rt ip nexthop timeout 600s counter }
 # nft add rule inet filter postrouting \
        ether type ip6 \
        flow table acct { rt ip6 nexthop6 timeout 600s counter }

Same as above, but via the inet family, where the ether type must be
specified explicitly.

"rt classid" is also implemented identical to "meta rtclassid", since it
is more logical to have this match in the routing expression going forward.

Regards,
Anders K. Pedersen
---
 include/net/netfilter/nft_rt.h           |  23 +++++
 include/uapi/linux/netfilter/nf_tables.h |  27 ++++++
 net/ipv4/netfilter/Kconfig               |   4 +
 net/ipv4/netfilter/Makefile              |   1 +
 net/ipv4/netfilter/nft_rt_ipv4.c         | 109 ++++++++++++++++++++++++
 net/ipv6/netfilter/Kconfig               |   4 +
 net/ipv6/netfilter/Makefile              |   1 +
 net/ipv6/netfilter/nft_rt_ipv6.c         | 110 ++++++++++++++++++++++++
 net/netfilter/Kconfig                    |  11 +++
 net/netfilter/Makefile                   |   2 +
 net/netfilter/nft_rt.c                   | 137 +++++++++++++++++++++++++++++
 net/netfilter/nft_rt_inet.c              | 142 +++++++++++++++++++++++++++++++
 12 files changed, 571 insertions(+)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 nf-next 1/5] netfilter: nft: UAPI headers for routing expression
  2016-10-19 18:34 [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression Anders K. Pedersen | Cohaesio
@ 2016-10-19 18:35 ` Anders K. Pedersen | Cohaesio
  2016-10-19 18:38 ` [PATCH v2 nf-next 2/5] netfilter: nft: basic " Anders K. Pedersen | Cohaesio
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-19 18:35 UTC (permalink / raw)
  To: netfilter-devel, pablo

From: Anders K. Pedersen <akp@cohaesio.com>

Add new UAPI header definitions for nftables "rt" expression, which will
enable usage of routing related data.

Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
---
v2
- fix comments for enum nft_rt_keys

 include/uapi/linux/netfilter/nf_tables.h |  27 ++++++
 1 files changed, 27 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index c6c4477..1992766 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -759,6 +759,17 @@ enum nft_meta_keys {
 };
 
 /**
+ * enum nft_rt_keys - nf_tables routing expression keys
+ *
+ * @NFT_RT_CLASSID: realm value of packet's route (skb->dst->tclassid)
+ * @NFT_RT_NEXTHOP: routing nexthop
+ */
+enum nft_rt_keys {
+	NFT_RT_CLASSID,
+	NFT_RT_NEXTHOP,
+};
+
+/**
  * enum nft_hash_attributes - nf_tables hash expression netlink attributes
  *
  * @NFTA_HASH_SREG: source register (NLA_U32)
@@ -797,6 +808,22 @@ enum nft_meta_attributes {
 #define NFTA_META_MAX		(__NFTA_META_MAX - 1)
 
 /**
+ * enum nft_rt_attributes - nf_tables routing expression netlink attributes
+ *
+ * @NFTA_RT_DREG: destination register (NLA_U32)
+ * @NFTA_RT_KEY: meta data item to load (NLA_U32: nft_rt_keys)
+ * @NFTA_RT_FAMILY: Address family (NLA_U32)
+ */
+enum nft_rt_attributes {
+	NFTA_RT_UNSPEC,
+	NFTA_RT_DREG,
+	NFTA_RT_KEY,
+	NFTA_RT_FAMILY,
+	__NFTA_RT_MAX
+};
+#define NFTA_RT_MAX		(__NFTA_RT_MAX - 1)
+
+/**
  * enum nft_ct_keys - nf_tables ct expression keys
  *
  * @NFT_CT_STATE: conntrack state (bitmask of enum ip_conntrack_info)

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 nf-next 2/5] netfilter: nft: basic routing expression
  2016-10-19 18:34 [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression Anders K. Pedersen | Cohaesio
  2016-10-19 18:35 ` [PATCH v2 nf-next 1/5] netfilter: nft: UAPI headers for " Anders K. Pedersen | Cohaesio
@ 2016-10-19 18:38 ` Anders K. Pedersen | Cohaesio
  2016-10-19 18:39 ` [PATCH v2 nf-next 3/5] netfilter: nft: rt nexthop for IPv4 family Anders K. Pedersen | Cohaesio
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-19 18:38 UTC (permalink / raw)
  To: netfilter-devel, pablo

From: Anders K. Pedersen <akp@cohaesio.com>

Introduce basic infrastructure for nftables rt expression for routing
related data. Initially "rt classid" is implemented identical to "meta
rtclassid", since it is more logical to have this match in the routing
expression going forward.

Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
---
v2
- remove nft_rt_select_ops

 include/net/netfilter/nft_rt.h           |  23 +++++
 net/netfilter/Kconfig                    |   6 ++
 net/netfilter/Makefile                   |   1 +
 net/netfilter/nft_rt.c                   | 137 +++++++++++++++++++++++++++++
 4 files changed, 167 insertions(+)

diff --git a/include/net/netfilter/nft_rt.h b/include/net/netfilter/nft_rt.h
new file mode 100644
index 0000000..e32e9e6
--- /dev/null
+++ b/include/net/netfilter/nft_rt.h
@@ -0,0 +1,23 @@
+#ifndef _NFT_RT_H_
+#define _NFT_RT_H_
+
+struct nft_rt {
+	enum nft_rt_keys	key:8;
+	enum nft_registers	dreg:8;
+	u8			family;
+};
+
+extern const struct nla_policy nft_rt_policy[];
+
+int nft_rt_get_init(const struct nft_ctx *ctx,
+		    const struct nft_expr *expr,
+		    const struct nlattr * const tb[]);
+
+int nft_rt_get_dump(struct sk_buff *skb,
+		    const struct nft_expr *expr);
+
+void nft_rt_get_eval(const struct nft_expr *expr,
+		     struct nft_regs *regs,
+		     const struct nft_pktinfo *pkt);
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index e8d56d9..40d6e0f 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -474,6 +474,12 @@ config NFT_META
 	  This option adds the "meta" expression that you can use to match and
 	  to set packet metainformation such as the packet mark.
 
+config NFT_RT
+	tristate "Netfilter nf_tables routing module"
+	help
+	  This option adds the "rt" expression that you can use to match
+	  packet routing information such as the packet nexthop.
+
 config NFT_NUMGEN
 	tristate "Netfilter nf_tables number generator module"
 	help
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index c23c3c8..f3a0c59 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_NF_TABLES_NETDEV)	+= nf_tables_netdev.o
 obj-$(CONFIG_NFT_COMPAT)	+= nft_compat.o
 obj-$(CONFIG_NFT_EXTHDR)	+= nft_exthdr.o
 obj-$(CONFIG_NFT_META)		+= nft_meta.o
+obj-$(CONFIG_NFT_RT)		+= nft_rt.o
 obj-$(CONFIG_NFT_NUMGEN)	+= nft_numgen.o
 obj-$(CONFIG_NFT_CT)		+= nft_ct.o
 obj-$(CONFIG_NFT_LIMIT)		+= nft_limit.o
diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c
new file mode 100644
index 0000000..775c283
--- /dev/null
+++ b/net/netfilter/nft_rt.c
@@ -0,0 +1,137 @@
+/*
+ * Copyright (c) 2016 Anders K. Pedersen <akp@cohaesio.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/dst.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
+#include <net/netfilter/nft_rt.h>
+
+void nft_rt_get_eval(const struct nft_expr *expr,
+		     struct nft_regs *regs,
+		     const struct nft_pktinfo *pkt)
+{
+	const struct nft_rt *priv = nft_expr_priv(expr);
+	const struct sk_buff *skb = pkt->skb;
+	u32 *dest = &regs->data[priv->dreg];
+
+	switch (priv->key) {
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	case NFT_RT_CLASSID: {
+		const struct dst_entry *dst = skb_dst(skb);
+
+		if (dst == NULL)
+			goto err;
+		*dest = dst->tclassid;
+		break;
+	}
+#endif
+	default:
+		WARN_ON(1);
+		goto err;
+	}
+	return;
+
+err:
+	regs->verdict.code = NFT_BREAK;
+}
+EXPORT_SYMBOL_GPL(nft_rt_get_eval);
+
+const struct nla_policy nft_rt_policy[NFTA_RT_MAX + 1] = {
+	[NFTA_RT_DREG]		= { .type = NLA_U32 },
+	[NFTA_RT_KEY]		= { .type = NLA_U32 },
+	[NFTA_RT_FAMILY]	= { .type = NLA_U32 },
+};
+EXPORT_SYMBOL_GPL(nft_rt_policy);
+
+int nft_rt_get_init(const struct nft_ctx *ctx,
+		    const struct nft_expr *expr,
+		    const struct nlattr * const tb[])
+{
+	struct nft_rt *priv = nft_expr_priv(expr);
+	unsigned int len;
+
+	if (tb[NFTA_RT_KEY] == NULL ||
+	    tb[NFTA_RT_DREG] == NULL ||
+	    tb[NFTA_RT_FAMILY] == NULL)
+		return -EINVAL;
+
+	priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
+	switch (priv->key) {
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	case NFT_RT_CLASSID:
+		len = sizeof(u32);
+		break;
+#endif
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	priv->family = ntohl(nla_get_be32(tb[NFTA_RT_FAMILY]));
+	priv->dreg = nft_parse_register(tb[NFTA_RT_DREG]);
+	return nft_validate_register_store(ctx, priv->dreg, NULL,
+					   NFT_DATA_VALUE, len);
+}
+EXPORT_SYMBOL_GPL(nft_rt_get_init);
+
+int nft_rt_get_dump(struct sk_buff *skb,
+		    const struct nft_expr *expr)
+{
+	const struct nft_rt *priv = nft_expr_priv(expr);
+
+	if (nla_put_be32(skb, NFTA_RT_KEY, htonl(priv->key)))
+		goto nla_put_failure;
+	if (nft_dump_register(skb, NFTA_RT_DREG, priv->dreg))
+		goto nla_put_failure;
+	if (nla_put_be32(skb, NFTA_RT_FAMILY, htonl(priv->family)))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+EXPORT_SYMBOL_GPL(nft_rt_get_dump);
+
+static struct nft_expr_type nft_rt_type;
+static const struct nft_expr_ops nft_rt_get_ops = {
+	.type		= &nft_rt_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_rt)),
+	.eval		= nft_rt_get_eval,
+	.init		= nft_rt_get_init,
+	.dump		= nft_rt_get_dump,
+};
+
+static struct nft_expr_type nft_rt_type __read_mostly = {
+	.name		= "rt",
+	.ops		= &nft_rt_get_ops,
+	.policy		= nft_rt_policy,
+	.maxattr	= NFTA_RT_MAX,
+	.owner		= THIS_MODULE,
+};
+
+static int __init nft_rt_module_init(void)
+{
+	return nft_register_expr(&nft_rt_type);
+}
+
+static void __exit nft_rt_module_exit(void)
+{
+	nft_unregister_expr(&nft_rt_type);
+}
+
+module_init(nft_rt_module_init);
+module_exit(nft_rt_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Anders K. Pedersen <akp@cohaesio.com>");
+MODULE_ALIAS_NFT_EXPR("rt");

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 nf-next 3/5] netfilter: nft: rt nexthop for IPv4 family
  2016-10-19 18:34 [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression Anders K. Pedersen | Cohaesio
  2016-10-19 18:35 ` [PATCH v2 nf-next 1/5] netfilter: nft: UAPI headers for " Anders K. Pedersen | Cohaesio
  2016-10-19 18:38 ` [PATCH v2 nf-next 2/5] netfilter: nft: basic " Anders K. Pedersen | Cohaesio
@ 2016-10-19 18:39 ` Anders K. Pedersen | Cohaesio
  2016-10-19 18:40 ` [PATCH v2 nf-next 4/5] netfilter: nft: rt nexthop for IPv6 family Anders K. Pedersen | Cohaesio
  2016-10-19 18:41 ` [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family Anders K. Pedersen | Cohaesio
  4 siblings, 0 replies; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-19 18:39 UTC (permalink / raw)
  To: netfilter-devel, pablo

From: Anders K. Pedersen <akp@cohaesio.com>

Add nftables IPv4 family support for an "rt ip nexthop" expression
allowing usage of the routing nexthop (i.e. the directly connected IP
address that an outgoing packet is sent to) for matching or accounting, eg.

 # nft add rule filter postrouting \
	ip daddr 192.168.1.0/24 rt ip nexthop != 192.168.0.1 drop

This will drop any traffic to 192.168.1.0/24 that is not routed via
192.168.0.1.

 # nft add rule filter postrouting \
	flow table acct { rt ip nexthop timeout 600s counter }

This rule counts outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10 minutes.

Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
---
v2
- remove nft_rt_ipv4_select_ops

 net/ipv4/netfilter/Kconfig               |   4 +
 net/ipv4/netfilter/Makefile              |   1 +
 net/ipv4/netfilter/nft_rt_ipv4.c         | 109 ++++++++++++++++++++++++
 3 files changed, 114 insertions(+)

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index d613309..c5ecb78 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -42,6 +42,10 @@ config NFT_CHAIN_ROUTE_IPV4
 	  fields such as the source, destination, type of service and
 	  the packet mark.
 
+config NFT_RT_IPV4
+	default NFT_RT
+	tristate
+
 config NFT_REJECT_IPV4
 	select NF_REJECT_IPV4
 	default NFT_REJECT
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 853328f..73fae55 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_NF_NAT_PROTO_GRE) += nf_nat_proto_gre.o
 obj-$(CONFIG_NF_TABLES_IPV4) += nf_tables_ipv4.o
 obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV4) += nft_chain_route_ipv4.o
 obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o
+obj-$(CONFIG_NFT_RT_IPV4) += nft_rt_ipv4.o
 obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o
 obj-$(CONFIG_NFT_MASQ_IPV4) += nft_masq_ipv4.o
 obj-$(CONFIG_NFT_REDIR_IPV4) += nft_redir_ipv4.o
diff --git a/net/ipv4/netfilter/nft_rt_ipv4.c b/net/ipv4/netfilter/nft_rt_ipv4.c
new file mode 100644
index 0000000..7473e84
--- /dev/null
+++ b/net/ipv4/netfilter/nft_rt_ipv4.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (c) 2016 Anders K. Pedersen <akp@cohaesio.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/route.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nft_rt.h>
+
+static void nft_rt_ipv4_get_eval(const struct nft_expr *expr,
+				 struct nft_regs *regs,
+				 const struct nft_pktinfo *pkt)
+{
+	const struct nft_rt *priv = nft_expr_priv(expr);
+	const struct sk_buff *skb = pkt->skb;
+	u32 *dest = &regs->data[priv->dreg];
+
+	switch (priv->key) {
+	case NFT_RT_NEXTHOP: {
+		const struct rtable *rt = skb_rtable(skb);
+
+		if (!rt)
+			goto err;
+		*dest = rt_nexthop(rt, ip_hdr(skb)->daddr);
+		break;
+	}
+	default:
+		return nft_rt_get_eval(expr, regs, pkt);
+	}
+
+	return;
+err:
+	regs->verdict.code = NFT_BREAK;
+}
+
+static int nft_rt_ipv4_get_init(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				const struct nlattr * const tb[])
+{
+	struct nft_rt *priv = nft_expr_priv(expr);
+	unsigned int len;
+
+	if (tb[NFTA_RT_KEY] == NULL ||
+	    tb[NFTA_RT_DREG] == NULL ||
+	    tb[NFTA_RT_FAMILY] == NULL)
+		return -EINVAL;
+
+	priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
+	switch (priv->key) {
+	case NFT_RT_NEXTHOP:
+		len = sizeof(u32);
+		break;
+	default:
+		return nft_rt_get_init(ctx, expr, tb);
+	}
+
+	priv->family = ntohl(nla_get_be32(tb[NFTA_RT_FAMILY]));
+	if (priv->family != NFPROTO_IPV4)
+		return -EINVAL;
+
+	priv->dreg = nft_parse_register(tb[NFTA_RT_DREG]);
+	return nft_validate_register_store(ctx, priv->dreg, NULL,
+					   NFT_DATA_VALUE, len);
+}
+
+static struct nft_expr_type nft_rt_ipv4_type;
+static const struct nft_expr_ops nft_rt_ipv4_get_ops = {
+	.type		= &nft_rt_ipv4_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_rt)),
+	.eval		= nft_rt_ipv4_get_eval,
+	.init		= nft_rt_ipv4_get_init,
+	.dump		= nft_rt_get_dump,
+};
+
+static struct nft_expr_type nft_rt_ipv4_type __read_mostly = {
+	.family         = NFPROTO_IPV4,
+	.name           = "rt",
+	.ops		= &nft_rt_ipv4_get_ops,
+	.policy         = nft_rt_policy,
+	.maxattr        = NFTA_RT_MAX,
+	.owner          = THIS_MODULE,
+};
+
+static int __init nft_rt_ipv4_module_init(void)
+{
+	return nft_register_expr(&nft_rt_ipv4_type);
+}
+
+static void __exit nft_rt_ipv4_module_exit(void)
+{
+	nft_unregister_expr(&nft_rt_ipv4_type);
+}
+
+module_init(nft_rt_ipv4_module_init);
+module_exit(nft_rt_ipv4_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Anders K. Pedersen <akp@cohaesio.com>");
+MODULE_ALIAS_NFT_AF_EXPR(AF_INET, "rt");

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 nf-next 4/5] netfilter: nft: rt nexthop for IPv6 family
  2016-10-19 18:34 [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression Anders K. Pedersen | Cohaesio
                   ` (2 preceding siblings ...)
  2016-10-19 18:39 ` [PATCH v2 nf-next 3/5] netfilter: nft: rt nexthop for IPv4 family Anders K. Pedersen | Cohaesio
@ 2016-10-19 18:40 ` Anders K. Pedersen | Cohaesio
  2016-10-19 18:41 ` [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family Anders K. Pedersen | Cohaesio
  4 siblings, 0 replies; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-19 18:40 UTC (permalink / raw)
  To: netfilter-devel, pablo

From: Anders K. Pedersen <akp@cohaesio.com>

Add nftables IPv6 family support for an "rt ip6 nexthop" expression
allowing usage of the routing nexthop (i.e. the directly connected IP
address that an outgoing packet is sent to) for matching or accounting, eg.

 # nft add rule ip6 filter postrouting \
	ip6 daddr fd01::/16 rt ip6 nexthop != fd00::1 drop

This will drop any traffic to fd01::/16 that is not routed via fd00::1.

 # nft add rule ip6 filter postrouting \
	flow table acct { rt ip6 nexthop timeout 600s counter }

This rule counts outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10 minutes.

Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
---
v2
- remove nft_rt_ipv6_select_ops

 net/ipv6/netfilter/Kconfig               |   4 +
 net/ipv6/netfilter/Makefile              |   1 +
 net/ipv6/netfilter/nft_rt_ipv6.c         | 110 ++++++++++++++++++++++++
 3 files changed, 115 insertions(+)

diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index e10a04c..8d3527e 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -42,6 +42,10 @@ config NFT_CHAIN_ROUTE_IPV6
 	  fields such as the source, destination, flowlabel, hop-limit and
 	  the packet mark.
 
+config NFT_RT_IPV6
+	default NFT_RT
+	tristate
+
 config NFT_REJECT_IPV6
 	select NF_REJECT_IPV6
 	default NFT_REJECT
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index b4f7d0b..6958c35 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -36,6 +36,7 @@ obj-$(CONFIG_NF_DUP_IPV6) += nf_dup_ipv6.o
 obj-$(CONFIG_NF_TABLES_IPV6) += nf_tables_ipv6.o
 obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV6) += nft_chain_route_ipv6.o
 obj-$(CONFIG_NFT_CHAIN_NAT_IPV6) += nft_chain_nat_ipv6.o
+obj-$(CONFIG_NFT_RT_IPV6) += nft_rt_ipv6.o
 obj-$(CONFIG_NFT_REJECT_IPV6) += nft_reject_ipv6.o
 obj-$(CONFIG_NFT_MASQ_IPV6) += nft_masq_ipv6.o
 obj-$(CONFIG_NFT_REDIR_IPV6) += nft_redir_ipv6.o
diff --git a/net/ipv6/netfilter/nft_rt_ipv6.c b/net/ipv6/netfilter/nft_rt_ipv6.c
new file mode 100644
index 0000000..32d8ec8
--- /dev/null
+++ b/net/ipv6/netfilter/nft_rt_ipv6.c
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2016 Anders K. Pedersen <akp@cohaesio.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/ip6_route.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nft_rt.h>
+
+static void nft_rt_ipv6_get_eval(const struct nft_expr *expr,
+				 struct nft_regs *regs,
+				 const struct nft_pktinfo *pkt)
+{
+	const struct nft_rt *priv = nft_expr_priv(expr);
+	const struct sk_buff *skb = pkt->skb;
+	u32 *dest = &regs->data[priv->dreg];
+
+	switch (priv->key) {
+	case NFT_RT_NEXTHOP: {
+		struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+
+		if (!rt)
+			goto err;
+		memcpy(dest, rt6_nexthop(rt, &ipv6_hdr(skb)->daddr),
+		       sizeof(struct in6_addr));
+		break;
+	}
+	default:
+		return nft_rt_get_eval(expr, regs, pkt);
+	}
+
+	return;
+err:
+	regs->verdict.code = NFT_BREAK;
+}
+
+static int nft_rt_ipv6_get_init(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				const struct nlattr * const tb[])
+{
+	struct nft_rt *priv = nft_expr_priv(expr);
+	unsigned int len;
+
+	if (tb[NFTA_RT_KEY] == NULL ||
+	    tb[NFTA_RT_DREG] == NULL ||
+	    tb[NFTA_RT_FAMILY] == NULL)
+		return -EINVAL;
+
+	priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
+	switch (priv->key) {
+	case NFT_RT_NEXTHOP:
+		len = sizeof(struct in6_addr);
+		break;
+	default:
+		return nft_rt_get_init(ctx, expr, tb);
+	}
+
+	priv->family = ntohl(nla_get_be32(tb[NFTA_RT_FAMILY]));
+	if (priv->family != NFPROTO_IPV6)
+		return -EINVAL;
+
+	priv->dreg = nft_parse_register(tb[NFTA_RT_DREG]);
+	return nft_validate_register_store(ctx, priv->dreg, NULL,
+					   NFT_DATA_VALUE, len);
+}
+
+static struct nft_expr_type nft_rt_ipv6_type;
+static const struct nft_expr_ops nft_rt_ipv6_get_ops = {
+	.type		= &nft_rt_ipv6_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_rt)),
+	.eval		= nft_rt_ipv6_get_eval,
+	.init		= nft_rt_ipv6_get_init,
+	.dump		= nft_rt_get_dump,
+};
+
+static struct nft_expr_type nft_rt_ipv6_type __read_mostly = {
+	.family         = NFPROTO_IPV6,
+	.name           = "rt",
+	.ops		= &nft_rt_ipv6_get_ops,
+	.policy         = nft_rt_policy,
+	.maxattr        = NFTA_RT_MAX,
+	.owner          = THIS_MODULE,
+};
+
+static int __init nft_rt_ipv6_module_init(void)
+{
+	return nft_register_expr(&nft_rt_ipv6_type);
+}
+
+static void __exit nft_rt_ipv6_module_exit(void)
+{
+	nft_unregister_expr(&nft_rt_ipv6_type);
+}
+
+module_init(nft_rt_ipv6_module_init);
+module_exit(nft_rt_ipv6_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Anders K. Pedersen <akp@cohaesio.com>");
+MODULE_ALIAS_NFT_AF_EXPR(AF_INET6, "rt");

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-19 18:34 [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression Anders K. Pedersen | Cohaesio
                   ` (3 preceding siblings ...)
  2016-10-19 18:40 ` [PATCH v2 nf-next 4/5] netfilter: nft: rt nexthop for IPv6 family Anders K. Pedersen | Cohaesio
@ 2016-10-19 18:41 ` Anders K. Pedersen | Cohaesio
  2016-10-20  9:13   ` Liping Zhang
  4 siblings, 1 reply; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-19 18:41 UTC (permalink / raw)
  To: netfilter-devel, pablo

From: Anders K. Pedersen <akp@cohaesio.com>

Add nftables inet family support for an rt nexthop expression allowing
usage of the routing nexthop (i.e. the directly connected IP address that
an outgoing packet is sent to) for matching or accounting, eg.

 # nft add rule inet filter postrouting \
	ether type ip6 \
	ip6 daddr fd01::/16 rt ip6 nexthop != fd00::1 drop

This will drop any traffic to fd01::/16 that is not routed via fd00::1.
Note that "ether type" must be specified explicitly, when rt nexthop is
used from inet family tables.

 # nft add rule inet filter postrouting \
	ether type ip \
	flow table acct { rt ip nexthop timeout 600s counter }
 # nft add rule inet filter postrouting \
	ether type ip6 \
	flow table acct { rt ip6 nexthop6 timeout 600s counter }

These rules count outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10 minutes.

Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
---
v2
- remove nft_rt_inet_select_ops

 net/netfilter/Kconfig                    |   5 ++
 net/netfilter/Makefile                   |   1 +
 net/netfilter/nft_rt_inet.c              | 142 +++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index e8d56d9..40d6e0f 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -480,6 +480,11 @@ config NFT_META
 	  This option adds the "rt" expression that you can use to match
 	  packet routing information such as the packet nexthop.
 
+config NFT_RT_INET
+	depends on NF_TABLES_INET
+	default NFT_RT
+	tristate
+
 config NFT_NUMGEN
 	tristate "Netfilter nf_tables number generator module"
 	help
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index c23c3c8..f3a0c59 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_NF_TABLES_NETDEV)	+= nf_tables_netdev.o
 obj-$(CONFIG_NFT_EXTHDR)	+= nft_exthdr.o
 obj-$(CONFIG_NFT_META)		+= nft_meta.o
 obj-$(CONFIG_NFT_RT)		+= nft_rt.o
+obj-$(CONFIG_NFT_RT_INET)	+= nft_rt_inet.o
 obj-$(CONFIG_NFT_NUMGEN)	+= nft_numgen.o
 obj-$(CONFIG_NFT_CT)		+= nft_ct.o
 obj-$(CONFIG_NFT_LIMIT)		+= nft_limit.o
diff --git a/net/netfilter/nft_rt_inet.c b/net/netfilter/nft_rt_inet.c
new file mode 100644
index 0000000..c8f2d6a
--- /dev/null
+++ b/net/netfilter/nft_rt_inet.c
@@ -0,0 +1,142 @@
+/*
+ * Copyright (c) 2016 Anders K. Pedersen <akp@cohaesio.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/ip6_route.h>
+#include <net/route.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nft_rt.h>
+
+static void nft_rt_inet_get_eval(const struct nft_expr *expr,
+				 struct nft_regs *regs,
+				 const struct nft_pktinfo *pkt)
+{
+	const struct nft_rt *priv = nft_expr_priv(expr);
+	const struct sk_buff *skb = pkt->skb;
+	u32 *dest = &regs->data[priv->dreg];
+
+	if (unlikely(pkt->pf != priv->family)) {
+		WARN_ONCE(1, "Address families do not match\n");
+		goto err;
+	}
+
+	switch (pkt->pf) {
+	case NFPROTO_IPV4:
+		switch (priv->key) {
+		case NFT_RT_NEXTHOP: {
+			const struct rtable *rt = skb_rtable(skb);
+
+			if (!rt)
+				goto err;
+			*dest = rt_nexthop(rt, ip_hdr(skb)->daddr);
+			break;
+		}
+		default:
+			goto out;
+		}
+		break;
+	case NFPROTO_IPV6:
+		switch (priv->key) {
+		case NFT_RT_NEXTHOP: {
+			struct rt6_info *rt =
+					(struct rt6_info *)skb_dst(skb);
+
+			if (!rt)
+				goto err;
+			memcpy(dest, rt6_nexthop(rt, &ipv6_hdr(skb)->daddr),
+			       sizeof(struct in6_addr));
+			break;
+		}
+		default:
+			goto out;
+		}
+		break;
+	}
+
+	return;
+out:
+	return nft_rt_get_eval(expr, regs, pkt);
+err:
+	regs->verdict.code = NFT_BREAK;
+}
+
+static int nft_rt_inet_get_init(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				const struct nlattr * const tb[])
+{
+	struct nft_rt *priv = nft_expr_priv(expr);
+	unsigned int len;
+
+	if (tb[NFTA_RT_KEY] == NULL ||
+	    tb[NFTA_RT_DREG] == NULL ||
+	    tb[NFTA_RT_FAMILY] == NULL)
+		return -EINVAL;
+
+	priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
+	priv->family = ntohl(nla_get_be32(tb[NFTA_RT_FAMILY]));
+	switch (priv->key) {
+	case NFT_RT_NEXTHOP:
+		switch (priv->family) {
+		case NFPROTO_IPV4:
+			len = sizeof(u32);
+			break;
+		case NFPROTO_IPV6:
+			len = sizeof(struct in6_addr);
+			break;
+		}
+		len = sizeof(u32);
+		break;
+	default:
+		return nft_rt_get_init(ctx, expr, tb);
+	}
+
+	priv->dreg = nft_parse_register(tb[NFTA_RT_DREG]);
+	return nft_validate_register_store(ctx, priv->dreg, NULL,
+					   NFT_DATA_VALUE, len);
+}
+
+static struct nft_expr_type nft_rt_inet_type;
+static const struct nft_expr_ops nft_rt_inet_get_ops = {
+	.type		= &nft_rt_inet_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_rt)),
+	.eval		= nft_rt_inet_get_eval,
+	.init		= nft_rt_inet_get_init,
+	.dump		= nft_rt_get_dump,
+};
+
+static struct nft_expr_type nft_rt_inet_type __read_mostly = {
+	.family         = NFPROTO_INET,
+	.name           = "rt",
+	.ops		= &nft_rt_inet_get_ops,
+	.policy         = nft_rt_policy,
+	.maxattr        = NFTA_RT_MAX,
+	.owner          = THIS_MODULE,
+};
+
+static int __init nft_rt_inet_module_init(void)
+{
+	return nft_register_expr(&nft_rt_inet_type);
+}
+
+static void __exit nft_rt_inet_module_exit(void)
+{
+	nft_unregister_expr(&nft_rt_inet_type);
+}
+
+module_init(nft_rt_inet_module_init);
+module_exit(nft_rt_inet_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Anders K. Pedersen <akp@cohaesio.com>");
+MODULE_ALIAS_NFT_AF_EXPR(1, "rt");

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-19 18:41 ` [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family Anders K. Pedersen | Cohaesio
@ 2016-10-20  9:13   ` Liping Zhang
  2016-10-20 12:36     ` Anders K. Pedersen | Cohaesio
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-20  9:13 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio; +Cc: netfilter-devel, pablo

Hi Anders,
2016-10-20 2:41 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:

> +static void nft_rt_inet_get_eval(const struct nft_expr *expr,
> +                                struct nft_regs *regs,
> +                                const struct nft_pktinfo *pkt)
> +{
> +       const struct nft_rt *priv = nft_expr_priv(expr);
> +       const struct sk_buff *skb = pkt->skb;
> +       u32 *dest = &regs->data[priv->dreg];
> +
> +       if (unlikely(pkt->pf != priv->family)) {
> +               WARN_ONCE(1, "Address families do not match\n");

I think this WARN_ONCE seems unnecessary, in inet family, it's possible that
the family is mismatched.

> +               goto err;
> +       }
> +
> +       switch (pkt->pf) {
> +       case NFPROTO_IPV4:
> +               switch (priv->key) {
> +               case NFT_RT_NEXTHOP: {
> +                       const struct rtable *rt = skb_rtable(skb);
> +
> +                       if (!rt)
> +                               goto err;
> +                       *dest = rt_nexthop(rt, ip_hdr(skb)->daddr);
> +                       break;
> +               }
> +               default:
> +                       goto out;
> +               }
> +               break;
> +       case NFPROTO_IPV6:
> +               switch (priv->key) {
> +               case NFT_RT_NEXTHOP: {
> +                       struct rt6_info *rt =
> +                                       (struct rt6_info *)skb_dst(skb);
> +
> +                       if (!rt)
> +                               goto err;
> +                       memcpy(dest, rt6_nexthop(rt, &ipv6_hdr(skb)->daddr),
> +                              sizeof(struct in6_addr));
> +                       break;
> +               }
> +               default:
> +                       goto out;
> +               }
> +               break;
> +       }
> +

This code looks not clean, I think it can be converted to such:

switch (priv->key) {
        case NFT_RT_NEXTHOP:
                switch (pkt->pf)
                ...
        default:
                return nft_rt_get_eval(expr, regs, pkt);
}

> +       return;
> +out:
> +       return nft_rt_get_eval(expr, regs, pkt);
> +err:
> +       regs->verdict.code = NFT_BREAK;
> +}
> +
> +static int nft_rt_inet_get_init(const struct nft_ctx *ctx,
> +                               const struct nft_expr *expr,
> +                               const struct nlattr * const tb[])
> +{
> +       struct nft_rt *priv = nft_expr_priv(expr);
> +       unsigned int len;
> +
> +       if (tb[NFTA_RT_KEY] == NULL ||
> +           tb[NFTA_RT_DREG] == NULL ||
> +           tb[NFTA_RT_FAMILY] == NULL)
> +               return -EINVAL;
> +
> +       priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
> +       priv->family = ntohl(nla_get_be32(tb[NFTA_RT_FAMILY]));
> +       switch (priv->key) {
> +       case NFT_RT_NEXTHOP:
> +               switch (priv->family) {
> +               case NFPROTO_IPV4:
> +                       len = sizeof(u32);
> +                       break;
> +               case NFPROTO_IPV6:
> +                       len = sizeof(struct in6_addr);
> +                       break;
> +               }
> +               len = sizeof(u32);
> +               break;

If the family is invalid, you should report -EINVAL to the userspace.
And the code "len = sizeof(u32);" here makes no sense.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-20  9:13   ` Liping Zhang
@ 2016-10-20 12:36     ` Anders K. Pedersen | Cohaesio
  2016-10-20 13:27       ` Liping Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-20 12:36 UTC (permalink / raw)
  To: zlpnobody; +Cc: netfilter-devel, pablo

Hi Liping,

On thu, 2016-10-20 at 17:13 +0800, Liping Zhang wrote:
> Hi Anders,
> 2016-10-20 2:41 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio
> .com>:
> 
> > 
> > +static void nft_rt_inet_get_eval(const struct nft_expr *expr,
> > +                                struct nft_regs *regs,
> > +                                const struct nft_pktinfo *pkt)
> > +{
> > +       const struct nft_rt *priv = nft_expr_priv(expr);
> > +       const struct sk_buff *skb = pkt->skb;
> > +       u32 *dest = &regs->data[priv->dreg];
> > +
> > +       if (unlikely(pkt->pf != priv->family)) {
> > +               WARN_ONCE(1, "Address families do not match\n");
> 
> I think this WARN_ONCE seems unnecessary, in inet family, it's
> possible that
> the family is mismatched.

Well, the suggested userspace implementation doesn't allow a rule that
would hit this warning. It is required to use 'ether type ip ... rt ip
nexthop' or 'ether type ip6 ... rt ip6 nexthop', when nexthop is used
from the inet table.

So I found it reasonable to generate a warning if such a rule somehow
got configured anyway, but if you still feel that it should be removed,
I can do that.

> +               goto err;
> > +       }
> > +
> > +       switch (pkt->pf) {
> > +       case NFPROTO_IPV4:
> > +               switch (priv->key) {
> > +               case NFT_RT_NEXTHOP: {
> > +                       const struct rtable *rt = skb_rtable(skb);
> > +
> > +                       if (!rt)
> > +                               goto err;
> > +                       *dest = rt_nexthop(rt, ip_hdr(skb)->daddr);
> > +                       break;
> > +               }
> > +               default:
> > +                       goto out;
> > +               }
> > +               break;
> > +       case NFPROTO_IPV6:
> > +               switch (priv->key) {
> > +               case NFT_RT_NEXTHOP: {
> > +                       struct rt6_info *rt =
> > +                                       (struct rt6_info
> > *)skb_dst(skb);
> > +
> > +                       if (!rt)
> > +                               goto err;
> > +                       memcpy(dest, rt6_nexthop(rt,
> > &ipv6_hdr(skb)->daddr),
> > +                              sizeof(struct in6_addr));
> > +                       break;
> > +               }
> > +               default:
> > +                       goto out;
> > +               }
> > +               break;
> > +       }
> > +
> 
> This code looks not clean, I think it can be converted to such:
> 
> switch (priv->key) {
>         case NFT_RT_NEXTHOP:
>                 switch (pkt->pf)
>                 ...
>         default:
>                 return nft_rt_get_eval(expr, regs, pkt);
> }

My thinking was that for future rt sub-expressions, it will probably
make sense to move 

		const struct rtable *rt = skb_rtable(skb);

		if (!rt)
			goto err;

outside the inner switch statement, so it doesn't need to be duplicated
for each sub-expression. And ditto for the IPV6 part.

> > 
> > +       return;
> > +out:
> > +       return nft_rt_get_eval(expr, regs, pkt);
> > +err:
> > +       regs->verdict.code = NFT_BREAK;
> > +}
> > +
> > +static int nft_rt_inet_get_init(const struct nft_ctx *ctx,
> > +                               const struct nft_expr *expr,
> > +                               const struct nlattr * const tb[])
> > +{
> > +       struct nft_rt *priv = nft_expr_priv(expr);
> > +       unsigned int len;
> > +
> > +       if (tb[NFTA_RT_KEY] == NULL ||
> > +           tb[NFTA_RT_DREG] == NULL ||
> > +           tb[NFTA_RT_FAMILY] == NULL)
> > +               return -EINVAL;
> > +
> > +       priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
> > +       priv->family = ntohl(nla_get_be32(tb[NFTA_RT_FAMILY]));
> > +       switch (priv->key) {
> > +       case NFT_RT_NEXTHOP:
> > +               switch (priv->family) {
> > +               case NFPROTO_IPV4:
> > +                       len = sizeof(u32);
> > +                       break;
> > +               case NFPROTO_IPV6:
> > +                       len = sizeof(struct in6_addr);
> > +                       break;
> > +               }
> > +               len = sizeof(u32);
> > +               break;
> 
> If the family is invalid, you should report -EINVAL to the userspace.
> And the code "len = sizeof(u32);" here makes no sense.

Yes, I'll fix that. Thanks for reviewing.

Regards,
Anders K. Pedersen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-20 12:36     ` Anders K. Pedersen | Cohaesio
@ 2016-10-20 13:27       ` Liping Zhang
  2016-10-20 13:52         ` Anders K. Pedersen | Cohaesio
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-20 13:27 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio; +Cc: netfilter-devel, pablo

Hi Anders,

2016-10-20 20:36 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
[...]
> Well, the suggested userspace implementation doesn't allow a rule that
> would hit this warning. It is required to use 'ether type ip ... rt ip
> nexthop' or 'ether type ip6 ... rt ip6 nexthop', when nexthop is used
> from the inet table.
>
> So I found it reasonable to generate a warning if such a rule somehow
> got configured anyway, but if you still feel that it should be removed,
> I can do that.

I think WARN_ON was used to hint that maybe there's a bug in kernel.
It is not appropriate to warn the user's wrong configuration.

>> switch (priv->key) {
>>         case NFT_RT_NEXTHOP:
>>                 switch (pkt->pf)
>>                 ...
>>         default:
>>                 return nft_rt_get_eval(expr, regs, pkt);
>> }
>
> My thinking was that for future rt sub-expressions, it will probably
> make sense to move
>
>                 const struct rtable *rt = skb_rtable(skb);
>
>                 if (!rt)
>                         goto err;
>
> outside the inner switch statement, so it doesn't need to be duplicated
> for each sub-expression. And ditto for the IPV6 part.
>

If so, for future rt sub-expressions extension, I think the following codes
maybe better?

switch (pkt->pf)
     case IPV4:
           nft_rt_ipv4_get_eval();
           break;
     case IPV6:
           nft_rt_ipv6_get_eval();
           break;

This can reduce the duplicate codes. Just a suggestion:)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-20 13:27       ` Liping Zhang
@ 2016-10-20 13:52         ` Anders K. Pedersen | Cohaesio
  2016-10-21  2:06           ` Liping Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-20 13:52 UTC (permalink / raw)
  To: zlpnobody; +Cc: netfilter-devel, pablo

Hi Liping,

On tor, 2016-10-20 at 21:27 +0800, Liping Zhang wrote:
> Hi Anders,
> 
> 2016-10-20 20:36 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesi
> o.com>:
> [...]
> > 
> > Well, the suggested userspace implementation doesn't allow a rule
> > that
> > would hit this warning. It is required to use 'ether type ip ... rt
> > ip
> > nexthop' or 'ether type ip6 ... rt ip6 nexthop', when nexthop is
> > used
> > from the inet table.
> > 
> > So I found it reasonable to generate a warning if such a rule
> > somehow
> > got configured anyway, but if you still feel that it should be
> > removed,
> > I can do that.
> 
> I think WARN_ON was used to hint that maybe there's a bug in kernel.
> It is not appropriate to warn the user's wrong configuration.

OK, so would it be okay to replace it with

printk_once(KERN_WARNING KBUILD_MODNAME " Address families do not match\n");

?

> > 
> > > 
> > > switch (priv->key) {
> > >         case NFT_RT_NEXTHOP:
> > >                 switch (pkt->pf)
> > >                 ...
> > >         default:
> > >                 return nft_rt_get_eval(expr, regs, pkt);
> > > }
> > 
> > My thinking was that for future rt sub-expressions, it will
> > probably
> > make sense to move
> > 
> >                 const struct rtable *rt = skb_rtable(skb);
> > 
> >                 if (!rt)
> >                         goto err;
> > 
> > outside the inner switch statement, so it doesn't need to be
> > duplicated
> > for each sub-expression. And ditto for the IPV6 part.
> > 
> 
> If so, for future rt sub-expressions extension, I think the following
> codes
> maybe better?
> 
> switch (pkt->pf)
>      case IPV4:
>            nft_rt_ipv4_get_eval();
>            break;
>      case IPV6:
>            nft_rt_ipv6_get_eval();
>            break;
> 
> This can reduce the duplicate codes. Just a suggestion:)

Then NFT_RT_INET would have to select NFT_RT_IPV4 and NFT_RT_IPV6, but
I guess that is okay, since NFT_RT_INET depends on NF_TABLES_INET,
which selects NF_TABLES_IPV4 and NF_TABLES_IPV6, so I'll use this
solution.

Regards,
Anders K. Pedersen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-20 13:52         ` Anders K. Pedersen | Cohaesio
@ 2016-10-21  2:06           ` Liping Zhang
  2016-10-21  4:16             ` Anders K. Pedersen | Cohaesio
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-21  2:06 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio; +Cc: netfilter-devel, pablo

Hi Anders,

2016-10-20 21:52 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
>
> OK, so would it be okay to replace it with
>
> printk_once(KERN_WARNING KBUILD_MODNAME " Address families do not match\n");
>
> ?
>

To this question, I think it's better to do NFT_BREAK sliently, the warning
message seems useless. Actually this can be avoided in userspace,
i.e. in nft.

For example, if you add the following rule in the inet family:
  # nft add rule inet filter output ip daddr 1.1.1.1

An implict rule will be added, and this can also be applied to rt expr,
we should first compare nfproto is AF_INET or not:
   [ meta load nfproto => reg 1 ]
   [ cmp eq reg 1 0x00000002 ]

But after I think it carefully, I think the NFTA_RT_FAMILY attr
seems useless, we can combine these four files nft_rt.c,
nft_rt_ipv4.c, nft_rt_ipv6.c and nft_rt_inet.c into a single one
file nft_rt.c.

For eval, we can use pkt->pf to decide which rt or rt6 nexthop
to be loaded, so ip/ip6/inet family has the same logical now,
for example:

static void nft_rt_get_eval(const struct nft_expr *expr,
                                struct nft_regs *regs,
                                const struct nft_pktinfo *pkt)
{
       const struct nft_rt *priv = nft_expr_priv(expr);
       const struct dst_entry *dst = skb_dst(skb);

       if (!dst)
            goto err;

       switch (key) {
              case NEXTHOP:
              switch(pkt->pf) {
                    case IPV4:
                    case IPV6:
               }
               case RTCLASSID:
                     ....
        }
        ....
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21  2:06           ` Liping Zhang
@ 2016-10-21  4:16             ` Anders K. Pedersen | Cohaesio
  2016-10-21  6:17               ` Liping Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-21  4:16 UTC (permalink / raw)
  To: zlpnobody; +Cc: netfilter-devel, pablo

Hi Liping,

On fre, 2016-10-21 at 10:06 +0800, Liping Zhang wrote:
> Hi Anders,
> 
> 2016-10-20 21:52 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesi
> o.com>:
> > OK, so would it be okay to replace it with
> > 
> > printk_once(KERN_WARNING KBUILD_MODNAME " Address families do not
> > match\n");
> > 
> > ?

> To this question, I think it's better to do NFT_BREAK sliently, the
> warning
> message seems useless. Actually this can be avoided in userspace,
> i.e. in nft.

Well, as mentioned the suggested userspace implementation for nft does
avoid it by requiring 'ether type' to be specified, when rt nexthop is
used from the inet family, so I found it reasonable to issue a warning,
if it happened anyway.

But okay, I see your point, and I'll remove the warning.

> For example, if you add the following rule in the inet family:
>   # nft add rule inet filter output ip daddr 1.1.1.1
> 
> An implict rule will be added, and this can also be applied to rt
> expr,
> we should first compare nfproto is AF_INET or not:
>    [ meta load nfproto => reg 1 ]
>    [ cmp eq reg 1 0x00000002 ]
> 
> But after I think it carefully, I think the NFTA_RT_FAMILY attr
> seems useless, we can combine these four files nft_rt.c,
> nft_rt_ipv4.c, nft_rt_ipv6.c and nft_rt_inet.c into a single one
> file nft_rt.c.

My implementation is based on the suggestion from Pablo at
http://marc.info/?l=netfilter-devel&m=147438531502686&w=4 .

> For eval, we can use pkt->pf to decide which rt or rt6 nexthop
> to be loaded, so ip/ip6/inet family has the same logical now,
> for example:

Yes, but pkt->pf is not available in init, where we have to answer what
the data size will be.

Regards,
Anders

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21  4:16             ` Anders K. Pedersen | Cohaesio
@ 2016-10-21  6:17               ` Liping Zhang
  2016-10-21  8:26                 ` Anders K. Pedersen | Cohaesio
  2016-10-21  9:21                 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 23+ messages in thread
From: Liping Zhang @ 2016-10-21  6:17 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio; +Cc: netfilter-devel, pablo

hi Anders,

2016-10-21 12:16 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
>>
>> But after I think it carefully, I think the NFTA_RT_FAMILY attr
>> seems useless, we can combine these four files nft_rt.c,
>> nft_rt_ipv4.c, nft_rt_ipv6.c and nft_rt_inet.c into a single one
>> file nft_rt.c.
>
> My implementation is based on the suggestion from Pablo at
> http://marc.info/?l=netfilter-devel&m=147438531502686&w=4 .

Yes, but after I carefully read your codes, I find that the related
implementation code about the family attr is not very good.

Without the family attr, we can still make everything well, and
the codes will become more clean and straightforward.

As a summary:
For ip family, nexthop must be ipv4
For ip6 family, nexthop must be ipv6
For inet family, nexthop can be selected by pkt->pf and we can add
an implict rule that the user cannot do wrong operation.

So I think the NFTA_RT_FAMILY attr is almost useless.

>
>> For eval, we can use pkt->pf to decide which rt or rt6 nexthop
>> to be loaded, so ip/ip6/inet family has the same logical now,
>> for example:
>
> Yes, but pkt->pf is not available in init, where we have to answer what
> the data size will be.

In init ctx->afi->family is available, a example code is in nft_ct_get_init(),
you can take a look at this.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21  6:17               ` Liping Zhang
@ 2016-10-21  8:26                 ` Anders K. Pedersen | Cohaesio
  2016-10-21 12:42                   ` Liping Zhang
  2016-10-21  9:21                 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-21  8:26 UTC (permalink / raw)
  To: zlpnobody; +Cc: netfilter-devel, pablo

Hi Liping,

On fre, 2016-10-21 at 14:17 +0800, Liping Zhang wrote:
> 2016-10-21 12:16 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesi
> o.com>:

> > > But after I think it carefully, I think the NFTA_RT_FAMILY attr
> > > seems useless, we can combine these four files nft_rt.c,
> > > nft_rt_ipv4.c, nft_rt_ipv6.c and nft_rt_inet.c into a single one
> > > file nft_rt.c.
> > 
> > My implementation is based on the suggestion from Pablo at
> > http://marc.info/?l=netfilter-devel&m=147438531502686&w=4 .
> 
> Yes, but after I carefully read your codes, I find that the related
> implementation code about the family attr is not very good.
> 
> Without the family attr, we can still make everything well, and
> the codes will become more clean and straightforward.
> 
> As a summary:
> For ip family, nexthop must be ipv4
> For ip6 family, nexthop must be ipv6
> For inet family, nexthop can be selected by pkt->pf and we can add
> an implict rule that the user cannot do wrong operation.
> 
> So I think the NFTA_RT_FAMILY attr is almost useless.
> 
> > 
> > 
> > > 
> > > For eval, we can use pkt->pf to decide which rt or rt6 nexthop
> > > to be loaded, so ip/ip6/inet family has the same logical now,
> > > for example:
> > 
> > Yes, but pkt->pf is not available in init, where we have to answer
> > what
> > the data size will be.
> 
> In init ctx->afi->family is available, a example code is in
> nft_ct_get_init(),
> you can take a look at this.

I had a look at it. This construct is used for NFT_CT_SRC and
NFT_CT_DST, where the init function just returns the IPv6 length for
the inet family. But I'm not sure how this can work for userspace, and
at least for current nftables there are problems:

# nft flush ruleset
# nft add table inet filter
# nft add chain inet filter input
# nft add rule inet filter input ether type ip flow table acct \{ ct original saddr timeout 600s counter \}
# nft list ruleset
Killed
# nft list flow tables
Killed

The latter two commands are killed by the OOM killer after a few
seconds. Same thing happens for 'ether type ip6', while it works fine
with 'ip saddr' or 'rt ip nexthop' in stead of 'ct original saddr'.

Regards,
Anders

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21  6:17               ` Liping Zhang
  2016-10-21  8:26                 ` Anders K. Pedersen | Cohaesio
@ 2016-10-21  9:21                 ` Pablo Neira Ayuso
  2016-10-21 13:22                   ` Liping Zhang
  1 sibling, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-21  9:21 UTC (permalink / raw)
  To: Liping Zhang; +Cc: Anders K. Pedersen | Cohaesio, netfilter-devel

On Fri, Oct 21, 2016 at 02:17:11PM +0800, Liping Zhang wrote:
> hi Anders,
> 
> 2016-10-21 12:16 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
> >>
> >> But after I think it carefully, I think the NFTA_RT_FAMILY attr
> >> seems useless, we can combine these four files nft_rt.c,
> >> nft_rt_ipv4.c, nft_rt_ipv6.c and nft_rt_inet.c into a single one
> >> file nft_rt.c.
> >
> > My implementation is based on the suggestion from Pablo at
> > http://marc.info/?l=netfilter-devel&m=147438531502686&w=4 .
> 
> Yes, but after I carefully read your codes, I find that the related
> implementation code about the family attr is not very good.
> 
> Without the family attr, we can still make everything well, and
> the codes will become more clean and straightforward.
> 
> As a summary:
> For ip family, nexthop must be ipv4
> For ip6 family, nexthop must be ipv6
> For inet family, nexthop can be selected by pkt->pf and we can add
> an implict rule that the user cannot do wrong operation.
> 
> So I think the NFTA_RT_FAMILY attr is almost useless.
>
> >> For eval, we can use pkt->pf to decide which rt or rt6 nexthop
> >> to be loaded, so ip/ip6/inet family has the same logical now,
> >> for example:
> >
> > Yes, but pkt->pf is not available in init, where we have to answer what
> > the data size will be.
> 
> In init ctx->afi->family is available, a example code is in nft_ct_get_init(),
> you can take a look at this.

ctx->afi->family indicates NFPROTO_INET in this case, so we have no
way to know if the user is requesting rt IPv4 or rt IPv6 from there.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21  8:26                 ` Anders K. Pedersen | Cohaesio
@ 2016-10-21 12:42                   ` Liping Zhang
  2016-10-22 15:25                     ` Anders K. Pedersen | Cohaesio
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-21 12:42 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio; +Cc: netfilter-devel, pablo

Hi Anders,

2016-10-21 16:26 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
[...]
> I had a look at it. This construct is used for NFT_CT_SRC and
> NFT_CT_DST, where the init function just returns the IPv6 length for
> the inet family. But I'm not sure how this can work for userspace, and
> at least for current nftables there are problems:
>
> # nft flush ruleset
> # nft add table inet filter
> # nft add chain inet filter input
> # nft add rule inet filter input ether type ip flow table acct \{ ct original saddr timeout 600s counter \}
> # nft list ruleset
> Killed
> # nft list flow tables
> Killed

I guess there's a bug in nft utility, same problem exists in ip/ip6 family.

In init routine, nft_validate_register_store was used to ensure
that we will not do overflow operation.

>
> The latter two commands are killed by the OOM killer after a few
> seconds. Same thing happens for 'ether type ip6', while it works fine
> with 'ip saddr' or 'rt ip nexthop' in stead of 'ct original saddr'.
>
> Regards,
> Anders

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21  9:21                 ` Pablo Neira Ayuso
@ 2016-10-21 13:22                   ` Liping Zhang
  2016-10-21 16:58                     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-21 13:22 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Anders K. Pedersen | Cohaesio, netfilter-devel

Hi Pablo,

2016-10-21 17:21 GMT+08:00 Pablo Neira Ayuso <pablo@netfilter.org>:
[...]
>> In init ctx->afi->family is available, a example code is in nft_ct_get_init(),
>> you can take a look at this.
>
> ctx->afi->family indicates NFPROTO_INET in this case, so we have no
> way to know if the user is requesting rt IPv4 or rt IPv6 from there.

Yes, but I find that the NFTA_RT_FAMILY attr seems only useful
in init routine. For INET family, the length can simply to be setted
to sizeof(struct in6_addr), i.e:
    switch (ctx->afi->family) {
        case IPV4:
              len = 4;
              break;
        case IPV6:
        case INET:
              len = 16;
              break;
     }
     ...
     return nft_validate_register_store(..., len, ...);

The drawback of this implementation is that the user cannot
specify dreg to NFT_REG32_13, NFT_REG32_14 and
NFT_REG32_15 when we use rt4 nexthop in INET family.

But since the same problem exists when we use rt6 nexthop in
INET/IPV6 family, so I think this will not be a problem.

And for eval, nft rules can be interpreted as following:
(1) nft add rule filter inet output rt ip nexthop ... --debug=netlink
[ meta load nfproto => reg 1 ]
[ cmp eq reg 1 0x00000002 ]
[ rt nexthop => reg 1 ]
...

(2) nft add rule filter inet output rt ip6 nexthop ... --debug=netlink
[ meta load nfproto => reg 1 ]
[ cmp eq reg 1 0x0000000a ]
[ rt nexthop => reg 1 ]
...

So I think we cannot earn much benefits from the NFTA_RT_FAMILY attr.
Or I missed something?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21 13:22                   ` Liping Zhang
@ 2016-10-21 16:58                     ` Pablo Neira Ayuso
  2016-10-22  1:44                       ` Liping Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-21 16:58 UTC (permalink / raw)
  To: Liping Zhang; +Cc: Anders K. Pedersen | Cohaesio, netfilter-devel

On Fri, Oct 21, 2016 at 09:22:09PM +0800, Liping Zhang wrote:
> Hi Pablo,
> 
> 2016-10-21 17:21 GMT+08:00 Pablo Neira Ayuso <pablo@netfilter.org>:
> [...]
> >> In init ctx->afi->family is available, a example code is in nft_ct_get_init(),
> >> you can take a look at this.
> >
> > ctx->afi->family indicates NFPROTO_INET in this case, so we have no
> > way to know if the user is requesting rt IPv4 or rt IPv6 from there.
> 
> Yes, but I find that the NFTA_RT_FAMILY attr seems only useful
> in init routine. For INET family, the length can simply to be setted
> to sizeof(struct in6_addr), i.e:
>     switch (ctx->afi->family) {
>         case IPV4:
>               len = 4;
>               break;
>         case IPV6:
>         case INET:
>               len = 16;
>               break;
>      }
>      ...
>      return nft_validate_register_store(..., len, ...);
> 
> The drawback of this implementation is that the user cannot
> specify dreg to NFT_REG32_13, NFT_REG32_14 and
> NFT_REG32_15 when we use rt4 nexthop in INET family.
> 
> But since the same problem exists when we use rt6 nexthop in
> INET/IPV6 family, so I think this will not be a problem.
>
> And for eval, nft rules can be interpreted as following:
> (1) nft add rule filter inet output rt ip nexthop ... --debug=netlink
> [ meta load nfproto => reg 1 ]
> [ cmp eq reg 1 0x00000002 ]
> [ rt nexthop => reg 1 ]
> ...
> 
> (2) nft add rule filter inet output rt ip6 nexthop ... --debug=netlink
> [ meta load nfproto => reg 1 ]
> [ cmp eq reg 1 0x0000000a ]
> [ rt nexthop => reg 1 ]
> ...
> 
> So I think we cannot earn much benefits from the NFTA_RT_FAMILY attr.
> Or I missed something?

We still need datatype information for the inet family, from the
netlink_delinearize step, when dumping back the policy to userspace.
Given that we cannot infer the datatype from the data size, as this
would be 16 bytes for both cases, we would still need to annotate this
context information somewhere. The existing approach is providing this
datatype information.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21 16:58                     ` Pablo Neira Ayuso
@ 2016-10-22  1:44                       ` Liping Zhang
  2016-10-22 16:08                         ` Anders K. Pedersen | Cohaesio
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-22  1:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Anders K. Pedersen | Cohaesio, netfilter-devel

Hi Pablo,
2016-10-22 0:58 GMT+08:00 Pablo Neira Ayuso <pablo@netfilter.org>:
> We still need datatype information for the inet family, from the
> netlink_delinearize step, when dumping back the policy to userspace.
> Given that we cannot infer the datatype from the data size, as this
> would be 16 bytes for both cases, we would still need to annotate this
> context information somewhere. The existing approach is providing this
> datatype information.

Yes, the NFTA_RT_FAMILY attr will simplify our work, without this we
will have much more work in nft utility.

Since Anders suggests the following usage in INET family:
 # nft add rule inet filter postrouting ether type ip6 rt ip6 nexthop...
                                                         ^^^^^^^^^^^^^
So I think it's better to add implict dependency rules in inet family:
for rt ip nexthop: we add "meta nfproto ipv4"
for rt ip6 nexthop: we add "meta nfproto ipv6"

Then like NFT_CT_SRC do, we can implement a similar
*ct_expr_update_type*, and call it in netlink_delinearize step.

I agree this will add much more work in nft utility, and actually,
use the NFTA_RT_FAMILY attr, if we did not add "ether type ip6"
or "meta nfproto ipv6", "rt ip6 nexthop" can still work well in
INET family.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-21 12:42                   ` Liping Zhang
@ 2016-10-22 15:25                     ` Anders K. Pedersen | Cohaesio
  0 siblings, 0 replies; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-22 15:25 UTC (permalink / raw)
  To: zlpnobody; +Cc: netfilter-devel, pablo

Hi Liping,

On fre, 2016-10-21 at 20:42 +0800, Liping Zhang wrote:
> Hi Anders,
> 
> 2016-10-21 16:26 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesi
> o.com>:
> [...]
> > 
> > I had a look at it. This construct is used for NFT_CT_SRC and
> > NFT_CT_DST, where the init function just returns the IPv6 length
> > for
> > the inet family. But I'm not sure how this can work for userspace,
> > and
> > at least for current nftables there are problems:
> > 
> > # nft flush ruleset
> > # nft add table inet filter
> > # nft add chain inet filter input
> > # nft add rule inet filter input ether type ip flow table acct \{
> > ct original saddr timeout 600s counter \}
> > # nft list ruleset
> > Killed
> > # nft list flow tables
> > Killed
> 
> I guess there's a bug in nft utility, same problem exists in ip/ip6
> family.

I looked into why this happens. 

netlink_delinearize_rule() loops through the expressions and
calls netlink_parse_rule_expr() for each of them (meta, cmp, payload,
cmp, ct, dynset). netlink_parse_rule_expr() calls netlink_parse_expr(),
which calls the specific expression parsers.

The problem begins for the ct expression, where netlink_parse_ct() and
netlink_parse_ct_expr() is called, which calls ct_expr_alloc(). The
latter fills a new expr with data from ct_templates[], which
has &invalid_type with length 0 for NFT_CT_SRC.

This expr is then used in netlink_parse_dynset(), where

        if (expr->len < set->keylen) {
                expr = netlink_parse_concat_expr(ctx, loc, sreg, set->keylen);

with (expr->len = 0) < (set->keylen = 32) tries to parse it as a
concatenated expression even though it's not, and ends up in an endless
loop:

        while (len > 0) {
		...
                len -= netlink_padded_len(expr->len);
                reg += netlink_register_space(expr->len);
        }

There's code in ct_expr_update_type() to fix up the data from
ct_templates[] for NFT_CT_SRC and others, but this is based on data
from struct proto_ctx *ctx and those are not available
in ct_expr_alloc().

I don't know how to solve this.

Regards,
Anders

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-22  1:44                       ` Liping Zhang
@ 2016-10-22 16:08                         ` Anders K. Pedersen | Cohaesio
  2016-10-23  5:01                           ` Liping Zhang
  0 siblings, 1 reply; 23+ messages in thread
From: Anders K. Pedersen | Cohaesio @ 2016-10-22 16:08 UTC (permalink / raw)
  To: zlpnobody, pablo; +Cc: netfilter-devel

Hi Liping and Pablo,

On lør, 2016-10-22 at 09:44 +0800, Liping Zhang wrote:
> Hi Pablo,
> 2016-10-22 0:58 GMT+08:00 Pablo Neira Ayuso <pablo@netfilter.org>:
> > 
> > We still need datatype information for the inet family, from the
> > netlink_delinearize step, when dumping back the policy to
> > userspace.
> > Given that we cannot infer the datatype from the data size, as this
> > would be 16 bytes for both cases, we would still need to annotate
> > this
> > context information somewhere. The existing approach is providing
> > this
> > datatype information.
> 
> Yes, the NFTA_RT_FAMILY attr will simplify our work, without this we
> will have much more work in nft utility.
> 
> Since Anders suggests the following usage in INET family:
>  # nft add rule inet filter postrouting ether type ip6 rt ip6 nexthop...
>                                                          ^^^^^^^^^^^^^

I'm considering to change the syntax to just "rt nexthop" and then
determine the family implicitly from
ctx->pctx.protocol[PROTO_BASE_NETWORK_HDR].desc (which must be set via
"ether type", "meta nfproto" or similar, when used from the inet
family) in expr_evaluate_rt() in stead of verifying it. This also seems
to be what NF_CT_SRC etc. currently does.

How does this sound?

> So I think it's better to add implict dependency rules in inet
> family:
> for rt ip nexthop: we add "meta nfproto ipv4"
> for rt ip6 nexthop: we add "meta nfproto ipv6"
> 
> Then like NFT_CT_SRC do, we can implement a similar
> *ct_expr_update_type*, and call it in netlink_delinearize step.

But ct_expr_update_type is only used during the netlink_delinearize
postprocess step, and that causes problems, when it is used in
combination with flow statements as described in other mail.

> I agree this will add much more work in nft utility, and actually,
> use the NFTA_RT_FAMILY attr, if we did not add "ether type ip6"
> or "meta nfproto ipv6", "rt ip6 nexthop" can still work well in
> INET family.

Would you object to dropping (i.e. kernel will not require it and
userspace will not include it) the NFTA_RT_FAMILY attribute for ip and
ip6 families, but unconditionally including it for the inet family?

Regards,
Anders

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-22 16:08                         ` Anders K. Pedersen | Cohaesio
@ 2016-10-23  5:01                           ` Liping Zhang
  2016-10-27 17:50                             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Liping Zhang @ 2016-10-23  5:01 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio; +Cc: pablo, netfilter-devel

Hi Anders,

2016-10-23 0:08 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
[...]
> But ct_expr_update_type is only used during the netlink_delinearize
> postprocess step, and that causes problems, when it is used in
> combination with flow statements as described in other mail.

I think this is a bug and should be fixed.

> Would you object to dropping (i.e. kernel will not require it and
> userspace will not include it) the NFTA_RT_FAMILY attribute for ip and
> ip6 families, but unconditionally including it for the inet family?

After I read your and Pablo's explanation, now I'm not sure which one
is better. :)

Maybe from this rt nexthop expression, we can get a relatively
consistent way to handle the INET family properly, either
explicitly add a _FAMILY_ attribute, or just like ct original saddr,
completely handle it properly in the nft utility.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family
  2016-10-23  5:01                           ` Liping Zhang
@ 2016-10-27 17:50                             ` Pablo Neira Ayuso
  0 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-27 17:50 UTC (permalink / raw)
  To: Anders K. Pedersen | Cohaesio, netfilter-devel; +Cc: Liping Zhang

On Sun, Oct 23, 2016 at 01:01:51PM +0800, Liping Zhang wrote:
> Hi Anders,
> 
> 2016-10-23 0:08 GMT+08:00 Anders K. Pedersen | Cohaesio <akp@cohaesio.com>:
> [...]
> > But ct_expr_update_type is only used during the netlink_delinearize
> > postprocess step, and that causes problems, when it is used in
> > combination with flow statements as described in other mail.
> 
> I think this is a bug and should be fixed.
> 
> > Would you object to dropping (i.e. kernel will not require it and
> > userspace will not include it) the NFTA_RT_FAMILY attribute for ip and
> > ip6 families, but unconditionally including it for the inet family?
> 
> After I read your and Pablo's explanation, now I'm not sure which one
> is better. :)
> 
> Maybe from this rt nexthop expression, we can get a relatively
> consistent way to handle the INET family properly, either
> explicitly add a _FAMILY_ attribute, or just like ct original saddr,
> completely handle it properly in the nft utility.

Please, use the layer 3 network context as you said. For the inet family,
we would need to bail out in case no context is available.

We can probably get rid of the _FAMILY attribute by introducing
explicit rt keys, ie.

        NFTA_RT_NH_IPV4
        NFTA_RT_NH_IPV6

I agree the _FAMILY attribute is getting ugly, specifically we don't
need this attribute with classid, so better if we can avoid it.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-10-27 17:50 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-19 18:34 [PATCH v2 nf-next 0/5] netfilter: nft: introduce routing expression Anders K. Pedersen | Cohaesio
2016-10-19 18:35 ` [PATCH v2 nf-next 1/5] netfilter: nft: UAPI headers for " Anders K. Pedersen | Cohaesio
2016-10-19 18:38 ` [PATCH v2 nf-next 2/5] netfilter: nft: basic " Anders K. Pedersen | Cohaesio
2016-10-19 18:39 ` [PATCH v2 nf-next 3/5] netfilter: nft: rt nexthop for IPv4 family Anders K. Pedersen | Cohaesio
2016-10-19 18:40 ` [PATCH v2 nf-next 4/5] netfilter: nft: rt nexthop for IPv6 family Anders K. Pedersen | Cohaesio
2016-10-19 18:41 ` [PATCH v2 nf-next 5/5] netfilter: nft: rt nexthop for inet family Anders K. Pedersen | Cohaesio
2016-10-20  9:13   ` Liping Zhang
2016-10-20 12:36     ` Anders K. Pedersen | Cohaesio
2016-10-20 13:27       ` Liping Zhang
2016-10-20 13:52         ` Anders K. Pedersen | Cohaesio
2016-10-21  2:06           ` Liping Zhang
2016-10-21  4:16             ` Anders K. Pedersen | Cohaesio
2016-10-21  6:17               ` Liping Zhang
2016-10-21  8:26                 ` Anders K. Pedersen | Cohaesio
2016-10-21 12:42                   ` Liping Zhang
2016-10-22 15:25                     ` Anders K. Pedersen | Cohaesio
2016-10-21  9:21                 ` Pablo Neira Ayuso
2016-10-21 13:22                   ` Liping Zhang
2016-10-21 16:58                     ` Pablo Neira Ayuso
2016-10-22  1:44                       ` Liping Zhang
2016-10-22 16:08                         ` Anders K. Pedersen | Cohaesio
2016-10-23  5:01                           ` Liping Zhang
2016-10-27 17:50                             ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.