* [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
@ 2020-10-01 16:57 Phil Sutter
  2020-10-01 16:57 ` [net-next PATCH 1/2] net: netfilter: Enable fast nft_cmp for inverted matches Phil Sutter
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Phil Sutter @ 2020-10-01 16:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

The following two patches improve packet throughput in a test setup
sending UDP packets (using iperf3) between two netns. The ruleset used
on the receiver side is like this:

| *filter
| :test - [0:0]
| -A INPUT -j test
| -A INPUT -j ACCEPT
| -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
| COMMIT

These are the generated VM instructions for each rule:

| [ payload load 4b @ network header + 12 => reg 1 ]
| [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
| [ cmp eq reg 1 0x0000000a ]
| [ counter pkts 0 bytes 0 ]
| [ immediate reg 0 drop ]

Both sender and receiver reside within the 10.0.0.0/10 network; iperf3 is just
used to fill the (virtual) wire:

| iperf3 -c 10.0.0.2 -u -b 10G -t 1000

On the receiver side, the "packets received" counter of 'netstat -su' is
monitored to calculate throughput. Averaging over about a minute, these
are the figures:

legacy:			~10000pkt/s
nft (base):		~3000pkt/s
nft (patch1):		~4000pkt/s
nft (patch1+2):		~5200pkt/s

In summary, this increases nftables throughput for this specific test
case from 1/3 of legacy iptables to 1/2.

Phil Sutter (2):
  net: netfilter: Enable fast nft_cmp for inverted matches
  net: netfilter: Implement fast bitwise expression

 include/net/netfilter/nf_tables_core.h |  11 ++
 net/netfilter/nf_tables_core.c         |  15 ++-
 net/netfilter/nft_bitwise.c            | 141 +++++++++++++++++++++++--
 net/netfilter/nft_cmp.c                |  10 +-
 4 files changed, 164 insertions(+), 13 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [net-next PATCH 1/2] net: netfilter: Enable fast nft_cmp for inverted matches
  2020-10-01 16:57 [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches Phil Sutter
@ 2020-10-01 16:57 ` Phil Sutter
  2020-10-02 13:50   ` [net-next PATCH 1/2 v2] " Phil Sutter
  2020-10-01 16:57 ` [net-next PATCH 2/2] net: netfilter: Implement fast bitwise expression Phil Sutter
  2020-10-01 22:25 ` [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches Florian Westphal
  2 siblings, 1 reply; 11+ messages in thread
From: Phil Sutter @ 2020-10-01 16:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Add a boolean indicating NFT_CMP_NEQ. To include it in the match
decision, it is sufficient to XOR it with the data comparison's result.

While at it, store the mask that is calculated during expression
init and free the eval routine from having to recalculate it each time.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 include/net/netfilter/nf_tables_core.h |  2 ++
 net/netfilter/nf_tables_core.c         |  3 +--
 net/netfilter/nft_cmp.c                | 10 +++++-----
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 78516de14d316..df2d91c814cb3 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -25,8 +25,10 @@ void nf_tables_core_module_exit(void);
 
 struct nft_cmp_fast_expr {
 	u32			data;
+	u32			mask;
 	enum nft_registers	sreg:8;
 	u8			len;
+	bool			inv;
 };
 
 struct nft_immediate_expr {
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 587897a2498b8..e92feacaf5516 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -51,9 +51,8 @@ static void nft_cmp_fast_eval(const struct nft_expr *expr,
 			      struct nft_regs *regs)
 {
 	const struct nft_cmp_fast_expr *priv = nft_expr_priv(expr);
-	u32 mask = nft_cmp_fast_mask(priv->len);
 
-	if ((regs->data[priv->sreg] & mask) == priv->data)
+	if (((regs->data[priv->sreg] & priv->mask) == priv->data) ^ priv->inv)
 		return;
 	regs->verdict.code = NFT_BREAK;
 }
diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c
index 16f4d84599ac7..832c3b08991e7 100644
--- a/net/netfilter/nft_cmp.c
+++ b/net/netfilter/nft_cmp.c
@@ -167,7 +167,6 @@ static int nft_cmp_fast_init(const struct nft_ctx *ctx,
 	struct nft_cmp_fast_expr *priv = nft_expr_priv(expr);
 	struct nft_data_desc desc;
 	struct nft_data data;
-	u32 mask;
 	int err;
 
 	err = nft_data_init(NULL, &data, sizeof(data), &desc,
@@ -181,10 +180,11 @@ static int nft_cmp_fast_init(const struct nft_ctx *ctx,
 		return err;
 
 	desc.len *= BITS_PER_BYTE;
-	mask = nft_cmp_fast_mask(desc.len);
 
-	priv->data = data.data[0] & mask;
+	priv->mask = nft_cmp_fast_mask(desc.len);
+	priv->data = data.data[0] & priv->mask;
 	priv->len  = desc.len;
+	priv->inv  = ntohl(nla_get_be32(tb[NFTA_CMP_OP])) != NFT_CMP_EQ;
 	return 0;
 }
 
@@ -201,7 +201,7 @@ static int nft_cmp_fast_offload(struct nft_offload_ctx *ctx,
 		},
 		.sreg	= priv->sreg,
 		.len	= priv->len / BITS_PER_BYTE,
-		.op	= NFT_CMP_EQ,
+		.op	= priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ,
 	};
 
 	return __nft_cmp_offload(ctx, flow, &cmp);
@@ -272,7 +272,7 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
 		goto err1;
 	}
 
-	if (desc.len <= sizeof(u32) && op == NFT_CMP_EQ)
+	if (desc.len <= sizeof(u32) && (op == NFT_CMP_EQ || op == NFT_CMP_NEQ))
 		return &nft_cmp_fast_ops;
 
 	return &nft_cmp_ops;
-- 
2.28.0



* [net-next PATCH 2/2] net: netfilter: Implement fast bitwise expression
  2020-10-01 16:57 [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches Phil Sutter
  2020-10-01 16:57 ` [net-next PATCH 1/2] net: netfilter: Enable fast nft_cmp for inverted matches Phil Sutter
@ 2020-10-01 16:57 ` Phil Sutter
  2020-10-04 19:11   ` Pablo Neira Ayuso
  2020-10-01 22:25 ` [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches Florian Westphal
  2 siblings, 1 reply; 11+ messages in thread
From: Phil Sutter @ 2020-10-01 16:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

A typical use of the bitwise expression is to mask out parts of an IP
address when matching on the network part only. Optimize for this common
use with a fast variant for NFT_BITWISE_BOOL-type expressions operating
on 32-bit values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 include/net/netfilter/nf_tables_core.h |   9 ++
 net/netfilter/nf_tables_core.c         |  12 +++
 net/netfilter/nft_bitwise.c            | 141 +++++++++++++++++++++++--
 3 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index df2d91c814cb3..8657e6815b07c 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -23,6 +23,13 @@ extern struct nft_object_type nft_secmark_obj_type;
 int nf_tables_core_module_init(void);
 void nf_tables_core_module_exit(void);
 
+struct nft_bitwise_fast_expr {
+	u32			mask;
+	u32			xor;
+	enum nft_registers	sreg:8;
+	enum nft_registers	dreg:8;
+};
+
 struct nft_cmp_fast_expr {
 	u32			data;
 	u32			mask;
@@ -68,6 +75,8 @@ struct nft_payload_set {
 
 extern const struct nft_expr_ops nft_payload_fast_ops;
 
+extern const struct nft_expr_ops nft_bitwise_fast_ops;
+
 extern struct static_key_false nft_counters_enabled;
 extern struct static_key_false nft_trace_enabled;
 
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index e92feacaf5516..dbc2e945c98eb 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -47,6 +47,16 @@ static inline void nft_trace_packet(struct nft_traceinfo *info,
 	}
 }
 
+static void nft_bitwise_fast_eval(const struct nft_expr *expr,
+				  struct nft_regs *regs)
+{
+	const struct nft_bitwise_fast_expr *priv = nft_expr_priv(expr);
+	u32 *src = &regs->data[priv->sreg];
+	u32 *dst = &regs->data[priv->dreg];
+
+	*dst = (*src & priv->mask) ^ priv->xor;
+}
+
 static void nft_cmp_fast_eval(const struct nft_expr *expr,
 			      struct nft_regs *regs)
 {
@@ -175,6 +185,8 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 		nft_rule_for_each_expr(expr, last, rule) {
 			if (expr->ops == &nft_cmp_fast_ops)
 				nft_cmp_fast_eval(expr, &regs);
+			else if (expr->ops == &nft_bitwise_fast_ops)
+				nft_bitwise_fast_eval(expr, &regs);
 			else if (expr->ops != &nft_payload_fast_ops ||
 				 !nft_payload_fast_eval(expr, &regs, pkt))
 				expr_call_ops_eval(expr, &regs, pkt);
diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
index bc37d6c59db45..bbd773d743773 100644
--- a/net/netfilter/nft_bitwise.c
+++ b/net/netfilter/nft_bitwise.c
@@ -163,11 +163,6 @@ static int nft_bitwise_init(const struct nft_ctx *ctx,
 	u32 len;
 	int err;
 
-	if (!tb[NFTA_BITWISE_SREG] ||
-	    !tb[NFTA_BITWISE_DREG] ||
-	    !tb[NFTA_BITWISE_LEN])
-		return -EINVAL;
-
 	err = nft_parse_u32_check(tb[NFTA_BITWISE_LEN], U8_MAX, &len);
 	if (err < 0)
 		return err;
@@ -292,9 +287,143 @@ static const struct nft_expr_ops nft_bitwise_ops = {
 	.offload	= nft_bitwise_offload,
 };
 
+static int
+nft_bitwise_extract_u32_data(const struct nlattr * const tb, u32 *out)
+{
+	struct nft_data_desc desc;
+	struct nft_data data;
+	int err = 0;
+
+	err = nft_data_init(NULL, &data, sizeof(data), &desc, tb);
+	if (err < 0)
+		return err;
+
+	if (desc.type != NFT_DATA_VALUE || desc.len != sizeof(u32)) {
+		err = -EINVAL;
+		goto err;
+	}
+	*out = data.data[0];
+err:
+	nft_data_release(&data, desc.type);
+	return err;
+}
+
+static int nft_bitwise_fast_init(const struct nft_ctx *ctx,
+				 const struct nft_expr *expr,
+				 const struct nlattr * const tb[])
+{
+	struct nft_bitwise_fast_expr *priv = nft_expr_priv(expr);
+	int err;
+
+	priv->sreg = nft_parse_register(tb[NFTA_BITWISE_SREG]);
+	err = nft_validate_register_load(priv->sreg, sizeof(u32));
+	if (err < 0)
+		return err;
+
+	priv->dreg = nft_parse_register(tb[NFTA_BITWISE_DREG]);
+	err = nft_validate_register_store(ctx, priv->dreg, NULL,
+					  NFT_DATA_VALUE, sizeof(u32));
+	if (err < 0)
+		return err;
+
+	if (tb[NFTA_BITWISE_DATA])
+		return -EINVAL;
+
+	if (!tb[NFTA_BITWISE_MASK] ||
+	    !tb[NFTA_BITWISE_XOR])
+		return -EINVAL;
+
+	err = nft_bitwise_extract_u32_data(tb[NFTA_BITWISE_MASK], &priv->mask);
+	if (err < 0)
+		return err;
+
+	err = nft_bitwise_extract_u32_data(tb[NFTA_BITWISE_XOR], &priv->xor);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int
+nft_bitwise_fast_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+	const struct nft_bitwise_fast_expr *priv = nft_expr_priv(expr);
+	struct nft_data data;
+
+	if (nft_dump_register(skb, NFTA_BITWISE_SREG, priv->sreg))
+		return -1;
+	if (nft_dump_register(skb, NFTA_BITWISE_DREG, priv->dreg))
+		return -1;
+	if (nla_put_be32(skb, NFTA_BITWISE_LEN, htonl(sizeof(u32))))
+		return -1;
+	if (nla_put_be32(skb, NFTA_BITWISE_OP, htonl(NFT_BITWISE_BOOL)))
+		return -1;
+
+	data.data[0] = priv->mask;
+	if (nft_data_dump(skb, NFTA_BITWISE_MASK, &data,
+			  NFT_DATA_VALUE, sizeof(u32)) < 0)
+		return -1;
+
+	data.data[0] = priv->xor;
+	if (nft_data_dump(skb, NFTA_BITWISE_XOR, &data,
+			  NFT_DATA_VALUE, sizeof(u32)) < 0)
+		return -1;
+
+	return 0;
+}
+
+static int nft_bitwise_fast_offload(struct nft_offload_ctx *ctx,
+				    struct nft_flow_rule *flow,
+				    const struct nft_expr *expr)
+{
+	const struct nft_bitwise_fast_expr *priv = nft_expr_priv(expr);
+	struct nft_offload_reg *reg = &ctx->regs[priv->dreg];
+
+	if (priv->xor || priv->sreg != priv->dreg || reg->len != sizeof(u32))
+		return -EOPNOTSUPP;
+
+	reg->mask.data[0] = priv->mask;
+	return 0;
+}
+
+const struct nft_expr_ops nft_bitwise_fast_ops = {
+	.type		= &nft_bitwise_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_bitwise_fast_expr)),
+	.eval		= NULL, /* inlined */
+	.init		= nft_bitwise_fast_init,
+	.dump		= nft_bitwise_fast_dump,
+	.offload	= nft_bitwise_fast_offload,
+};
+
+static const struct nft_expr_ops *
+nft_bitwise_select_ops(const struct nft_ctx *ctx,
+		       const struct nlattr * const tb[])
+{
+	int err;
+	u32 len;
+
+	if (!tb[NFTA_BITWISE_LEN] ||
+	    !tb[NFTA_BITWISE_SREG] ||
+	    !tb[NFTA_BITWISE_DREG])
+		return ERR_PTR(-EINVAL);
+
+	err = nft_parse_u32_check(tb[NFTA_BITWISE_LEN], U8_MAX, &len);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	if (len != sizeof(u32))
+		return &nft_bitwise_ops;
+
+	if (tb[NFTA_BITWISE_OP] &&
+	    ntohl(nla_get_be32(tb[NFTA_BITWISE_OP])) != NFT_BITWISE_BOOL)
+		return &nft_bitwise_ops;
+
+	return &nft_bitwise_fast_ops;
+}
+
 struct nft_expr_type nft_bitwise_type __read_mostly = {
 	.name		= "bitwise",
-	.ops		= &nft_bitwise_ops,
+	.select_ops	= nft_bitwise_select_ops,
 	.policy		= nft_bitwise_policy,
 	.maxattr	= NFTA_BITWISE_MAX,
 	.owner		= THIS_MODULE,
-- 
2.28.0



* Re: [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
  2020-10-01 16:57 [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches Phil Sutter
  2020-10-01 16:57 ` [net-next PATCH 1/2] net: netfilter: Enable fast nft_cmp for inverted matches Phil Sutter
  2020-10-01 16:57 ` [net-next PATCH 2/2] net: netfilter: Implement fast bitwise expression Phil Sutter
@ 2020-10-01 22:25 ` Florian Westphal
  2020-10-02  9:00   ` Phil Sutter
  2 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2020-10-01 22:25 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Pablo Neira Ayuso, netfilter-devel

Phil Sutter <phil@nwl.cc> wrote:
> The following two patches improve packet throughput in a test setup
> sending UDP packets (using iperf3) between two netns. The ruleset used
> on receiver side is like this:
> 
> | *filter
> | :test - [0:0]
> | -A INPUT -j test
> | -A INPUT -j ACCEPT
> | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> | COMMIT
> 
> These are the generated VM instructions for each rule:
> 
> | [ payload load 4b @ network header + 12 => reg 1 ]
> | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]

Not related to this patch, but we should avoid the bitop if the
netmask is divisible by 8 (can adjust the cmp -- adjusting the
payload expr is probably not worth it).

> | [ cmp eq reg 1 0x0000000a ]
> | [ counter pkts 0 bytes 0 ]

Out of curiosity, does omitting 'counter' help?

nft counter is rather expensive due to bh disable,
iptables does it once at the evaluation loop only.


* Re: [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
  2020-10-01 22:25 ` [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches Florian Westphal
@ 2020-10-02  9:00   ` Phil Sutter
  2020-10-21 10:43     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Phil Sutter @ 2020-10-02  9:00 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel

Hi Florian,

On Fri, Oct 02, 2020 at 12:25:36AM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> > The following two patches improve packet throughput in a test setup
> > sending UDP packets (using iperf3) between two netns. The ruleset used
> > on receiver side is like this:
> > 
> > | *filter
> > | :test - [0:0]
> > | -A INPUT -j test
> > | -A INPUT -j ACCEPT
> > | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> > | COMMIT
> > 
> > These are the generated VM instructions for each rule:
> > 
> > | [ payload load 4b @ network header + 12 => reg 1 ]
> > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> 
> Not related to this patch, but we should avoid the bitop if the
> netmask is divisible by 8 (can adjust the cmp -- adjusting the
> payload expr is probably not worth it).

See the patch I just sent to this list. I adjusted both - it simply
didn't appear to me that I could get by with reducing the cmp expression
size only. The upside though is that detecting the prefix match based on
payload expression length is quick and easy.

Someone will have to adjust nft tool, though. ;)

> > | [ cmp eq reg 1 0x0000000a ]
> > | [ counter pkts 0 bytes 0 ]
> 
> Out of curiosity, does omitting 'counter' help?
> 
> nft counter is rather expensive due to bh disable,
> iptables does it once at the evaluation loop only.

I changed the test to create the base ruleset using iptables-nft-restore
just as before, but create the rules in 'test' chain like so:

| nft add rule filter test ip saddr != 10.0.0.0/10 drop

The VM code is as expected:

| [ payload load 4b @ network header + 12 => reg 1 ]
| [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
| [ cmp eq reg 1 0x0000000a ]
| [ immediate reg 0 drop ]

Performance is ~7000pkt/s. So while it's faster than iptables-nft, it's
still quite a bit slower than legacy iptables despite the skipped
counters.

Cheers, Phil


* [net-next PATCH 1/2 v2] net: netfilter: Enable fast nft_cmp for inverted matches
  2020-10-01 16:57 ` [net-next PATCH 1/2] net: netfilter: Enable fast nft_cmp for inverted matches Phil Sutter
@ 2020-10-02 13:50   ` Phil Sutter
  2020-10-04 19:10     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Phil Sutter @ 2020-10-02 13:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Add a boolean indicating NFT_CMP_NEQ. To include it in the match
decision, it is sufficient to XOR it with the data comparison's result.

While at it, store the mask that is calculated during expression
init and free the eval routine from having to recalculate it each time.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
Changes since v1:
- Forgot to update nft_cmp_fast_dump.
---
 include/net/netfilter/nf_tables_core.h |  2 ++
 net/netfilter/nf_tables_core.c         |  3 +--
 net/netfilter/nft_cmp.c                | 13 +++++++------
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 78516de14d316..df2d91c814cb3 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -25,8 +25,10 @@ void nf_tables_core_module_exit(void);
 
 struct nft_cmp_fast_expr {
 	u32			data;
+	u32			mask;
 	enum nft_registers	sreg:8;
 	u8			len;
+	bool			inv;
 };
 
 struct nft_immediate_expr {
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 587897a2498b8..e92feacaf5516 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -51,9 +51,8 @@ static void nft_cmp_fast_eval(const struct nft_expr *expr,
 			      struct nft_regs *regs)
 {
 	const struct nft_cmp_fast_expr *priv = nft_expr_priv(expr);
-	u32 mask = nft_cmp_fast_mask(priv->len);
 
-	if ((regs->data[priv->sreg] & mask) == priv->data)
+	if (((regs->data[priv->sreg] & priv->mask) == priv->data) ^ priv->inv)
 		return;
 	regs->verdict.code = NFT_BREAK;
 }
diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c
index 16f4d84599ac7..bc079d68a5365 100644
--- a/net/netfilter/nft_cmp.c
+++ b/net/netfilter/nft_cmp.c
@@ -167,7 +167,6 @@ static int nft_cmp_fast_init(const struct nft_ctx *ctx,
 	struct nft_cmp_fast_expr *priv = nft_expr_priv(expr);
 	struct nft_data_desc desc;
 	struct nft_data data;
-	u32 mask;
 	int err;
 
 	err = nft_data_init(NULL, &data, sizeof(data), &desc,
@@ -181,10 +180,11 @@ static int nft_cmp_fast_init(const struct nft_ctx *ctx,
 		return err;
 
 	desc.len *= BITS_PER_BYTE;
-	mask = nft_cmp_fast_mask(desc.len);
 
-	priv->data = data.data[0] & mask;
+	priv->mask = nft_cmp_fast_mask(desc.len);
+	priv->data = data.data[0] & priv->mask;
 	priv->len  = desc.len;
+	priv->inv  = ntohl(nla_get_be32(tb[NFTA_CMP_OP])) != NFT_CMP_EQ;
 	return 0;
 }
 
@@ -201,7 +201,7 @@ static int nft_cmp_fast_offload(struct nft_offload_ctx *ctx,
 		},
 		.sreg	= priv->sreg,
 		.len	= priv->len / BITS_PER_BYTE,
-		.op	= NFT_CMP_EQ,
+		.op	= priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ,
 	};
 
 	return __nft_cmp_offload(ctx, flow, &cmp);
@@ -210,11 +210,12 @@ static int nft_cmp_fast_offload(struct nft_offload_ctx *ctx,
 static int nft_cmp_fast_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	const struct nft_cmp_fast_expr *priv = nft_expr_priv(expr);
+	enum nft_cmp_ops op = priv->inv ? NFT_CMP_NEQ : NFT_CMP_EQ;
 	struct nft_data data;
 
 	if (nft_dump_register(skb, NFTA_CMP_SREG, priv->sreg))
 		goto nla_put_failure;
-	if (nla_put_be32(skb, NFTA_CMP_OP, htonl(NFT_CMP_EQ)))
+	if (nla_put_be32(skb, NFTA_CMP_OP, htonl(op)))
 		goto nla_put_failure;
 
 	data.data[0] = priv->data;
@@ -272,7 +273,7 @@ nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
 		goto err1;
 	}
 
-	if (desc.len <= sizeof(u32) && op == NFT_CMP_EQ)
+	if (desc.len <= sizeof(u32) && (op == NFT_CMP_EQ || op == NFT_CMP_NEQ))
 		return &nft_cmp_fast_ops;
 
 	return &nft_cmp_ops;
-- 
2.28.0



* Re: [net-next PATCH 1/2 v2] net: netfilter: Enable fast nft_cmp for inverted matches
  2020-10-02 13:50   ` [net-next PATCH 1/2 v2] " Phil Sutter
@ 2020-10-04 19:10     ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-04 19:10 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Fri, Oct 02, 2020 at 03:50:56PM +0200, Phil Sutter wrote:
> Add a boolean indicating NFT_CMP_NEQ. To include it into the match
> decision, it is sufficient to XOR it with the data comparison's result.
> 
> While being at it, store the mask that is calculated during expression
> init and free the eval routine from having to recalculate it each time.

Applied, thanks


* Re: [net-next PATCH 2/2] net: netfilter: Implement fast bitwise expression
  2020-10-01 16:57 ` [net-next PATCH 2/2] net: netfilter: Implement fast bitwise expression Phil Sutter
@ 2020-10-04 19:11   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-04 19:11 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Thu, Oct 01, 2020 at 06:57:44PM +0200, Phil Sutter wrote:
> A typical use of bitwise expression is to mask out parts of an IP
> address when matching on the network part only. Optimize for this common
> use with a fast variant for NFT_BITWISE_BOOL-type expressions operating
> on 32bit-sized values.

Applied, thanks.


* Re: [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
  2020-10-02  9:00   ` Phil Sutter
@ 2020-10-21 10:43     ` Pablo Neira Ayuso
  2020-10-21 10:49       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-21 10:43 UTC (permalink / raw)
  To: Phil Sutter, Florian Westphal, netfilter-devel

Hi Phil,

On Fri, Oct 02, 2020 at 11:00:33AM +0200, Phil Sutter wrote:
> Hi Florian,
> 
> On Fri, Oct 02, 2020 at 12:25:36AM +0200, Florian Westphal wrote:
> > Phil Sutter <phil@nwl.cc> wrote:
> > > The following two patches improve packet throughput in a test setup
> > > sending UDP packets (using iperf3) between two netns. The ruleset used
> > > on receiver side is like this:
> > > 
> > > | *filter
> > > | :test - [0:0]
> > > | -A INPUT -j test
> > > | -A INPUT -j ACCEPT
> > > | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> > > | COMMIT
> > > 
> > > These are the generated VM instructions for each rule:
> > > 
> > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > 
> > Not related to this patch, but we should avoid the bitop if the
> > netmask is divisible by 8 (can adjust the cmp -- adjusting the
> > payload expr is probably not worth it).
> 
> See the patch I just sent to this list. I adjusted both - it simply
> didn't appear to me that I could get by with reducing the cmp expression
> size only. The upside though is that detecting the prefix match based on
> payload expression length is quick and easy.
> 
> Someone will have to adjust nft tool, though. ;)
> 
> > > | [ cmp eq reg 1 0x0000000a ]
> > > | [ counter pkts 0 bytes 0 ]
> > 
> > Out of curiosity, does omitting 'counter' help?
> > 
> > nft counter is rather expensive due to bh disable,
> > iptables does it once at the evaluation loop only.
> 
> I changed the test to create the base ruleset using iptables-nft-restore
> just as before, but create the rules in 'test' chain like so:
> 
> | nft add rule filter test ip saddr != 10.0.0.0/10 drop
> 
> The VM code is as expected:
> 
> | [ payload load 4b @ network header + 12 => reg 1 ]
> | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> | [ cmp eq reg 1 0x0000000a ]
> | [ immediate reg 0 drop ]
> 
> Performance is ~7000pkt/s. So while it's faster than iptables-nft, it's
> still quite a bit slower than legacy iptables despite the skipped
> counters.

iptables is optimized for matching on input/output device name and
IPv4 address + mask (see ip_packet_match()) for historical reasons,
iptables does not use a match for this since the beginning.

One possibility (in the short-term) is to add an internal kernel
expression to achieve the same behaviour. The kernel needs to detect:

payload (nh, offset to ip saddr or ip daddr or ip protocol) + cmp
payload (nh, offset to ip saddr or ip daddr) + bitwise + cmp
meta (iifname or oifname) + bitwise + cmp
meta (iifname or oifname) + cmp

at the very beginning of the rule.

and squash these expressions into the "built-in" iptables match
expression which emulates ip_packet_match().

Not nice, but if microbenchmarks using thousands of rules really matter
(this is worst case O(n) linear list evaluation...) then it might make
sense to explore this.


* Re: [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
  2020-10-21 10:43     ` Pablo Neira Ayuso
@ 2020-10-21 10:49       ` Pablo Neira Ayuso
  2020-10-26 12:29         ` Phil Sutter
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-21 10:49 UTC (permalink / raw)
  To: Phil Sutter, Florian Westphal, netfilter-devel

On Wed, Oct 21, 2020 at 12:43:21PM +0200, Pablo Neira Ayuso wrote:
> Hi Phil,
> 
> On Fri, Oct 02, 2020 at 11:00:33AM +0200, Phil Sutter wrote:
> > Hi Florian,
> > 
> > On Fri, Oct 02, 2020 at 12:25:36AM +0200, Florian Westphal wrote:
> > > Phil Sutter <phil@nwl.cc> wrote:
> > > > The following two patches improve packet throughput in a test setup
> > > > sending UDP packets (using iperf3) between two netns. The ruleset used
> > > > on receiver side is like this:
> > > > 
> > > > | *filter
> > > > | :test - [0:0]
> > > > | -A INPUT -j test
> > > > | -A INPUT -j ACCEPT
> > > > | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> > > > | COMMIT
> > > > 
> > > > These are the generated VM instructions for each rule:
> > > > 
> > > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > > 
> > > Not related to this patch, but we should avoid the bitop if the
> > > netmask is divisible by 8 (can adjust the cmp -- adjusting the
> > > payload expr is probably not worth it).
> > 
> > See the patch I just sent to this list. I adjusted both - it simply
> > didn't appear to me that I could get by with reducing the cmp expression
> > size only. The upside though is that detecting the prefix match based on
> > payload expression length is quick and easy.
> > 
> > Someone will have to adjust nft tool, though. ;)
> > 
> > > > | [ cmp eq reg 1 0x0000000a ]
> > > > | [ counter pkts 0 bytes 0 ]
> > > 
> > > Out of curiosity, does omitting 'counter' help?
> > > 
> > > nft counter is rather expensive due to bh disable,
> > > iptables does it once at the evaluation loop only.
> > 
> > I changed the test to create the base ruleset using iptables-nft-restore
> > just as before, but create the rules in 'test' chain like so:
> > 
> > | nft add rule filter test ip saddr != 10.0.0.0/10 drop
> > 
> > The VM code is as expected:
> > 
> > | [ payload load 4b @ network header + 12 => reg 1 ]
> > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > | [ cmp eq reg 1 0x0000000a ]
> > | [ immediate reg 0 drop ]
> > 
> > Performance is ~7000pkt/s. So while it's faster than iptables-nft, it's
> > still quite a bit slower than legacy iptables despite the skipped
> > counters.
> 
> iptables is optimized for matching on input/output device name and
> IPv4 address + mask (see ip_packet_match()) for historical reasons,
> iptables does not use a match for this since the beginning.

For clarity here, I mean: iptables does not use the generic match
infrastructure for matching on these fields, instead it is using
ip_packet_match() which is called from ipt_do_table() which is the
core function that evaluates the packet.

> One possibility (in the short-term) is to add an internal kernel
> expression to achieve the same behaviour. The kernel needs to detects
> for:
> 
> payload (nh, offset to ip saddr or ip daddr or ip protocol) + cmp
> payload (nh, offset to ip saddr or ip daddr) + bitwise + cmp
> meta (iifname or oifname) + bitwise + cmp
> meta (iifname or oifname) + cmp
> 
> at the very beginning of the rule.
> 
> and squash these expressions into the "built-in" iptables match
> expression which emulates ip_packet_match().
> 
> Not nice, but if microbenchmarks using thousands of rules really matter
> (this is worst case O(n) linear list evaluation...) then it might make
> sense to explore this.


* Re: [net-next PATCH 0/2] netfilter: Improve inverted IP prefix matches
  2020-10-21 10:49       ` Pablo Neira Ayuso
@ 2020-10-26 12:29         ` Phil Sutter
  0 siblings, 0 replies; 11+ messages in thread
From: Phil Sutter @ 2020-10-26 12:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Hi Pablo,

On Wed, Oct 21, 2020 at 12:49:52PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Oct 21, 2020 at 12:43:21PM +0200, Pablo Neira Ayuso wrote:
> > Hi Phil,
> > 
> > On Fri, Oct 02, 2020 at 11:00:33AM +0200, Phil Sutter wrote:
> > > Hi Florian,
> > > 
> > > On Fri, Oct 02, 2020 at 12:25:36AM +0200, Florian Westphal wrote:
> > > > Phil Sutter <phil@nwl.cc> wrote:
> > > > > The following two patches improve packet throughput in a test setup
> > > > > sending UDP packets (using iperf3) between two netns. The ruleset used
> > > > > on receiver side is like this:
> > > > > 
> > > > > | *filter
> > > > > | :test - [0:0]
> > > > > | -A INPUT -j test
> > > > > | -A INPUT -j ACCEPT
> > > > > | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> > > > > | COMMIT
> > > > > 
> > > > > These are the generated VM instructions for each rule:
> > > > > 
> > > > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > > > 
> > > > Not related to this patch, but we should avoid the bitop if the
> > > > netmask is divisible by 8 (can adjust the cmp -- adjusting the
> > > > payload expr is probably not worth it).
> > > 
> > > See the patch I just sent to this list. I adjusted both - it simply
> > > didn't appear to me that I could get by with reducing the cmp expression
> > > size only. The upside though is that detecting the prefix match based on
> > > payload expression length is quick and easy.
> > > 
> > > Someone will have to adjust nft tool, though. ;)
> > > 
> > > > > | [ cmp eq reg 1 0x0000000a ]
> > > > > | [ counter pkts 0 bytes 0 ]
> > > > 
> > > > Out of curiosity, does omitting 'counter' help?
> > > > 
> > > > nft counter is rather expensive due to bh disable,
> > > > iptables does it once at the evaluation loop only.
> > > 
> > > I changed the test to create the base ruleset using iptables-nft-restore
> > > just as before, but create the rules in 'test' chain like so:
> > > 
> > > | nft add rule filter test ip saddr != 10.0.0.0/10 drop
> > > 
> > > The VM code is as expected:
> > > 
> > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > > | [ cmp eq reg 1 0x0000000a ]
> > > | [ immediate reg 0 drop ]
> > > 
> > > Performance is ~7000pkt/s. So while it's faster than iptables-nft, it's
> > > still quite a bit slower than legacy iptables despite the skipped
> > > counters.
> > 
> > iptables is optimized for matching on input/output device name and
> > IPv4 address + mask (see ip_packet_match()) for historical reasons,
> > iptables does not use a match for this since the beginning.

Ah, thanks for the pointer. That function (and the code therein) pretty
clearly shows why rule-shredding is so much slower in iptables-nft than
legacy despite the attempts at improving it.

> For clarity here, I mean: iptables does not use the generic match
> infrastructure for matching on these fields, instead it is using
> ip_packet_match() which is called from ipt_do_table() which is the
> core function that evaluates the packet.
> 
> > One possibility (in the short-term) is to add an internal kernel
> > expression to achieve the same behaviour. The kernel needs to detect:
> > 
> > payload (nh, offset to ip saddr or ip daddr or ip protocol) + cmp
> > payload (nh, offset to ip saddr or ip daddr) + bitwise + cmp
> > meta (iifname or oifname) + bitwise + cmp
> > meta (iifname or oifname) + cmp
> > 
> > at the very beginning of the rule.
> > 
> > and squash these expressions into the "built-in" iptables match
> > expression which emulates ip_packet_match().
> > 
> > Not nice, but if microbenchmarks using thousands of rules really matter
> > (this is worst case O(n) linear list evaluation...) then it might make
> > sense to explore this.

I appreciate the effort to identify a solution which "just works",
though am not sure if we really should implement such hacks (yet). That
said, the "fast" expressions strictly speaking are hacks as well ...

Cheers, Phil

