netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets
@ 2019-09-16 16:49 Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose Phil Sutter
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

I came up with a bunch of tests to compare nft and legacy performance in
rulesets of varying size, so I could not only compare individual
performance but also scaling ability of each.

Initial results were sobering, current nft performs worse in all tests
and scales much worse in almost all of them. With this series applied,
nft is on par or better in most of the cases, often also scaling much
better. Leftovers are scenarios which require to fetch the large
ruleset, e.g. deleting a rule from a large chain or calling
iptables-restore with --noflush option.

Patches 1-6 are merely fallout, fixing things or improving code.

Patch 7 is the first performance-related one: Simply increasing
mnl_talk() receive buffer size speeds up all cache fetches.

The remaining patches uniformly deal with caching: Either avoiding
the cache entirely or allowing for finer granular cache content
selection.

Phil Sutter (14):
  tests/shell: Make ebtables-basic test more verbose
  tests/shell: Speed up ipt-restore/0004-restore-race_0
  DEBUG: Print to stderr to not disturb iptables-save
  nft: Use nftnl_*_set_str() functions
  nft: Introduce nft_bridge_commit()
  nft: Fix for add and delete of same rule in single batch
  nft Increase mnl_talk() receive buffer size
  xtables-restore: Avoid cache population when flushing
  nft: Rename have_cache into have_chain_cache
  nft: Fetch rule cache only if needed
  nft: Allow to fetch only a specific chain from kernel
  nft: Support fetching rules for a single chain only
  nft: Optimize flushing all chains of a table
  nft: Reduce impact of nft_chain_builtin_init()

 iptables/nft.c                                | 285 +++++++++++++-----
 iptables/nft.h                                |  13 +-
 .../testcases/ebtables/0001-ebtables-basic_0  |  28 +-
 .../ipt-restore/0003-restore-ordering_0       |  18 +-
 .../testcases/ipt-restore/0004-restore-race_0 |   4 +-
 iptables/xshared.h                            |   2 +-
 iptables/xtables-eb-standalone.c              |   2 +-
 iptables/xtables-restore.c                    |  11 +-
 iptables/xtables-save.c                       |   4 +-
 9 files changed, 268 insertions(+), 99 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-20 10:32   ` Pablo Neira Ayuso
  2019-09-16 16:49 ` [iptables PATCH 02/14] tests/shell: Speed up ipt-restore/0004-restore-race_0 Phil Sutter
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Print expected entries count if it doesn't match.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 .../testcases/ebtables/0001-ebtables-basic_0  | 28 +++++++++++--------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/iptables/tests/shell/testcases/ebtables/0001-ebtables-basic_0 b/iptables/tests/shell/testcases/ebtables/0001-ebtables-basic_0
index b0db216ae3854..c7f24a383f698 100755
--- a/iptables/tests/shell/testcases/ebtables/0001-ebtables-basic_0
+++ b/iptables/tests/shell/testcases/ebtables/0001-ebtables-basic_0
@@ -1,5 +1,9 @@
 #!/bin/sh
 
+get_entries_count() { # (chain)
+	$XT_MULTI ebtables -L $1 | sed -n 's/.*entries: \([0-9]*\).*/\1/p'
+}
+
 set -x
 case "$XT_MULTI" in
 */xtables-nft-multi)
@@ -28,32 +32,32 @@ case "$XT_MULTI" in
 		exit 1
 	fi
 
-	$XT_MULTI ebtables -L FOO | grep -q 'entries: 0'
-	if [ $? -ne 0 ]; then
-		echo "Unexpected entries count in empty unreferenced chain"
+	entries=$(get_entries_count FOO)
+	if [ $entries -ne 0 ]; then
+		echo "Unexpected entries count in empty unreferenced chain (expected 0, have $entries)"
 		$XT_MULTI ebtables -L
 		exit 1
 	fi
 
 	$XT_MULTI ebtables -A FORWARD -j FOO
-	$XT_MULTI ebtables -L FORWARD | grep -q 'entries: 1'
-	if [ $? -ne 0 ]; then
-		echo "Unexpected entries count in FORWARD chain"
+	entries=$(get_entries_count FORWARD)
+	if [ $entries -ne 1 ]; then
+		echo "Unexpected entries count in FORWARD chain (expected 1, have $entries)"
 		$XT_MULTI ebtables -L
 		exit 1
 	fi
 
-	$XT_MULTI ebtables -L FOO | grep -q 'entries: 0'
-	if [ $? -ne 0 ]; then
-		echo "Unexpected entries count in empty referenced chain"
+	entries=$(get_entries_count FOO)
+	if [ $entries -ne 0 ]; then
+		echo "Unexpected entries count in empty referenced chain (expected 0, have $entries)"
 		$XT_MULTI ebtables -L
 		exit 1
 	fi
 
 	$XT_MULTI ebtables -A FOO -j ACCEPT
-	$XT_MULTI ebtables -L FOO | grep -q 'entries: 1'
-	if [ $? -ne 0 ]; then
-		echo "Unexpected entries count in non-empty referenced chain"
+	entries=$(get_entries_count FOO)
+	if [ $entries -ne 1 ]; then
+		echo "Unexpected entries count in non-empty referenced chain (expected 1, have $entries)"
 		$XT_MULTI ebtables -L
 		exit 1
 	fi
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 02/14] tests/shell: Speed up ipt-restore/0004-restore-race_0
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 17:35   ` Florian Westphal
  2019-09-16 16:49 ` [iptables PATCH 03/14] DEBUG: Print to stderr to not disturb iptables-save Phil Sutter
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

This test tended to cause quite excessive load on my system, sometimes
taking longer than all other tests combined. Even with the reduced
numbers, it still fails reliably after reverting commit 58d7de0181f61
("xtables: handle concurrent ruleset modifications").

Fixes: 4000b4cf2ea38 ("tests: add test script for race-free restore")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 .../tests/shell/testcases/ipt-restore/0004-restore-race_0     | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/iptables/tests/shell/testcases/ipt-restore/0004-restore-race_0 b/iptables/tests/shell/testcases/ipt-restore/0004-restore-race_0
index a92d18dcee058..96a5e66d0ab81 100755
--- a/iptables/tests/shell/testcases/ipt-restore/0004-restore-race_0
+++ b/iptables/tests/shell/testcases/ipt-restore/0004-restore-race_0
@@ -24,7 +24,7 @@ clean_tempfile()
 
 trap clean_tempfile EXIT
 
-ENTRY_NUM=$((RANDOM%100))
+ENTRY_NUM=$((RANDOM%10))
 UCHAIN_NUM=$((RANDOM%10))
 
 get_target()
@@ -87,7 +87,7 @@ fi
 
 case "$XT_MULTI" in
 */xtables-nft-multi)
-	attempts=$((RANDOM%200))
+	attempts=$((RANDOM%10))
 	attempts=$((attempts+1))
 	;;
 *)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 03/14] DEBUG: Print to stderr to not disturb iptables-save
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 02/14] tests/shell: Speed up ipt-restore/0004-restore-race_0 Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-20 10:32   ` Pablo Neira Ayuso
  2019-09-16 16:49 ` [iptables PATCH 04/14] nft: Use nftnl_*_set_str() functions Phil Sutter
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

This way there's at least a chance to get meaningful results from
testsuite with debugging being turned on.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/xshared.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/iptables/xshared.h b/iptables/xshared.h
index fd1f96bad1b98..b08c700e1d8b9 100644
--- a/iptables/xshared.h
+++ b/iptables/xshared.h
@@ -11,7 +11,7 @@
 #include <linux/netfilter_ipv6/ip6_tables.h>
 
 #ifdef DEBUG
-#define DEBUGP(x, args...) fprintf(stdout, x, ## args)
+#define DEBUGP(x, args...) fprintf(stderr, x, ## args)
 #else
 #define DEBUGP(x, args...)
 #endif
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 04/14] nft: Use nftnl_*_set_str() functions
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (2 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 03/14] DEBUG: Print to stderr to not disturb iptables-save Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-20 10:33   ` Pablo Neira Ayuso
  2019-09-16 16:49 ` [iptables PATCH 05/14] nft: Introduce nft_bridge_commit() Phil Sutter
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Although it doesn't make a difference in practice, they are the correct
API functions to use when assigning string attributes.

While doing so, also drop the needless casts to non-const.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index ae3740be6bed5..81d01310c7f8c 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -662,7 +662,7 @@ static int nft_table_builtin_add(struct nft_handle *h,
 	if (t == NULL)
 		return -1;
 
-	nftnl_table_set(t, NFTNL_TABLE_NAME, (char *)_t->name);
+	nftnl_table_set_str(t, NFTNL_TABLE_NAME, _t->name);
 
 	ret = batch_table_add(h, NFT_COMPAT_TABLE_ADD, t) ? 0 : - 1;
 
@@ -679,12 +679,12 @@ nft_chain_builtin_alloc(const struct builtin_table *table,
 	if (c == NULL)
 		return NULL;
 
-	nftnl_chain_set(c, NFTNL_CHAIN_TABLE, (char *)table->name);
-	nftnl_chain_set(c, NFTNL_CHAIN_NAME, (char *)chain->name);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table->name);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, chain->name);
 	nftnl_chain_set_u32(c, NFTNL_CHAIN_HOOKNUM, chain->hook);
 	nftnl_chain_set_u32(c, NFTNL_CHAIN_PRIO, chain->prio);
 	nftnl_chain_set_u32(c, NFTNL_CHAIN_POLICY, policy);
-	nftnl_chain_set(c, NFTNL_CHAIN_TYPE, (char *)chain->type);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_TYPE, chain->type);
 
 	return c;
 }
@@ -1250,8 +1250,8 @@ nft_rule_new(struct nft_handle *h, const char *chain, const char *table,
 		return NULL;
 
 	nftnl_rule_set_u32(r, NFTNL_RULE_FAMILY, h->family);
-	nftnl_rule_set(r, NFTNL_RULE_TABLE, (char *)table);
-	nftnl_rule_set(r, NFTNL_RULE_CHAIN, (char *)chain);
+	nftnl_rule_set_str(r, NFTNL_RULE_TABLE, table);
+	nftnl_rule_set_str(r, NFTNL_RULE_CHAIN, chain);
 
 	if (h->ops->add(r, data) < 0)
 		goto err;
@@ -1768,8 +1768,8 @@ __nft_rule_flush(struct nft_handle *h, const char *table,
 	if (r == NULL)
 		return;
 
-	nftnl_rule_set(r, NFTNL_RULE_TABLE, (char *)table);
-	nftnl_rule_set(r, NFTNL_RULE_CHAIN, (char *)chain);
+	nftnl_rule_set_str(r, NFTNL_RULE_TABLE, table);
+	nftnl_rule_set_str(r, NFTNL_RULE_CHAIN, chain);
 
 	obj = batch_rule_add(h, NFT_COMPAT_RULE_FLUSH, r);
 	if (!obj) {
@@ -1850,8 +1850,8 @@ int nft_chain_user_add(struct nft_handle *h, const char *chain, const char *tabl
 	if (c == NULL)
 		return 0;
 
-	nftnl_chain_set(c, NFTNL_CHAIN_TABLE, (char *)table);
-	nftnl_chain_set(c, NFTNL_CHAIN_NAME, (char *)chain);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, chain);
 	if (h->family == NFPROTO_BRIDGE)
 		nftnl_chain_set_u32(c, NFTNL_CHAIN_POLICY, NF_ACCEPT);
 
@@ -1884,8 +1884,8 @@ int nft_chain_restore(struct nft_handle *h, const char *chain, const char *table
 		if (!c)
 			return -1;
 
-		nftnl_chain_set(c, NFTNL_CHAIN_TABLE, (char *)table);
-		nftnl_chain_set(c, NFTNL_CHAIN_NAME, (char *)chain);
+		nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table);
+		nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, chain);
 		created = true;
 	}
 
@@ -2034,8 +2034,8 @@ int nft_chain_user_rename(struct nft_handle *h,const char *chain,
 	if (c == NULL)
 		return 0;
 
-	nftnl_chain_set(c, NFTNL_CHAIN_TABLE, (char *)table);
-	nftnl_chain_set(c, NFTNL_CHAIN_NAME, (char *)newname);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table);
+	nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, newname);
 	nftnl_chain_set_u64(c, NFTNL_CHAIN_HANDLE, handle);
 
 	ret = batch_chain_add(h, NFT_COMPAT_CHAIN_RENAME, c);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 05/14] nft: Introduce nft_bridge_commit()
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (3 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 04/14] nft: Use nftnl_*_set_str() functions Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-20 10:36   ` Pablo Neira Ayuso
  2019-09-16 16:49 ` [iptables PATCH 06/14] nft: Fix for add and delete of same rule in single batch Phil Sutter
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

No need to check family value from nft_commit() if we can have a
dedicated callback for bridge family.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c                   | 8 ++++++--
 iptables/nft.h                   | 1 +
 iptables/xtables-eb-standalone.c | 2 +-
 iptables/xtables-restore.c       | 2 +-
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 81d01310c7f8c..77ebc4f651e15 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -3069,11 +3069,15 @@ static void nft_bridge_commit_prepare(struct nft_handle *h)
 
 int nft_commit(struct nft_handle *h)
 {
-	if (h->family == NFPROTO_BRIDGE)
-		nft_bridge_commit_prepare(h);
 	return nft_action(h, NFT_COMPAT_COMMIT);
 }
 
+int nft_bridge_commit(struct nft_handle *h)
+{
+	nft_bridge_commit_prepare(h);
+	return nft_commit(h);
+}
+
 int nft_abort(struct nft_handle *h)
 {
 	return nft_action(h, NFT_COMPAT_ABORT);
diff --git a/iptables/nft.h b/iptables/nft.h
index 5e5e765b0d043..43463d7f262e8 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -147,6 +147,7 @@ uint32_t nft_invflags2cmp(uint32_t invflags, uint32_t flag);
  * global commit and abort
  */
 int nft_commit(struct nft_handle *h);
+int nft_bridge_commit(struct nft_handle *h);
 int nft_abort(struct nft_handle *h);
 int nft_abort_policy_rule(struct nft_handle *h, const char *table);
 
diff --git a/iptables/xtables-eb-standalone.c b/iptables/xtables-eb-standalone.c
index fb3daba0bd604..a9081c78c3bc3 100644
--- a/iptables/xtables-eb-standalone.c
+++ b/iptables/xtables-eb-standalone.c
@@ -51,7 +51,7 @@ int xtables_eb_main(int argc, char *argv[])
 
 	ret = do_commandeb(&h, argc, argv, &table, false);
 	if (ret)
-		ret = nft_commit(&h);
+		ret = nft_bridge_commit(&h);
 
 	if (!ret)
 		fprintf(stderr, "ebtables: %s\n", nft_strerror(errno));
diff --git a/iptables/xtables-restore.c b/iptables/xtables-restore.c
index 601c842feab38..f930f5ba2d167 100644
--- a/iptables/xtables-restore.c
+++ b/iptables/xtables-restore.c
@@ -463,7 +463,7 @@ static int ebt_table_flush(struct nft_handle *h, const char *table)
 
 struct nft_xt_restore_cb ebt_restore_cb = {
 	.chain_list	= get_chain_list,
-	.commit		= nft_commit,
+	.commit		= nft_bridge_commit,
 	.table_new	= nft_table_new,
 	.table_flush	= ebt_table_flush,
 	.do_command	= do_commandeb,
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 06/14] nft: Fix for add and delete of same rule in single batch
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (4 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 05/14] nft: Introduce nft_bridge_commit() Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size Phil Sutter
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Another corner-case found when extending restore ordering test: If a
delete command in a dump referenced a rule added earlier within the same
dump, kernel would reject the resulting NFT_MSG_DELRULE command.

Catch this by assigning the rule to delete a RULE_ID value if it doesn't
have a handle yet. Since __nft_rule_del() does not duplicate the
nftnl_rule object when creating the NFT_COMPAT_RULE_DELETE command, this
RULE_ID value is added to both NEWRULE and DELRULE commands - exactly
what is needed to establish the reference.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c                                 |  3 +++
 .../ipt-restore/0003-restore-ordering_0        | 18 +++++++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 77ebc4f651e15..6248b9eb10a85 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -2199,6 +2199,9 @@ static int __nft_rule_del(struct nft_handle *h, struct nftnl_rule *r)
 
 	nftnl_rule_list_del(r);
 
+	if (!nftnl_rule_get_u64(r, NFTNL_RULE_HANDLE))
+		nftnl_rule_set_u32(r, NFTNL_RULE_ID, ++h->rule_id);
+
 	obj = batch_rule_add(h, NFT_COMPAT_RULE_DELETE, r);
 	if (!obj) {
 		nftnl_rule_free(r);
diff --git a/iptables/tests/shell/testcases/ipt-restore/0003-restore-ordering_0 b/iptables/tests/shell/testcases/ipt-restore/0003-restore-ordering_0
index 51f2422e15259..3f1d229e915ff 100755
--- a/iptables/tests/shell/testcases/ipt-restore/0003-restore-ordering_0
+++ b/iptables/tests/shell/testcases/ipt-restore/0003-restore-ordering_0
@@ -14,7 +14,7 @@ ipt_show() {
 
 $XT_MULTI iptables-restore <<EOF
 *filter
--A FORWARD -m comment --comment "appended rule" -j ACCEPT
+-A FORWARD -m comment --comment "rule 4" -j ACCEPT
 -I FORWARD 1 -m comment --comment "rule 1" -j ACCEPT
 -I FORWARD 2 -m comment --comment "rule 2" -j ACCEPT
 -I FORWARD 3 -m comment --comment "rule 3" -j ACCEPT
@@ -24,7 +24,7 @@ EOF
 EXPECT='-A FORWARD -m comment --comment "rule 1" -j ACCEPT
 -A FORWARD -m comment --comment "rule 2" -j ACCEPT
 -A FORWARD -m comment --comment "rule 3" -j ACCEPT
--A FORWARD -m comment --comment "appended rule" -j ACCEPT'
+-A FORWARD -m comment --comment "rule 4" -j ACCEPT'
 
 diff -u -Z <(echo -e "$EXPECT") <(ipt_show)
 
@@ -32,11 +32,14 @@ diff -u -Z <(echo -e "$EXPECT") <(ipt_show)
 
 $XT_MULTI iptables-restore --noflush <<EOF
 *filter
+-A FORWARD -m comment --comment "rule 5" -j ACCEPT
 -I FORWARD 1 -m comment --comment "rule 0.5" -j ACCEPT
 -I FORWARD 3 -m comment --comment "rule 1.5" -j ACCEPT
 -I FORWARD 5 -m comment --comment "rule 2.5" -j ACCEPT
 -I FORWARD 7 -m comment --comment "rule 3.5" -j ACCEPT
--I FORWARD 9 -m comment --comment "appended rule 2" -j ACCEPT
+-I FORWARD 9 -m comment --comment "rule 4.5" -j ACCEPT
+-I FORWARD 11 -m comment --comment "rule 5.5" -j ACCEPT
+-A FORWARD -m comment --comment "rule 6" -j ACCEPT
 COMMIT
 EOF
 
@@ -47,8 +50,11 @@ EXPECT='-A FORWARD -m comment --comment "rule 0.5" -j ACCEPT
 -A FORWARD -m comment --comment "rule 2.5" -j ACCEPT
 -A FORWARD -m comment --comment "rule 3" -j ACCEPT
 -A FORWARD -m comment --comment "rule 3.5" -j ACCEPT
--A FORWARD -m comment --comment "appended rule" -j ACCEPT
--A FORWARD -m comment --comment "appended rule 2" -j ACCEPT'
+-A FORWARD -m comment --comment "rule 4" -j ACCEPT
+-A FORWARD -m comment --comment "rule 4.5" -j ACCEPT
+-A FORWARD -m comment --comment "rule 5" -j ACCEPT
+-A FORWARD -m comment --comment "rule 5.5" -j ACCEPT
+-A FORWARD -m comment --comment "rule 6" -j ACCEPT'
 
 diff -u -Z <(echo -e "$EXPECT") <(ipt_show)
 
@@ -78,6 +84,8 @@ diff -u -Z <(echo -e "$EXPECT") <(ipt_show)
 
 $XT_MULTI iptables-restore --noflush <<EOF
 *filter
+-A FORWARD -m comment --comment "appended rule 4" -j ACCEPT
+-D FORWARD 7
 -D FORWARD -m comment --comment "appended rule 1" -j ACCEPT
 -D FORWARD 3
 -I FORWARD 3 -m comment --comment "manually replaced rule 2" -j ACCEPT
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (5 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 06/14] nft: Fix for add and delete of same rule in single batch Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-17  5:00   ` Pablo Neira Ayuso
  2019-09-20 11:13   ` Pablo Neira Ayuso
  2019-09-16 16:49 ` [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing Phil Sutter
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

This improves cache population quite a bit and therefore helps when
dealing with large rulesets. A simple hard to improve use-case is
listing the last rule in a large chain. These are the average program
run times depending on number of rules:

rule count	| legacy	| nft old	| nft new
---------------------------------------------------------
 50,000		| .052s		| .611s		| .406s
100,000		| .115s		| 2.12s		| 1.24s
150,000		| .265s		| 7.63s		| 4.14s
200,000		| .411s		| 21.0s		| 10.6s

So while legacy iptables is still magnitudes faster, this simple change
doubles iptables-nft performance in ideal cases.

Note that increasing the buffer even further didn't improve performance
anymore, so 32KB seems to be an upper limit in kernel space.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 6248b9eb10a85..7f0f9e1234ae4 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -101,7 +101,7 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
 	     void *data)
 {
 	int ret;
-	char buf[16536];
+	char buf[32768];
 
 	if (mnl_socket_sendto(h->nl, nlh, nlh->nlmsg_len) < 0)
 		return -1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (6 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-20 11:57   ` Pablo Neira Ayuso
  2019-09-16 16:49 ` [iptables PATCH 09/14] nft: Rename have_cache into have_chain_cache Phil Sutter
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

When called without --noflush, don't fetch full ruleset from kernel but
merely list of tables and the current genid. Locally, initialize chain
lists and set have_cache to simulate an empty ruleset.

Doing so reduces execution time significantly if a large ruleset exists
in kernel space. A simple test case consisting of a dump with 100,000
rules can be restored within 15s on my testing VM. Restoring it a second
time took 21s before this patch and only 17s after it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c             | 27 ++++++++++++++++++++++++++-
 iptables/nft.h             |  1 +
 iptables/xtables-restore.c |  7 +++++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 7f0f9e1234ae4..820f3392dd495 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -882,7 +882,8 @@ static int flush_cache(struct nft_cache *c, const struct builtin_table *tables,
 		nftnl_chain_list_free(c->table[i].chains);
 		c->table[i].chains = NULL;
 	}
-	nftnl_table_list_free(c->tables);
+	if (c->tables)
+		nftnl_table_list_free(c->tables);
 	c->tables = NULL;
 
 	return 1;
@@ -1617,6 +1618,30 @@ void nft_build_cache(struct nft_handle *h)
 		__nft_build_cache(h);
 }
 
+void nft_fake_cache(struct nft_handle *h)
+{
+	int i;
+
+	if (h->have_cache)
+		return;
+
+	/* fetch tables so conditional table delete logic works */
+	if (!h->cache->tables)
+		fetch_table_cache(h);
+
+	for (i = 0; i < NFT_TABLE_MAX; i++) {
+		enum nft_table_type type = h->tables[i].type;
+
+		if (!h->tables[i].name ||
+		    h->cache->table[type].chains)
+			continue;
+
+		h->cache->table[type].chains = nftnl_chain_list_alloc();
+	}
+	mnl_genid_get(h, &h->nft_genid);
+	h->have_cache = true;
+}
+
 static void __nft_flush_cache(struct nft_handle *h)
 {
 	if (!h->cache_index) {
diff --git a/iptables/nft.h b/iptables/nft.h
index 43463d7f262e8..d9eb30a2a2413 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -74,6 +74,7 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
 int nft_init(struct nft_handle *h, const struct builtin_table *t);
 void nft_fini(struct nft_handle *h);
 void nft_build_cache(struct nft_handle *h);
+void nft_fake_cache(struct nft_handle *h);
 
 /*
  * Operations with tables.
diff --git a/iptables/xtables-restore.c b/iptables/xtables-restore.c
index f930f5ba2d167..d1486afedc480 100644
--- a/iptables/xtables-restore.c
+++ b/iptables/xtables-restore.c
@@ -96,6 +96,11 @@ void xtables_restore_parse(struct nft_handle *h,
 
 	line = 0;
 
+	if (h->noflush)
+		nft_build_cache(h);
+	else
+		nft_fake_cache(h);
+
 	/* Grab standard input. */
 	while (fgets(buffer, sizeof(buffer), p->in)) {
 		int ret = 0;
@@ -146,8 +151,6 @@ void xtables_restore_parse(struct nft_handle *h,
 			if (p->tablename && (strcmp(p->tablename, table) != 0))
 				continue;
 
-			nft_build_cache(h);
-
 			if (h->noflush == 0) {
 				DEBUGP("Cleaning all chains of table '%s'\n",
 					table);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 09/14] nft: Rename have_cache into have_chain_cache
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (7 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 10/14] nft: Fetch rule cache only if needed Phil Sutter
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Prepare for optional rule cache by renaming the boolean.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 14 +++++++-------
 iptables/nft.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 820f3392dd495..1483664510518 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -891,11 +891,11 @@ static int flush_cache(struct nft_cache *c, const struct builtin_table *tables,
 
 static void flush_chain_cache(struct nft_handle *h, const char *tablename)
 {
-	if (!h->have_cache)
+	if (!h->have_chain_cache)
 		return;
 
 	if (flush_cache(h->cache, h->tables, tablename))
-		h->have_cache = false;
+		h->have_chain_cache = false;
 }
 
 void nft_fini(struct nft_handle *h)
@@ -1601,7 +1601,7 @@ retry:
 	mnl_genid_get(h, &genid_start);
 	fetch_chain_cache(h);
 	fetch_rule_cache(h);
-	h->have_cache = true;
+	h->have_chain_cache = true;
 	mnl_genid_get(h, &genid_stop);
 
 	if (genid_start != genid_stop) {
@@ -1614,7 +1614,7 @@ retry:
 
 void nft_build_cache(struct nft_handle *h)
 {
-	if (!h->have_cache)
+	if (!h->have_chain_cache)
 		__nft_build_cache(h);
 }
 
@@ -1622,7 +1622,7 @@ void nft_fake_cache(struct nft_handle *h)
 {
 	int i;
 
-	if (h->have_cache)
+	if (h->have_chain_cache)
 		return;
 
 	/* fetch tables so conditional table delete logic works */
@@ -1639,7 +1639,7 @@ void nft_fake_cache(struct nft_handle *h)
 		h->cache->table[type].chains = nftnl_chain_list_alloc();
 	}
 	mnl_genid_get(h, &h->nft_genid);
-	h->have_cache = true;
+	h->have_chain_cache = true;
 }
 
 static void __nft_flush_cache(struct nft_handle *h)
@@ -1654,7 +1654,7 @@ static void __nft_flush_cache(struct nft_handle *h)
 
 static void nft_rebuild_cache(struct nft_handle *h)
 {
-	if (h->have_cache)
+	if (h->have_chain_cache)
 		__nft_flush_cache(h);
 
 	__nft_build_cache(h);
diff --git a/iptables/nft.h b/iptables/nft.h
index d9eb30a2a2413..11b2d4e3be6ff 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -53,7 +53,7 @@ struct nft_handle {
 	unsigned int		cache_index;
 	struct nft_cache	__cache[2];
 	struct nft_cache	*cache;
-	bool			have_cache;
+	bool			have_chain_cache;
 	bool			restore;
 	bool			noflush;
 	int8_t			config_done;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 10/14] nft: Fetch rule cache only if needed
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (8 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 09/14] nft: Rename have_cache into have_chain_cache Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 11/14] nft: Allow to fetch only a specific chain from kernel Phil Sutter
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Introduce boolean have_rule_cache to indicate whether rules have been
fetched or not. Introduce nft_build_rule_cache() to trigger explicit
rule cache population.

In a ruleset with many rules, this largely improves performance of
commands which don't need to access the rules themselves. E.g.,
appending a rule to a large chain is now two magnitudes faster than
before and even one magnitude faster than legacy iptables.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
 iptables/nft.h |  2 ++
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 1483664510518..82c892ad96f34 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -894,8 +894,10 @@ static void flush_chain_cache(struct nft_handle *h, const char *tablename)
 	if (!h->have_chain_cache)
 		return;
 
-	if (flush_cache(h->cache, h->tables, tablename))
+	if (flush_cache(h->cache, h->tables, tablename)) {
 		h->have_chain_cache = false;
+		h->have_rule_cache = false;
+	}
 }
 
 void nft_fini(struct nft_handle *h)
@@ -1276,6 +1278,14 @@ nft_rule_append(struct nft_handle *h, const char *chain, const char *table,
 
 	nft_xt_builtin_init(h, table);
 
+	/* Since ebtables user-defined chain policies are implemented as last
+	 * rule in nftables, rule cache is required here to treat them right. */
+	if (h->family == NFPROTO_BRIDGE) {
+		c = nft_chain_find(h, table, chain);
+		if (c && !nft_chain_builtin(c))
+			nft_build_rule_cache(h);
+	}
+
 	nft_fn = nft_rule_append;
 
 	r = nft_rule_new(h, chain, table, data);
@@ -1602,6 +1612,7 @@ retry:
 	fetch_chain_cache(h);
 	fetch_rule_cache(h);
 	h->have_chain_cache = true;
+	h->have_rule_cache = true;
 	mnl_genid_get(h, &genid_stop);
 
 	if (genid_start != genid_stop) {
@@ -1618,11 +1629,19 @@ void nft_build_cache(struct nft_handle *h)
 		__nft_build_cache(h);
 }
 
+void nft_build_rule_cache(struct nft_handle *h)
+{
+	if (!h->have_rule_cache) {
+		fetch_rule_cache(h);
+		h->have_rule_cache = true;
+	}
+}
+
 void nft_fake_cache(struct nft_handle *h)
 {
 	int i;
 
-	if (h->have_chain_cache)
+	if (h->have_chain_cache && h->have_rule_cache)
 		return;
 
 	/* fetch tables so conditional table delete logic works */
@@ -1636,10 +1655,14 @@ void nft_fake_cache(struct nft_handle *h)
 		    h->cache->table[type].chains)
 			continue;
 
+		if (h->cache->table[type].chains)
+			continue;
+
 		h->cache->table[type].chains = nftnl_chain_list_alloc();
 	}
 	mnl_genid_get(h, &h->nft_genid);
 	h->have_chain_cache = true;
+	h->have_rule_cache = true;
 }
 
 static void __nft_flush_cache(struct nft_handle *h)
@@ -1675,7 +1698,10 @@ struct nftnl_chain_list *nft_chain_list_get(struct nft_handle *h,
 	if (!t)
 		return NULL;
 
-	nft_build_cache(h);
+	if (!h->have_chain_cache) {
+		fetch_chain_cache(h);
+		h->have_chain_cache = true;
+	}
 
 	return h->cache->table[t->type].chains;
 }
@@ -1760,6 +1786,8 @@ int nft_rule_save(struct nft_handle *h, const char *table, unsigned int format)
 	if (!list)
 		return 0;
 
+	nft_build_rule_cache(h);
+
 	iter = nftnl_chain_list_iter_create(list);
 	if (!iter)
 		return 0;
@@ -1981,6 +2009,10 @@ int nft_chain_user_del(struct nft_handle *h, const char *chain,
 	if (list == NULL)
 		return 0;
 
+	/* This triggers required policy rule deletion. */
+	if (h->family == NFPROTO_BRIDGE)
+		nft_build_rule_cache(h);
+
 	if (chain) {
 		c = nftnl_chain_list_lookup_byname(list, chain);
 		if (!c) {
@@ -2242,6 +2274,8 @@ nft_rule_find(struct nft_handle *h, struct nftnl_chain *c, void *data, int rulen
 	struct nftnl_rule_iter *iter;
 	bool found = false;
 
+	nft_build_rule_cache(h);
+
 	if (rulenum >= 0)
 		/* Delete by rule number case */
 		return nftnl_rule_lookup_byindex(c, rulenum);
@@ -3062,6 +3096,8 @@ int ebt_set_user_chain_policy(struct nft_handle *h, const char *table,
 	if (!c)
 		return 0;
 
+	nft_build_rule_cache(h);
+
 	if (!strcmp(policy, "DROP"))
 		pval = NF_DROP;
 	else if (!strcmp(policy, "ACCEPT"))
@@ -3339,6 +3375,8 @@ int nft_chain_zero_counters(struct nft_handle *h, const char *chain,
 	if (list == NULL)
 		goto err;
 
+	nft_build_rule_cache(h);
+
 	if (chain) {
 		c = nftnl_chain_list_lookup_byname(list, chain);
 		if (!c) {
@@ -3445,6 +3483,8 @@ bool nft_is_table_compatible(struct nft_handle *h, const char *tablename)
 	if (clist == NULL)
 		return false;
 
+	nft_build_rule_cache(h);
+
 	if (nftnl_chain_list_foreach(clist, nft_is_chain_compatible, h))
 		return false;
 
diff --git a/iptables/nft.h b/iptables/nft.h
index 11b2d4e3be6ff..718acdbf0c55d 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -54,6 +54,7 @@ struct nft_handle {
 	struct nft_cache	__cache[2];
 	struct nft_cache	*cache;
 	bool			have_chain_cache;
+	bool			have_rule_cache;
 	bool			restore;
 	bool			noflush;
 	int8_t			config_done;
@@ -74,6 +75,7 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
 int nft_init(struct nft_handle *h, const struct builtin_table *t);
 void nft_fini(struct nft_handle *h);
 void nft_build_cache(struct nft_handle *h);
+void nft_build_rule_cache(struct nft_handle *h);
 void nft_fake_cache(struct nft_handle *h);
 
 /*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 11/14] nft: Allow to fetch only a specific chain from kernel
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (9 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 10/14] nft: Fetch rule cache only if needed Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 12/14] nft: Support fetching rules for a single chain only Phil Sutter
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Extend nft_chain_list_get() to accept an optional chain name callers are
interested in. If chain cache is not present yet, add a suitable payload
to NFT_MSG_GETCHAIN request to retrieve only that single chain. The
table's chain list will then contain only that chain.

Chain cache is considered present (i.e., have_chain_cache set to true)
only if fetch_chain_cache() was called without a specific chain name.
Therefore a full cache update could happen after fetching a specific
chain which means cache update preparation has to drop each table's
chain list to avoid duplicates.

With fetch_chain_cache() potentially ignoring all but a single table,
make fetch_rule_cache() aware of potentially uninitialized chain lists.
Rules are simply not fetched for those then.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c             | 64 +++++++++++++++++++++++++++-----------
 iptables/nft.h             |  3 +-
 iptables/xtables-restore.c |  2 +-
 iptables/xtables-save.c    |  2 +-
 4 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 82c892ad96f34..46c7be372a10f 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -750,7 +750,7 @@ nft_chain_builtin_find(const struct builtin_table *t, const char *chain)
 static void nft_chain_builtin_init(struct nft_handle *h,
 				   const struct builtin_table *table)
 {
-	struct nftnl_chain_list *list = nft_chain_list_get(h, table->name);
+	struct nftnl_chain_list *list = nft_chain_list_get(h, table->name, NULL);
 	struct nftnl_chain *c;
 	int i;
 
@@ -1425,7 +1425,7 @@ static int fetch_table_cache(struct nft_handle *h)
 	return 1;
 }
 
-static int fetch_chain_cache(struct nft_handle *h)
+static int fetch_chain_cache(struct nft_handle *h, const char *table, const char *chain)
 {
 	char buf[16536];
 	struct nlmsghdr *nlh;
@@ -1439,13 +1439,37 @@ static int fetch_chain_cache(struct nft_handle *h)
 		if (!h->tables[i].name)
 			continue;
 
+		/* When not fetching a single chain, kernel returns all chains
+		 * of the given family which may belong to other tables than
+		 * the expected one. Hence skip chain list initialization only
+		 * if both chain and table are specified. */
+		if (chain && table && strcmp(table, h->tables[i].name))
+			continue;
+
+		if (h->cache->table[type].chains)
+			nftnl_chain_list_free(h->cache->table[type].chains);
+
 		h->cache->table[type].chains = nftnl_chain_list_alloc();
 		if (!h->cache->table[type].chains)
 			return -1;
 	}
 
-	nlh = nftnl_chain_nlmsg_build_hdr(buf, NFT_MSG_GETCHAIN, h->family,
-					NLM_F_DUMP, h->seq);
+	if (table && chain) {
+		struct nftnl_chain *c;
+
+		nlh = nftnl_chain_nlmsg_build_hdr(buf, NFT_MSG_GETCHAIN, h->family,
+						NLM_F_ACK, h->seq);
+		c = nftnl_chain_alloc();
+		if (!c)
+			return -1;
+		nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table);
+		nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, chain);
+		nftnl_chain_nlmsg_build_payload(nlh, c);
+		nftnl_chain_free(c);
+	} else {
+		nlh = nftnl_chain_nlmsg_build_hdr(buf, NFT_MSG_GETCHAIN, h->family,
+						NLM_F_DUMP, h->seq);
+	}
 
 	ret = mnl_talk(h, nlh, nftnl_chain_list_cb, h);
 	if (ret < 0 && errno == EINTR)
@@ -1596,6 +1620,9 @@ static int fetch_rule_cache(struct nft_handle *h)
 		if (!h->tables[i].name)
 			continue;
 
+		if (!h->cache->table[type].chains)
+			continue;
+
 		if (nftnl_chain_list_foreach(h->cache->table[type].chains,
 					     nft_rule_list_update, h))
 			return -1;
@@ -1609,7 +1636,7 @@ static void __nft_build_cache(struct nft_handle *h)
 
 retry:
 	mnl_genid_get(h, &genid_start);
-	fetch_chain_cache(h);
+	fetch_chain_cache(h, NULL, NULL);
 	fetch_rule_cache(h);
 	h->have_chain_cache = true;
 	h->have_rule_cache = true;
@@ -1690,7 +1717,7 @@ static void nft_release_cache(struct nft_handle *h)
 }
 
 struct nftnl_chain_list *nft_chain_list_get(struct nft_handle *h,
-					    const char *table)
+					    const char *table, const char *chain)
 {
 	const struct builtin_table *t;
 
@@ -1699,8 +1726,9 @@ struct nftnl_chain_list *nft_chain_list_get(struct nft_handle *h,
 		return NULL;
 
 	if (!h->have_chain_cache) {
-		fetch_chain_cache(h);
-		h->have_chain_cache = true;
+		fetch_chain_cache(h, table, chain);
+		if (!chain)
+			h->have_chain_cache = true;
 	}
 
 	return h->cache->table[t->type].chains;
@@ -1782,7 +1810,7 @@ int nft_rule_save(struct nft_handle *h, const char *table, unsigned int format)
 	struct nftnl_chain *c;
 	int ret = 0;
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, NULL);
 	if (!list)
 		return 0;
 
@@ -1845,7 +1873,7 @@ int nft_rule_flush(struct nft_handle *h, const char *chain, const char *table,
 
 	nft_fn = nft_rule_flush;
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (list == NULL) {
 		ret = 1;
 		goto err;
@@ -1910,7 +1938,7 @@ int nft_chain_user_add(struct nft_handle *h, const char *chain, const char *tabl
 
 	ret = batch_chain_add(h, NFT_COMPAT_CHAIN_USER_ADD, c);
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, NULL);
 	if (list)
 		nftnl_chain_list_add(c, list);
 
@@ -1950,7 +1978,7 @@ int nft_chain_restore(struct nft_handle *h, const char *chain, const char *table
 
 	ret = batch_chain_add(h, NFT_COMPAT_CHAIN_USER_ADD, c);
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (list)
 		nftnl_chain_list_add(c, list);
 
@@ -2005,7 +2033,7 @@ int nft_chain_user_del(struct nft_handle *h, const char *chain,
 
 	nft_fn = nft_chain_user_del;
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (list == NULL)
 		return 0;
 
@@ -2037,7 +2065,7 @@ nft_chain_find(struct nft_handle *h, const char *table, const char *chain)
 {
 	struct nftnl_chain_list *list;
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (list == NULL)
 		return NULL;
 
@@ -2584,7 +2612,7 @@ int nft_rule_list(struct nft_handle *h, const char *chain, const char *table,
 		return 0;
 	}
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (!list)
 		return 0;
 
@@ -2687,7 +2715,7 @@ int nft_rule_list_save(struct nft_handle *h, const char *chain,
 		return 0;
 	}
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (!list)
 		return 0;
 
@@ -3371,7 +3399,7 @@ int nft_chain_zero_counters(struct nft_handle *h, const char *chain,
 	struct nftnl_chain *c;
 	int ret = 0;
 
-	list = nft_chain_list_get(h, table);
+	list = nft_chain_list_get(h, table, chain);
 	if (list == NULL)
 		goto err;
 
@@ -3479,7 +3507,7 @@ bool nft_is_table_compatible(struct nft_handle *h, const char *tablename)
 {
 	struct nftnl_chain_list *clist;
 
-	clist = nft_chain_list_get(h, tablename);
+	clist = nft_chain_list_get(h, tablename, NULL);
 	if (clist == NULL)
 		return false;
 
diff --git a/iptables/nft.h b/iptables/nft.h
index 718acdbf0c55d..4b1c191effbd6 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -98,7 +98,8 @@ struct nftnl_chain;
 
 int nft_chain_set(struct nft_handle *h, const char *table, const char *chain, const char *policy, const struct xt_counters *counters);
 struct nftnl_chain_list *nft_chain_list_get(struct nft_handle *h,
-					    const char *table);
+					    const char *table,
+					    const char *chain);
 int nft_chain_save(struct nft_handle *h, struct nftnl_chain_list *list);
 int nft_chain_user_add(struct nft_handle *h, const char *chain, const char *table);
 int nft_chain_user_del(struct nft_handle *h, const char *chain, const char *table, bool verbose);
diff --git a/iptables/xtables-restore.c b/iptables/xtables-restore.c
index d1486afedc480..835a21be7324f 100644
--- a/iptables/xtables-restore.c
+++ b/iptables/xtables-restore.c
@@ -62,7 +62,7 @@ static struct nftnl_chain_list *get_chain_list(struct nft_handle *h,
 {
 	struct nftnl_chain_list *chain_list;
 
-	chain_list = nft_chain_list_get(h, table);
+	chain_list = nft_chain_list_get(h, table, NULL);
 	if (chain_list == NULL)
 		xtables_error(OTHER_PROBLEM, "cannot retrieve chain list\n");
 
diff --git a/iptables/xtables-save.c b/iptables/xtables-save.c
index 77b13f7ffbcdd..503ae401737c5 100644
--- a/iptables/xtables-save.c
+++ b/iptables/xtables-save.c
@@ -82,7 +82,7 @@ __do_output(struct nft_handle *h, const char *tablename, void *data)
 		return 0;
 	}
 
-	chain_list = nft_chain_list_get(h, tablename);
+	chain_list = nft_chain_list_get(h, tablename, NULL);
 	if (!chain_list)
 		return 0;
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 12/14] nft: Support fetching rules for a single chain only
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (10 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 11/14] nft: Allow to fetch only a specific chain from kernel Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 16:49 ` [iptables PATCH 13/14] nft: Optimize flushing all chains of a table Phil Sutter
  2019-09-16 16:50 ` [iptables PATCH 14/14] nft: Reduce impact of nft_chain_builtin_init() Phil Sutter
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Just like with single chain fetching, pass optional table and chain
names to nft_build_rule_cache() to restrict its effect onto those.

This effectively makes iptables-nft ignore other tables and other chains
of the same table even if rule cache is needed for an operation
affecting a single chain only.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c          | 52 ++++++++++++++++++++++++++++-------------
 iptables/nft.h          |  6 +++--
 iptables/xtables-save.c |  2 +-
 3 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 46c7be372a10f..294cb306700d0 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -1283,7 +1283,7 @@ nft_rule_append(struct nft_handle *h, const char *chain, const char *table,
 	if (h->family == NFPROTO_BRIDGE) {
 		c = nft_chain_find(h, table, chain);
 		if (c && !nft_chain_builtin(c))
-			nft_build_rule_cache(h);
+			nft_build_rule_cache(h, table, chain);
 	}
 
 	nft_fn = nft_rule_append;
@@ -1610,7 +1610,7 @@ static int nft_rule_list_update(struct nftnl_chain *c, void *data)
 	return 0;
 }
 
-static int fetch_rule_cache(struct nft_handle *h)
+static int fetch_rule_cache(struct nft_handle *h, const char *table, const char *chain)
 {
 	int i;
 
@@ -1623,9 +1623,25 @@ static int fetch_rule_cache(struct nft_handle *h)
 		if (!h->cache->table[type].chains)
 			continue;
 
+		if (table && strcmp(h->tables[i].name, table))
+			continue;
+
+		if (chain) {
+			struct nftnl_chain_list *clist;
+			struct nftnl_chain *c;
+
+			clist = h->cache->table[type].chains;
+			c = nftnl_chain_list_lookup_byname(clist, chain);
+
+			return c ? nft_rule_list_update(c, h) : -1;
+		}
+
 		if (nftnl_chain_list_foreach(h->cache->table[type].chains,
 					     nft_rule_list_update, h))
 			return -1;
+
+		if (table)
+			break;
 	}
 	return 0;
 }
@@ -1637,7 +1653,7 @@ static void __nft_build_cache(struct nft_handle *h)
 retry:
 	mnl_genid_get(h, &genid_start);
 	fetch_chain_cache(h, NULL, NULL);
-	fetch_rule_cache(h);
+	fetch_rule_cache(h, NULL, NULL);
 	h->have_chain_cache = true;
 	h->have_rule_cache = true;
 	mnl_genid_get(h, &genid_stop);
@@ -1656,11 +1672,13 @@ void nft_build_cache(struct nft_handle *h)
 		__nft_build_cache(h);
 }
 
-void nft_build_rule_cache(struct nft_handle *h)
+void nft_build_rule_cache(struct nft_handle *h,
+			  const char *table, const char *chain)
 {
 	if (!h->have_rule_cache) {
-		fetch_rule_cache(h);
-		h->have_rule_cache = true;
+		fetch_rule_cache(h, table, chain);
+		if (!table && !chain)
+			h->have_rule_cache = true;
 	}
 }
 
@@ -1814,7 +1832,7 @@ int nft_rule_save(struct nft_handle *h, const char *table, unsigned int format)
 	if (!list)
 		return 0;
 
-	nft_build_rule_cache(h);
+	nft_build_rule_cache(h, table, NULL);
 
 	iter = nftnl_chain_list_iter_create(list);
 	if (!iter)
@@ -2039,7 +2057,7 @@ int nft_chain_user_del(struct nft_handle *h, const char *chain,
 
 	/* This triggers required policy rule deletion. */
 	if (h->family == NFPROTO_BRIDGE)
-		nft_build_rule_cache(h);
+		nft_build_rule_cache(h, table, chain);
 
 	if (chain) {
 		c = nftnl_chain_list_lookup_byname(list, chain);
@@ -2302,7 +2320,8 @@ nft_rule_find(struct nft_handle *h, struct nftnl_chain *c, void *data, int rulen
 	struct nftnl_rule_iter *iter;
 	bool found = false;
 
-	nft_build_rule_cache(h);
+	nft_build_rule_cache(h, nftnl_chain_get_str(c, NFTNL_CHAIN_TABLE),
+			     nftnl_chain_get_str(c, NFTNL_CHAIN_NAME));
 
 	if (rulenum >= 0)
 		/* Delete by rule number case */
@@ -2607,7 +2626,7 @@ int nft_rule_list(struct nft_handle *h, const char *chain, const char *table,
 
 	ops = nft_family_ops_lookup(h->family);
 
-	if (!nft_is_table_compatible(h, table)) {
+	if (!nft_is_table_compatible(h, table, chain)) {
 		xtables_error(OTHER_PROBLEM, "table `%s' is incompatible, use 'nft' tool.\n", table);
 		return 0;
 	}
@@ -2710,7 +2729,7 @@ int nft_rule_list_save(struct nft_handle *h, const char *chain,
 
 	nft_xt_builtin_init(h, table);
 
-	if (!nft_is_table_compatible(h, table)) {
+	if (!nft_is_table_compatible(h, table, chain)) {
 		xtables_error(OTHER_PROBLEM, "table `%s' is incompatible, use 'nft' tool.\n", table);
 		return 0;
 	}
@@ -3124,7 +3143,7 @@ int ebt_set_user_chain_policy(struct nft_handle *h, const char *table,
 	if (!c)
 		return 0;
 
-	nft_build_rule_cache(h);
+	nft_build_rule_cache(h, table, chain);
 
 	if (!strcmp(policy, "DROP"))
 		pval = NF_DROP;
@@ -3403,7 +3422,7 @@ int nft_chain_zero_counters(struct nft_handle *h, const char *chain,
 	if (list == NULL)
 		goto err;
 
-	nft_build_rule_cache(h);
+	nft_build_rule_cache(h, table, chain);
 
 	if (chain) {
 		c = nftnl_chain_list_lookup_byname(list, chain);
@@ -3503,15 +3522,16 @@ static int nft_is_chain_compatible(struct nftnl_chain *c, void *data)
 	return 0;
 }
 
-bool nft_is_table_compatible(struct nft_handle *h, const char *tablename)
+bool nft_is_table_compatible(struct nft_handle *h,
+			     const char *tablename, const char *chainname)
 {
 	struct nftnl_chain_list *clist;
 
-	clist = nft_chain_list_get(h, tablename, NULL);
+	clist = nft_chain_list_get(h, tablename, chainname);
 	if (clist == NULL)
 		return false;
 
-	nft_build_rule_cache(h);
+	nft_build_rule_cache(h, tablename, chainname);
 
 	if (nftnl_chain_list_foreach(clist, nft_is_chain_compatible, h))
 		return false;
diff --git a/iptables/nft.h b/iptables/nft.h
index 4b1c191effbd6..e6bfdf34103b1 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -75,7 +75,8 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
 int nft_init(struct nft_handle *h, const struct builtin_table *t);
 void nft_fini(struct nft_handle *h);
 void nft_build_cache(struct nft_handle *h);
-void nft_build_rule_cache(struct nft_handle *h);
+void nft_build_rule_cache(struct nft_handle *h,
+			  const char *table, const char *chain);
 void nft_fake_cache(struct nft_handle *h);
 
 /*
@@ -202,7 +203,8 @@ int nft_arp_rule_insert(struct nft_handle *h, const char *chain,
 
 void nft_rule_to_arpt_entry(struct nftnl_rule *r, struct arpt_entry *fw);
 
-bool nft_is_table_compatible(struct nft_handle *h, const char *name);
+bool nft_is_table_compatible(struct nft_handle *h,
+			     const char *table, const char *chain);
 
 int ebt_set_user_chain_policy(struct nft_handle *h, const char *table,
 			      const char *chain, const char *policy);
diff --git a/iptables/xtables-save.c b/iptables/xtables-save.c
index 503ae401737c5..000104d3b51ae 100644
--- a/iptables/xtables-save.c
+++ b/iptables/xtables-save.c
@@ -76,7 +76,7 @@ __do_output(struct nft_handle *h, const char *tablename, void *data)
 	if (!nft_table_builtin_find(h, tablename))
 		return 0;
 
-	if (!nft_is_table_compatible(h, tablename)) {
+	if (!nft_is_table_compatible(h, tablename, NULL)) {
 		printf("# Table `%s' is incompatible, use 'nft' tool.\n",
 		       tablename);
 		return 0;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 13/14] nft: Optimize flushing all chains of a table
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (11 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 12/14] nft: Support fetching rules for a single chain only Phil Sutter
@ 2019-09-16 16:49 ` Phil Sutter
  2019-09-16 16:50 ` [iptables PATCH 14/14] nft: Reduce impact of nft_chain_builtin_init() Phil Sutter
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Leverage nftables' support for flushing all chains of a table by
omitting NFTNL_RULE_CHAIN attribute in NFT_MSG_DELRULE payload.

The only caveat is with verbose output, as that still requires to have a
list of (existing) chains to iterate over. Apart from that, implementing
this shortcut is pretty straightforward: Don't retrieve a chain list and
just call __nft_rule_flush() directly which doesn't set above attribute
if chain name pointer is NULL.

A bigger deal is keeping rule cache consistent: Instead of just clearing
rule list for each flushed chain, flush_rule_cache() is updated to
iterate over all cached chains of the given table and clears their rule
lists if not called for a specific chain.

While being at it, sort local variable declarations in nft_rule_flush()
from longest to shortest and drop the loop-local 'chain_name' variable
(but instead use 'chain' function parameter which is not used at that
point).

Note that this change by itself only marginally improves 'iptables-nft
-F' performance for a ruleset with many chains because
nft_xt_builtin_init() still fetches all chains from kernel. A follow-up
patch takes care of that, finally establishing O(1) complexity.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 54 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 294cb306700d0..5eb129461dab2 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -836,7 +836,7 @@ int nft_init(struct nft_handle *h, const struct builtin_table *t)
 	return 0;
 }
 
-static int __flush_rule_cache(struct nftnl_rule *r, void *data)
+static int ____flush_rule_cache(struct nftnl_rule *r, void *data)
 {
 	nftnl_rule_list_del(r);
 	nftnl_rule_free(r);
@@ -844,9 +844,25 @@ static int __flush_rule_cache(struct nftnl_rule *r, void *data)
 	return 0;
 }
 
-static void flush_rule_cache(struct nftnl_chain *c)
+static int __flush_rule_cache(struct nftnl_chain *c, void *data)
 {
-	nftnl_rule_foreach(c, __flush_rule_cache, NULL);
+	return nftnl_rule_foreach(c, ____flush_rule_cache, NULL);
+}
+
+static int flush_rule_cache(struct nft_handle *h, const char *table,
+			    struct nftnl_chain *c)
+{
+	const struct builtin_table *t;
+
+	if (c)
+		return __flush_rule_cache(c, NULL);
+
+	t = nft_table_builtin_find(h, table);
+	if (!t|| !h->cache->table[t->type].chains)
+		return 0;
+
+	return nftnl_chain_list_foreach(h->cache->table[t->type].chains,
+					__flush_rule_cache, NULL);
 }
 
 static int __flush_chain_cache(struct nftnl_chain *c, void *data)
@@ -1860,7 +1876,7 @@ __nft_rule_flush(struct nft_handle *h, const char *table,
 	struct obj_update *obj;
 	struct nftnl_rule *r;
 
-	if (verbose)
+	if (verbose && chain)
 		fprintf(stdout, "Flushing chain `%s'\n", chain);
 
 	r = nftnl_rule_alloc();
@@ -1868,7 +1884,8 @@ __nft_rule_flush(struct nft_handle *h, const char *table,
 		return;
 
 	nftnl_rule_set_str(r, NFTNL_RULE_TABLE, table);
-	nftnl_rule_set_str(r, NFTNL_RULE_CHAIN, chain);
+	if (chain)
+		nftnl_rule_set_str(r, NFTNL_RULE_CHAIN, chain);
 
 	obj = batch_rule_add(h, NFT_COMPAT_RULE_FLUSH, r);
 	if (!obj) {
@@ -1882,19 +1899,21 @@ __nft_rule_flush(struct nft_handle *h, const char *table,
 int nft_rule_flush(struct nft_handle *h, const char *chain, const char *table,
 		   bool verbose)
 {
-	int ret = 0;
-	struct nftnl_chain_list *list;
 	struct nftnl_chain_list_iter *iter;
-	struct nftnl_chain *c;
+	struct nftnl_chain_list *list;
+	struct nftnl_chain *c = NULL;
+	int ret = 0;
 
 	nft_xt_builtin_init(h, table);
 
 	nft_fn = nft_rule_flush;
 
-	list = nft_chain_list_get(h, table, chain);
-	if (list == NULL) {
-		ret = 1;
-		goto err;
+	if (chain || verbose) {
+		list = nft_chain_list_get(h, table, chain);
+		if (list == NULL) {
+			ret = 1;
+			goto err;
+		}
 	}
 
 	if (chain) {
@@ -1903,9 +1922,11 @@ int nft_rule_flush(struct nft_handle *h, const char *chain, const char *table,
 			errno = ENOENT;
 			return 0;
 		}
+	}
 
+	if (chain || !verbose) {
 		__nft_rule_flush(h, table, chain, verbose, false);
-		flush_rule_cache(c);
+		flush_rule_cache(h, table, c);
 		return 1;
 	}
 
@@ -1917,11 +1938,10 @@ int nft_rule_flush(struct nft_handle *h, const char *chain, const char *table,
 
 	c = nftnl_chain_list_iter_next(iter);
 	while (c != NULL) {
-		const char *chain_name =
-			nftnl_chain_get_str(c, NFTNL_CHAIN_NAME);
+		chain = nftnl_chain_get_str(c, NFTNL_CHAIN_NAME);
 
-		__nft_rule_flush(h, table, chain_name, verbose, false);
-		flush_rule_cache(c);
+		__nft_rule_flush(h, table, chain, verbose, false);
+		flush_rule_cache(h, table, c);
 		c = nftnl_chain_list_iter_next(iter);
 	}
 	nftnl_chain_list_iter_destroy(iter);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [iptables PATCH 14/14] nft: Reduce impact of nft_chain_builtin_init()
  2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
                   ` (12 preceding siblings ...)
  2019-09-16 16:49 ` [iptables PATCH 13/14] nft: Optimize flushing all chains of a table Phil Sutter
@ 2019-09-16 16:50 ` Phil Sutter
  13 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-16 16:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

When initializing builtin chains, fetch only the chain to add from
kernel to reduce overhead in case kernel ruleset contains many chains.

Given the flexibility of nft_chain_list_get() this is not a problem, but
care has to be taken when updating the table's chain list: Since a
command like 'iptables-nft -F INPUT' causes two invocations of
nft_chain_list_get() with same arguments, the same chain will be fetched
from kernel twice. To handle this, simply clearing chain list from
fetch_chain_cache() is not an option as list elements might be
referenced by pending batch jobs. Instead abort in nftnl_chain_list_cb()
if a chain with same name already exists in the list. Also, the call to
fetch_table_cache() must happen conditionally to avoid leaking existing
table list.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index 5eb129461dab2..66011a8066ed9 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -750,15 +750,15 @@ nft_chain_builtin_find(const struct builtin_table *t, const char *chain)
 static void nft_chain_builtin_init(struct nft_handle *h,
 				   const struct builtin_table *table)
 {
-	struct nftnl_chain_list *list = nft_chain_list_get(h, table->name, NULL);
+	struct nftnl_chain_list *list;
 	struct nftnl_chain *c;
 	int i;
 
-	if (!list)
-		return;
-
 	/* Initialize built-in chains if they don't exist yet */
 	for (i=0; i < NF_INET_NUMHOOKS && table->chains[i].name != NULL; i++) {
+		list = nft_chain_list_get(h, table->name, table->chains[i].name);
+		if (!list)
+			continue;
 
 		c = nftnl_chain_list_lookup_byname(list, table->chains[i].name);
 		if (c != NULL)
@@ -1374,7 +1374,8 @@ static int nftnl_chain_list_cb(const struct nlmsghdr *nlh, void *data)
 {
 	struct nft_handle *h = data;
 	const struct builtin_table *t;
-	struct nftnl_chain *c;
+	struct nftnl_chain *c, *c2;
+	const char *tname, *cname;
 
 	c = nftnl_chain_alloc();
 	if (c == NULL)
@@ -1383,11 +1384,17 @@ static int nftnl_chain_list_cb(const struct nlmsghdr *nlh, void *data)
 	if (nftnl_chain_nlmsg_parse(nlh, c) < 0)
 		goto out;
 
-	t = nft_table_builtin_find(h,
-			nftnl_chain_get_str(c, NFTNL_CHAIN_TABLE));
+	tname = nftnl_chain_get_str(c, NFTNL_CHAIN_TABLE);
+	t = nft_table_builtin_find(h, tname);
 	if (!t)
 		goto out;
 
+	cname = nftnl_chain_get_str(c, NFTNL_CHAIN_NAME);
+	c2 = nftnl_chain_list_lookup_byname(h->cache->table[t->type].chains,
+					    cname);
+	if (c2)
+		goto out;
+
 	nftnl_chain_list_add_tail(c, h->cache->table[t->type].chains);
 
 	return MNL_CB_OK;
@@ -1447,7 +1454,8 @@ static int fetch_chain_cache(struct nft_handle *h, const char *table, const char
 	struct nlmsghdr *nlh;
 	int i, ret;
 
-	fetch_table_cache(h);
+	if (!h->cache->tables)
+		fetch_table_cache(h);
 
 	for (i = 0; i < NFT_TABLE_MAX; i++) {
 		enum nft_table_type type = h->tables[i].type;
@@ -1462,10 +1470,9 @@ static int fetch_chain_cache(struct nft_handle *h, const char *table, const char
 		if (chain && table && strcmp(table, h->tables[i].name))
 			continue;
 
-		if (h->cache->table[type].chains)
-			nftnl_chain_list_free(h->cache->table[type].chains);
+		if (!h->cache->table[type].chains)
+			h->cache->table[type].chains = nftnl_chain_list_alloc();
 
-		h->cache->table[type].chains = nftnl_chain_list_alloc();
 		if (!h->cache->table[type].chains)
 			return -1;
 	}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 02/14] tests/shell: Speed up ipt-restore/0004-restore-race_0
  2019-09-16 16:49 ` [iptables PATCH 02/14] tests/shell: Speed up ipt-restore/0004-restore-race_0 Phil Sutter
@ 2019-09-16 17:35   ` Florian Westphal
  0 siblings, 0 replies; 26+ messages in thread
From: Florian Westphal @ 2019-09-16 17:35 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Pablo Neira Ayuso, netfilter-devel

Phil Sutter <phil@nwl.cc> wrote:
> This test tended to cause quite excessive load on my system, sometimes
> taking longer than all other tests combined. Even with the reduced
> numbers, it still fails reliably after reverting commit 58d7de0181f61
> ("xtables: handle concurrent ruleset modifications").

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size
  2019-09-16 16:49 ` [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size Phil Sutter
@ 2019-09-17  5:00   ` Pablo Neira Ayuso
  2019-09-17 14:08     ` Phil Sutter
  2019-09-20 11:13   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-17  5:00 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Sep 16, 2019 at 06:49:53PM +0200, Phil Sutter wrote:
> This improves cache population quite a bit and therefore helps when
> dealing with large rulesets. A simple hard to improve use-case is
> listing the last rule in a large chain.

You might consider extending the netlink interface too for this
particularly case, GETRULE plus position attribute could be used for
this I think. You won't be able to use this new operation from
userspace anytime soon though, given there is no way so far to expose
interface capabilities so far rather than probing.

If there are more particular corner cases like this, I would also
encourage to extend the netlink interface.

Just a side note, not a comment specifically on this patch :-).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size
  2019-09-17  5:00   ` Pablo Neira Ayuso
@ 2019-09-17 14:08     ` Phil Sutter
  0 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-17 14:08 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Tue, Sep 17, 2019 at 07:00:38AM +0200, Pablo Neira Ayuso wrote:
> On Mon, Sep 16, 2019 at 06:49:53PM +0200, Phil Sutter wrote:
> > This improves cache population quite a bit and therefore helps when
> > dealing with large rulesets. A simple hard to improve use-case is
> > listing the last rule in a large chain.
> 
> You might consider extending the netlink interface too for this
> particularly case, GETRULE plus position attribute could be used for
> this I think. You won't be able to use this new operation from
> userspace anytime soon though, given there is no way so far to expose
> interface capabilities so far rather than probing.
> 
> If there are more particular corner cases like this, I would also
> encourage to extend the netlink interface.
> 
> Just a side note, not a comment specifically on this patch :-).

Thanks for suggesting, I didn't consider extending kernel to support the
index stuff yet. In general, I refrained from kernel changes simply
because of the compat problem. Implementing failure tolerance can
quickly turn into a mess, too.

Cheers, Phil

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose
  2019-09-16 16:49 ` [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose Phil Sutter
@ 2019-09-20 10:32   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-20 10:32 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Sep 16, 2019 at 06:49:47PM +0200, Phil Sutter wrote:
> Print expected entries count if it doesn't match.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 03/14] DEBUG: Print to stderr to not disturb iptables-save
  2019-09-16 16:49 ` [iptables PATCH 03/14] DEBUG: Print to stderr to not disturb iptables-save Phil Sutter
@ 2019-09-20 10:32   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-20 10:32 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Sep 16, 2019 at 06:49:49PM +0200, Phil Sutter wrote:
> This way there's at least a chance to get meaningful results from
> testsuite with debugging being turned on.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 04/14] nft: Use nftnl_*_set_str() functions
  2019-09-16 16:49 ` [iptables PATCH 04/14] nft: Use nftnl_*_set_str() functions Phil Sutter
@ 2019-09-20 10:33   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-20 10:33 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Sep 16, 2019 at 06:49:50PM +0200, Phil Sutter wrote:
> Although it doesn't make a difference in practice, they are the correct
> API functions to use when assigning string attributes.
> 
> While doing so, also drop the needless casts to non-const.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 05/14] nft: Introduce nft_bridge_commit()
  2019-09-16 16:49 ` [iptables PATCH 05/14] nft: Introduce nft_bridge_commit() Phil Sutter
@ 2019-09-20 10:36   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-20 10:36 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Sep 16, 2019 at 06:49:51PM +0200, Phil Sutter wrote:
> No need to check family value from nft_commit() if we can have a
> dedicated callback for bridge family.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size
  2019-09-16 16:49 ` [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size Phil Sutter
  2019-09-17  5:00   ` Pablo Neira Ayuso
@ 2019-09-20 11:13   ` Pablo Neira Ayuso
  2019-09-23 16:46     ` Phil Sutter
  1 sibling, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-20 11:13 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Sep 16, 2019 at 06:49:53PM +0200, Phil Sutter wrote:
> This improves cache population quite a bit and therefore helps when
> dealing with large rulesets. A simple hard to improve use-case is
> listing the last rule in a large chain. These are the average program
> run times depending on number of rules:
> 
> rule count	| legacy	| nft old	| nft new
> ---------------------------------------------------------
>  50,000		| .052s		| .611s		| .406s
> 100,000		| .115s		| 2.12s		| 1.24s
> 150,000		| .265s		| 7.63s		| 4.14s
> 200,000		| .411s		| 21.0s		| 10.6s
> 
> So while legacy iptables is still magnitudes faster, this simple change
> doubles iptables-nft performance in ideal cases.
> 
> Note that increasing the buffer even further didn't improve performance
> anymore, so 32KB seems to be an upper limit in kernel space.

Here are the details for this 32 KB number:

commit d35c99ff77ecb2eb239731b799386f3b3637a31e
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Oct 6 04:13:18 2016 +0900

    netlink: do not enter direct reclaim from netlink_dump()

iproute2 is also using 32 KBytes buffer, in case you want to append
this to your commit description before pushing this out.

> Signed-off-by: Phil Sutter <phil@nwl.cc>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing
  2019-09-16 16:49 ` [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing Phil Sutter
@ 2019-09-20 11:57   ` Pablo Neira Ayuso
  2019-09-24 14:43     ` Phil Sutter
  0 siblings, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-20 11:57 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netfilter-devel

Hi Phil,

On Mon, Sep 16, 2019 at 06:49:54PM +0200, Phil Sutter wrote:
> When called without --noflush, don't fetch full ruleset from kernel but
> merely list of tables and the current genid. Locally, initialize chain
> lists and set have_cache to simulate an empty ruleset.
> 
> Doing so reduces execution time significantly if a large ruleset exists
> in kernel space. A simple test case consisting of a dump with 100,000
> rules can be restored within 15s on my testing VM. Restoring it a second
> time took 21s before this patch and only 17s after it.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  iptables/nft.c             | 27 ++++++++++++++++++++++++++-
>  iptables/nft.h             |  1 +
>  iptables/xtables-restore.c |  7 +++++--
>  3 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/iptables/nft.c b/iptables/nft.c
> index 7f0f9e1234ae4..820f3392dd495 100644
> --- a/iptables/nft.c
> +++ b/iptables/nft.c
> @@ -882,7 +882,8 @@ static int flush_cache(struct nft_cache *c, const struct builtin_table *tables,
>  		nftnl_chain_list_free(c->table[i].chains);
>  		c->table[i].chains = NULL;
>  	}
> -	nftnl_table_list_free(c->tables);
> +	if (c->tables)
> +		nftnl_table_list_free(c->tables);
>  	c->tables = NULL;
>  
>  	return 1;
> @@ -1617,6 +1618,30 @@ void nft_build_cache(struct nft_handle *h)
>  		__nft_build_cache(h);
>  }
>  
> +void nft_fake_cache(struct nft_handle *h)
> +{
> +	int i;
> +
> +	if (h->have_cache)
> +		return;
> +
> +	/* fetch tables so conditional table delete logic works */
> +	if (!h->cache->tables)
> +		fetch_table_cache(h);
> +
> +	for (i = 0; i < NFT_TABLE_MAX; i++) {
> +		enum nft_table_type type = h->tables[i].type;
> +
> +		if (!h->tables[i].name ||
> +		    h->cache->table[type].chains)
> +			continue;
> +
> +		h->cache->table[type].chains = nftnl_chain_list_alloc();
> +	}
> +	mnl_genid_get(h, &h->nft_genid);
> +	h->have_cache = true;
> +}

Looking at patches from 8/24 onwards, I think it's time to introduce
cache flags logic to iptables.

In patch 9/14 there is a new field have_chain_cache.

In patch 10/14 there is have_rule_cache.

Then moving on, the logic is based on checking for this booleans and
then checking if the caches are initialized or not.

I think if we move towards cache flags logic (the flags tell us what
if we need no cache, partial cache (only tables, tables + chains) or
full cache.

This would make this logic easier to understand and more maintainable.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size
  2019-09-20 11:13   ` Pablo Neira Ayuso
@ 2019-09-23 16:46     ` Phil Sutter
  0 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-23 16:46 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Fri, Sep 20, 2019 at 01:13:29PM +0200, Pablo Neira Ayuso wrote:
> On Mon, Sep 16, 2019 at 06:49:53PM +0200, Phil Sutter wrote:
> > This improves cache population quite a bit and therefore helps when
> > dealing with large rulesets. A simple hard to improve use-case is
> > listing the last rule in a large chain. These are the average program
> > run times depending on number of rules:
> > 
> > rule count	| legacy	| nft old	| nft new
> > ---------------------------------------------------------
> >  50,000		| .052s		| .611s		| .406s
> > 100,000		| .115s		| 2.12s		| 1.24s
> > 150,000		| .265s		| 7.63s		| 4.14s
> > 200,000		| .411s		| 21.0s		| 10.6s
> > 
> > So while legacy iptables is still magnitudes faster, this simple change
> > doubles iptables-nft performance in ideal cases.
> > 
> > Note that increasing the buffer even further didn't improve performance
> > anymore, so 32KB seems to be an upper limit in kernel space.
> 
> Here are the details for this 32 KB number:

Thanks for those!

[...]
> iproute2 is also using 32 KBytes buffer, in case you want to append
> this to your commit description before pushing this out.

Commit message adjusted and pushed.

Thanks, Phil

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing
  2019-09-20 11:57   ` Pablo Neira Ayuso
@ 2019-09-24 14:43     ` Phil Sutter
  0 siblings, 0 replies; 26+ messages in thread
From: Phil Sutter @ 2019-09-24 14:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Fri, Sep 20, 2019 at 01:57:02PM +0200, Pablo Neira Ayuso wrote:
[...]
> Looking at patches from 8/24 onwards, I think it's time to introduce
> cache flags logic to iptables.
> 
> In patch 9/14 there is a new field have_chain_cache.
> 
> In patch 10/14 there is have_rule_cache.
> 
> Then moving on, the logic is based on checking for this booleans and
> then checking if the caches are initialized or not.
> 
> I think if we move towards cache flags logic (the flags tell us what
> if we need no cache, partial cache (only tables, tables + chains) or
> full cache.
> 
> This would make this logic easier to understand and more maintainable.

I am not entirely sure this is a feasible approach:

On one hand, we certainly can introduce cache "levels" to distinguish
between:

- no cache
- tables
- chains
- rules

simply because all these naturally build upon each others. On the other
hand, in my series I pushed the envelope (and watched it bend) a bit
further: There are basically two modes of caching:

- "traditional" breadth-first, compatible with above: e.g. all tables,
  if needed all chains as well, etc.

- a new depth-first mode which allows to fetch e.g. a certain chain's
  rules but no other chains/rules.

While the first mode is fine, second one really gives us the edge in
situations where legacy iptables is faster simply because "number
crunching" is more optimized there.

This second mode doesn't have explicit completion indicators like the
first one (with 'have_cache', etc.). In fact, the code treats it like a
fire-and-forget situation: If e.g. the same chain is requested twice,
the second time cache is simply fetched again but nftnl_chain_list_cb()
discards the fetched chain if one with same name already exists in
table's chain list.

Actually, the only application which requires more attention is
xtables-restore with --noflush. It is able to run multiple commands
consecutively and these may need kernel ruleset as context. Still we
don't want to fetch the full kernel ruleset upon each invocation because
it's how users speed up their ruleset manipulations.

In summary, things boil down to the following options:

A) Avoid caching, fetch only the most necessary things to do the job.
B) Build a full cache if needed anyway or if we can't predict.
C) Create a fake cache if we know kernel's ruleset is irrelevant.

I'll give it another try, aligning cache update logic to the above and
see if things become clean enough to be considered maintainable.

Cheers, Phil

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-09-24 14:43 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-16 16:49 [iptables PATCH 00/14] Improve iptables-nft performance with large rulesets Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 01/14] tests/shell: Make ebtables-basic test more verbose Phil Sutter
2019-09-20 10:32   ` Pablo Neira Ayuso
2019-09-16 16:49 ` [iptables PATCH 02/14] tests/shell: Speed up ipt-restore/0004-restore-race_0 Phil Sutter
2019-09-16 17:35   ` Florian Westphal
2019-09-16 16:49 ` [iptables PATCH 03/14] DEBUG: Print to stderr to not disturb iptables-save Phil Sutter
2019-09-20 10:32   ` Pablo Neira Ayuso
2019-09-16 16:49 ` [iptables PATCH 04/14] nft: Use nftnl_*_set_str() functions Phil Sutter
2019-09-20 10:33   ` Pablo Neira Ayuso
2019-09-16 16:49 ` [iptables PATCH 05/14] nft: Introduce nft_bridge_commit() Phil Sutter
2019-09-20 10:36   ` Pablo Neira Ayuso
2019-09-16 16:49 ` [iptables PATCH 06/14] nft: Fix for add and delete of same rule in single batch Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 07/14] nft Increase mnl_talk() receive buffer size Phil Sutter
2019-09-17  5:00   ` Pablo Neira Ayuso
2019-09-17 14:08     ` Phil Sutter
2019-09-20 11:13   ` Pablo Neira Ayuso
2019-09-23 16:46     ` Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 08/14] xtables-restore: Avoid cache population when flushing Phil Sutter
2019-09-20 11:57   ` Pablo Neira Ayuso
2019-09-24 14:43     ` Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 09/14] nft: Rename have_cache into have_chain_cache Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 10/14] nft: Fetch rule cache only if needed Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 11/14] nft: Allow to fetch only a specific chain from kernel Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 12/14] nft: Support fetching rules for a single chain only Phil Sutter
2019-09-16 16:49 ` [iptables PATCH 13/14] nft: Optimize flushing all chains of a table Phil Sutter
2019-09-16 16:50 ` [iptables PATCH 14/14] nft: Reduce impact of nft_chain_builtin_init() Phil Sutter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).