* [iptables PATCH 0/4] Fix for iptables-nft-restore under pressure
@ 2020-03-02 17:53 Phil Sutter
  2020-03-02 17:53 ` [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress Phil Sutter
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Phil Sutter @ 2020-03-02 17:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Using a rather simple test-case, it is possible to provoke NULL-pointer
derefs in iptables-nft-restore.

Said test-case involves a rule set with a thousand custom chains in each
table, a thousand rules in each builtin chain and one rule in each
custom chain - details are not important though, it is enough to have
reasonably large tables to cause delays.
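
A rule set of the shape described above could be generated roughly as
follows. This is only a sketch under assumptions: the helper names
write_table() and write_ruleset(), the chain names chain_N, the choice of
the filter and nat tables, and the trivial ACCEPT/RETURN rules are all
made up for illustration; the original test case may well differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one table in iptables-save format and
 * return the number of lines written. */
static long write_table(FILE *out, const char *table,
			const char *const *builtin, int n_builtin,
			int n_custom, int n_rules)
{
	long lines = 0;
	int i, j;

	assert(out && table);

	fprintf(out, "*%s\n", table);
	lines++;
	/* chain declarations: builtin chains carry a policy, custom ones "-" */
	for (i = 0; i < n_builtin; i++, lines++)
		fprintf(out, ":%s ACCEPT [0:0]\n", builtin[i]);
	for (i = 0; i < n_custom; i++, lines++)
		fprintf(out, ":chain_%d - [0:0]\n", i);
	/* n_rules rules in each builtin chain */
	for (i = 0; i < n_builtin; i++)
		for (j = 0; j < n_rules; j++, lines++)
			fprintf(out, "-A %s -j ACCEPT\n", builtin[i]);
	/* one rule in each custom chain */
	for (i = 0; i < n_custom; i++, lines++)
		fprintf(out, "-A chain_%d -j RETURN\n", i);
	/* each table ends with its own COMMIT line */
	fprintf(out, "COMMIT\n");
	return lines + 1;
}

/* Hypothetical helper: write two tables, return total line count. */
static long write_ruleset(FILE *out, int n_custom, int n_rules)
{
	static const char *const filter[] = { "INPUT", "FORWARD", "OUTPUT" };
	static const char *const nat[] = {
		"PREROUTING", "INPUT", "OUTPUT", "POSTROUTING"
	};
	long lines = 0;

	lines += write_table(out, "filter", filter, 3, n_custom, n_rules);
	lines += write_table(out, "nat", nat, 4, n_custom, n_rules);
	return lines;
}
```

Calling write_ruleset(stdout, 1000, 1000) and redirecting the output to a
file would give input of the described size; note that every table
contributes its own COMMIT line.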

The test script simply starts ten instances of iptables-nft-restore in
the background and ten instances in a loop in the foreground, all reading
the above rule set.

The critical detail is that iptables-nft-restore pushes to the kernel at
each COMMIT line, so nft_rebuild_cache() may run multiple times during a
single restore.

The actual fix is contained in patch one. Patch two is a performance
optimization; the behaviour it changes is not wrong per se. Patches
three and four are fall-out from the first one.

Phil Sutter (4):
  nft: cache: Fix nft_release_cache() under stress
  nft: cache: Make nft_rebuild_cache() respect fake cache
  nft: cache: Simplify chain list allocation
  nft: cache: Review flush_cache()

 iptables/nft-cache.c | 87 +++++++++++++++++++++++---------------------
 iptables/nft.h       |  3 +-
 2 files changed, 48 insertions(+), 42 deletions(-)

-- 
2.25.1



* [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress
  2020-03-02 17:53 [iptables PATCH 0/4] Fix for iptables-nft-restore under pressure Phil Sutter
@ 2020-03-02 17:53 ` Phil Sutter
  2020-03-02 19:19   ` Pablo Neira Ayuso
  2020-03-02 17:53 ` [iptables PATCH 2/4] nft: cache: Make nft_rebuild_cache() respect fake cache Phil Sutter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Phil Sutter @ 2020-03-02 17:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
COMMIT line in input. When restoring a dump containing multiple large
tables, chances are nft_rebuild_cache() has to run multiple times.

If the above happens, consecutive table contents are added to __cache[1]
which nft_rebuild_cache() then frees, so next commit attempt accesses
invalid memory.

Fix this by making nft_release_cache() (called after each successful
commit) return things into pre-rebuild state again, but keeping the
fresh cache copy.

Fixes: f6ad231d698c7 ("nft: keep original cache in case of ERESTART")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft-cache.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/iptables/nft-cache.c b/iptables/nft-cache.c
index 7345a27e2894b..6f21f2283e0fb 100644
--- a/iptables/nft-cache.c
+++ b/iptables/nft-cache.c
@@ -647,8 +647,14 @@ void nft_rebuild_cache(struct nft_handle *h)
 
 void nft_release_cache(struct nft_handle *h)
 {
-	if (h->cache_index)
-		flush_cache(h, &h->__cache[0], NULL);
+	if (!h->cache_index)
+		return;
+
+	flush_cache(h, &h->__cache[0], NULL);
+	memcpy(&h->__cache[0], &h->__cache[1], sizeof(h->__cache[0]));
+	memset(&h->__cache[1], 0, sizeof(h->__cache[1]));
+	h->cache_index = 0;
+	h->cache = &h->__cache[0];
 }
 
 struct nftnl_table_list *nftnl_table_list_get(struct nft_handle *h)
-- 
2.25.1



* [iptables PATCH 2/4] nft: cache: Make nft_rebuild_cache() respect fake cache
  2020-03-02 17:53 [iptables PATCH 0/4] Fix for iptables-nft-restore under pressure Phil Sutter
  2020-03-02 17:53 ` [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress Phil Sutter
@ 2020-03-02 17:53 ` Phil Sutter
  2020-03-02 19:26   ` Pablo Neira Ayuso
  2020-03-02 17:53 ` [iptables PATCH 3/4] nft: cache: Simplify chain list allocation Phil Sutter
  2020-03-02 17:53 ` [iptables PATCH 4/4] nft: cache: Review flush_cache() Phil Sutter
  3 siblings, 1 reply; 14+ messages in thread
From: Phil Sutter @ 2020-03-02 17:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

If transaction needed a refresh in nft_action(), restore with flush
would fetch a full cache instead of merely refreshing table list
contained in "fake" cache.

To fix this, nft_rebuild_cache() must distinguish between fake cache and
full rule cache. Therefore introduce NFT_CL_FAKE to be distinguished
from NFT_CL_RULES.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft-cache.c | 11 ++++++++---
 iptables/nft.h       |  3 ++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/iptables/nft-cache.c b/iptables/nft-cache.c
index 6f21f2283e0fb..e1b1e89c9e0d3 100644
--- a/iptables/nft-cache.c
+++ b/iptables/nft-cache.c
@@ -484,6 +484,7 @@ retry:
 			break;
 		/* fall through */
 	case NFT_CL_RULES:
+	case NFT_CL_FAKE:
 		break;
 	}
 
@@ -528,7 +529,7 @@ void nft_fake_cache(struct nft_handle *h)
 
 		h->cache->table[type].chains = nftnl_chain_list_alloc();
 	}
-	h->cache_level = NFT_CL_RULES;
+	h->cache_level = NFT_CL_FAKE;
 	mnl_genid_get(h, &h->nft_genid);
 }
 
@@ -641,8 +642,12 @@ void nft_rebuild_cache(struct nft_handle *h)
 	if (h->cache_level)
 		__nft_flush_cache(h);
 
-	h->cache_level = NFT_CL_NONE;
-	__nft_build_cache(h, level, NULL, NULL, NULL);
+	if (h->cache_level == NFT_CL_FAKE) {
+		nft_fake_cache(h);
+	} else {
+		h->cache_level = NFT_CL_NONE;
+		__nft_build_cache(h, level, NULL, NULL, NULL);
+	}
 }
 
 void nft_release_cache(struct nft_handle *h)
diff --git a/iptables/nft.h b/iptables/nft.h
index 5cf260a6d2cd3..2094b01455194 100644
--- a/iptables/nft.h
+++ b/iptables/nft.h
@@ -32,7 +32,8 @@ enum nft_cache_level {
 	NFT_CL_TABLES,
 	NFT_CL_CHAINS,
 	NFT_CL_SETS,
-	NFT_CL_RULES
+	NFT_CL_RULES,
+	NFT_CL_FAKE	/* must be last entry */
 };
 
 struct nft_cache {
-- 
2.25.1



* [iptables PATCH 3/4] nft: cache: Simplify chain list allocation
  2020-03-02 17:53 [iptables PATCH 0/4] Fix for iptables-nft-restore under pressure Phil Sutter
  2020-03-02 17:53 ` [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress Phil Sutter
  2020-03-02 17:53 ` [iptables PATCH 2/4] nft: cache: Make nft_rebuild_cache() respect fake cache Phil Sutter
@ 2020-03-02 17:53 ` Phil Sutter
  2020-03-02 17:53 ` [iptables PATCH 4/4] nft: cache: Review flush_cache() Phil Sutter
  3 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2020-03-02 17:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Allocate chain lists right after fetching table cache, regardless of
whether partial cache is fetched or not. Chain list pointers reside in
struct nft_cache's table array and hence are present irrespective of
actual tables in kernel. Given the small number of tables, there wasn't
much overhead avoided by the conditional in fetch_chain_cache().

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft-cache.c | 46 ++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/iptables/nft-cache.c b/iptables/nft-cache.c
index e1b1e89c9e0d3..0429fb32f2ed0 100644
--- a/iptables/nft-cache.c
+++ b/iptables/nft-cache.c
@@ -107,6 +107,23 @@ static int fetch_table_cache(struct nft_handle *h)
 	return 1;
 }
 
+static int init_chain_cache(struct nft_handle *h)
+{
+	int i;
+
+	for (i = 0; i < NFT_TABLE_MAX; i++) {
+		enum nft_table_type type = h->tables[i].type;
+
+		if (!h->tables[i].name)
+			continue;
+
+		h->cache->table[type].chains = nftnl_chain_list_alloc();
+		if (!h->cache->table[type].chains)
+			return -1;
+	}
+	return 0;
+}
+
 struct nftnl_chain_list_cb_data {
 	struct nft_handle *h;
 	const struct builtin_table *t;
@@ -316,26 +333,6 @@ static int fetch_chain_cache(struct nft_handle *h,
 	struct nlmsghdr *nlh;
 	int i, ret;
 
-	if (!t) {
-		for (i = 0; i < NFT_TABLE_MAX; i++) {
-			enum nft_table_type type = h->tables[i].type;
-
-			if (!h->tables[i].name)
-				continue;
-
-			if (h->cache->table[type].chains)
-				continue;
-
-			h->cache->table[type].chains = nftnl_chain_list_alloc();
-			if (!h->cache->table[type].chains)
-				return -1;
-		}
-	} else if (!h->cache->table[t->type].chains) {
-		h->cache->table[t->type].chains = nftnl_chain_list_alloc();
-		if (!h->cache->table[t->type].chains)
-			return -1;
-	}
-
 	if (t && chain) {
 		struct nftnl_chain *c = nftnl_chain_alloc();
 
@@ -465,6 +462,7 @@ retry:
 	switch (h->cache_level) {
 	case NFT_CL_NONE:
 		fetch_table_cache(h);
+		init_chain_cache(h);
 		if (level == NFT_CL_TABLES)
 			break;
 		/* fall through */
@@ -521,14 +519,8 @@ void nft_fake_cache(struct nft_handle *h)
 	int i;
 
 	fetch_table_cache(h);
-	for (i = 0; i < NFT_TABLE_MAX; i++) {
-		enum nft_table_type type = h->tables[i].type;
+	init_chain_cache(h);
 
-		if (!h->tables[i].name)
-			continue;
-
-		h->cache->table[type].chains = nftnl_chain_list_alloc();
-	}
 	h->cache_level = NFT_CL_FAKE;
 	mnl_genid_get(h, &h->nft_genid);
 }
-- 
2.25.1



* [iptables PATCH 4/4] nft: cache: Review flush_cache()
  2020-03-02 17:53 [iptables PATCH 0/4] Fix for iptables-nft-restore under pressure Phil Sutter
                   ` (2 preceding siblings ...)
  2020-03-02 17:53 ` [iptables PATCH 3/4] nft: cache: Simplify chain list allocation Phil Sutter
@ 2020-03-02 17:53 ` Phil Sutter
  2020-03-02 19:22   ` Pablo Neira Ayuso
  3 siblings, 1 reply; 14+ messages in thread
From: Phil Sutter @ 2020-03-02 17:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

While fixing for iptables-nft-restore under stress, I managed to hit
NULL-pointer deref in flush_cache(). Given that nftnl_*_list_free()
functions are not NULL-pointer tolerant, better make sure such are not
passed by accident.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft-cache.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/iptables/nft-cache.c b/iptables/nft-cache.c
index 0429fb32f2ed0..0dd131e1f70f5 100644
--- a/iptables/nft-cache.c
+++ b/iptables/nft-cache.c
@@ -603,17 +603,19 @@ static int flush_cache(struct nft_handle *h, struct nft_cache *c,
 		if (h->tables[i].name == NULL)
 			continue;
 
-		if (!c->table[i].chains)
-			continue;
-
-		nftnl_chain_list_free(c->table[i].chains);
-		c->table[i].chains = NULL;
-		if (c->table[i].sets)
+		if (c->table[i].chains) {
+			nftnl_chain_list_free(c->table[i].chains);
+			c->table[i].chains = NULL;
+		}
+		if (c->table[i].sets) {
 			nftnl_set_list_free(c->table[i].sets);
-		c->table[i].sets = NULL;
+			c->table[i].sets = NULL;
+		}
+	}
+	if (c->tables) {
+		nftnl_table_list_free(c->tables);
+		c->tables = NULL;
 	}
-	nftnl_table_list_free(c->tables);
-	c->tables = NULL;
 
 	return 1;
 }
-- 
2.25.1



* Re: [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress
  2020-03-02 17:53 ` [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress Phil Sutter
@ 2020-03-02 19:19   ` Pablo Neira Ayuso
  2020-03-03  1:02     ` Phil Sutter
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-02 19:19 UTC
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Mar 02, 2020 at 06:53:55PM +0100, Phil Sutter wrote:
> iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
> COMMIT line in input. When restoring a dump containing multiple large
> tables, chances are nft_rebuild_cache() has to run multiple times.

Then, fix nft_rebuild_cache() please.


* Re: [iptables PATCH 4/4] nft: cache: Review flush_cache()
  2020-03-02 17:53 ` [iptables PATCH 4/4] nft: cache: Review flush_cache() Phil Sutter
@ 2020-03-02 19:22   ` Pablo Neira Ayuso
  2020-03-03  1:22     ` Phil Sutter
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-02 19:22 UTC
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Mar 02, 2020 at 06:53:58PM +0100, Phil Sutter wrote:
> While fixing for iptables-nft-restore under stress, I managed to hit
> NULL-pointer deref in flush_cache(). Given that nftnl_*_list_free()
> functions are not NULL-pointer tolerant, better make sure such are not
> passed by accident.

Could you explain what sequence is triggering the NULL-pointer
dereference?

Thank you.


* Re: [iptables PATCH 2/4] nft: cache: Make nft_rebuild_cache() respect fake cache
  2020-03-02 17:53 ` [iptables PATCH 2/4] nft: cache: Make nft_rebuild_cache() respect fake cache Phil Sutter
@ 2020-03-02 19:26   ` Pablo Neira Ayuso
  2020-03-03  1:15     ` Phil Sutter
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-02 19:26 UTC
  To: Phil Sutter; +Cc: netfilter-devel

On Mon, Mar 02, 2020 at 06:53:56PM +0100, Phil Sutter wrote:
> If transaction needed a refresh in nft_action(), restore with flush
> would fetch a full cache instead of merely refreshing table list
> contained in "fake" cache.
> 
> To fix this, nft_rebuild_cache() must distinguish between fake cache and
> full rule cache. Therefore introduce NFT_CL_FAKE to be distinguished
> from NFT_CL_RULES.

Please, refresh me: Why do we need this "fake cache" in first place?


* Re: [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress
  2020-03-02 19:19   ` Pablo Neira Ayuso
@ 2020-03-03  1:02     ` Phil Sutter
  2020-03-03 20:55       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Phil Sutter @ 2020-03-03  1:02 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Mon, Mar 02, 2020 at 08:19:30PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Mar 02, 2020 at 06:53:55PM +0100, Phil Sutter wrote:
> > iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
> > COMMIT line in input. When restoring a dump containing multiple large
> > tables, chances are nft_rebuild_cache() has to run multiple times.
> 
> Then, fix nft_rebuild_cache() please.

This is not the right place to fix the problem: nft_rebuild_cache()
simply rebuilds the cache, switching to a secondary instance if not done
so before to avoid freeing objects referenced from batch jobs.

When creating batch jobs (e.g., adding a rule or chain), code is not
aware of which cache instance is currently in use. It will just add
those objects to nft_handle->cache pointer.

It is the job of nft_release_cache() to return things back to normal
after each COMMIT line, which includes restoring nft_handle->cache
pointer to point at first cache instance.

If you see a flaw in my reasoning, I'm all ears. Also, if you see a
better solution, please elaborate - IMO, nft_release_cache() should undo
what nft_rebuild_cache() may have done. From nft_action() perspective,
they are related.
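
The interplay described above can be illustrated with a toy model. This
is a sketch under assumptions, not the iptables code: the toy_* names and
the reference counting are invented for illustration, and only the two
cache instances, cache_index and the cache pointer mirror fields mentioned
in this thread. It counts batch jobs left pointing into a flushed cache
instance when two COMMIT lines each require one rebuild.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_cache {
	int generation;	/* 0 means empty, else ID of the fetched contents */
	int batch_refs;	/* pending batch jobs pointing into this instance */
};

struct toy_handle {
	struct toy_cache __cache[2];
	struct toy_cache *cache;
	unsigned int cache_index;
	int next_generation;
	int dangling;	/* batch jobs whose cache instance got flushed */
};

static void toy_init(struct toy_handle *h)
{
	memset(h, 0, sizeof(*h));
	h->cache = &h->__cache[0];
	h->next_generation = 1;
	h->cache->generation = h->next_generation++;
}

static void toy_flush(struct toy_handle *h, struct toy_cache *c)
{
	h->dangling += c->batch_refs;	/* these jobs now reference freed data */
	memset(c, 0, sizeof(*c));
}

/* A batch job stores its objects in whatever h->cache points at. */
static void toy_add_job(struct toy_handle *h)
{
	h->cache->batch_refs++;
}

/* Rebuild: switch to the second instance once, otherwise flush in place. */
static void toy_rebuild(struct toy_handle *h)
{
	if (!h->cache_index) {
		h->cache_index = 1;
		h->cache = &h->__cache[1];
	} else {
		toy_flush(h, h->cache);
	}
	h->cache->generation = h->next_generation++;
}

/* Release after a successful commit; 'fixed' selects the patched logic. */
static void toy_release(struct toy_handle *h, bool fixed)
{
	h->__cache[0].batch_refs = 0;	/* commit done, batch jobs are gone */
	h->__cache[1].batch_refs = 0;
	if (!h->cache_index)
		return;
	toy_flush(h, &h->__cache[0]);
	if (!fixed)
		return;			/* pre-patch: stay on instance 1 */
	h->__cache[0] = h->__cache[1];	/* keep the fresh copy */
	memset(&h->__cache[1], 0, sizeof(h->__cache[1]));
	h->cache_index = 0;
	h->cache = &h->__cache[0];
	assert(h->cache->generation != 0);
}

/* Two COMMIT lines, each needing one rebuild; returns dangling job count. */
static int toy_restore(bool fixed)
{
	struct toy_handle h;
	int commit;

	toy_init(&h);
	for (commit = 0; commit < 2; commit++) {
		toy_add_job(&h);	/* table contents parsed before COMMIT */
		toy_rebuild(&h);	/* ruleset changed in kernel meanwhile */
		toy_release(&h, fixed);
	}
	return h.dangling;
}
```

In this model, toy_restore(false) leaves one job dangling because the
second rebuild flushes the instance the job was added to, while
toy_restore(true) leaves none.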

Cheers, Phil


* Re: [iptables PATCH 2/4] nft: cache: Make nft_rebuild_cache() respect fake cache
  2020-03-02 19:26   ` Pablo Neira Ayuso
@ 2020-03-03  1:15     ` Phil Sutter
  0 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2020-03-03  1:15 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Mon, Mar 02, 2020 at 08:26:04PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Mar 02, 2020 at 06:53:56PM +0100, Phil Sutter wrote:
> > If transaction needed a refresh in nft_action(), restore with flush
> > would fetch a full cache instead of merely refreshing table list
> > contained in "fake" cache.
> > 
> > To fix this, nft_rebuild_cache() must distinguish between fake cache and
> > full rule cache. Therefore introduce NFT_CL_FAKE to be distinguished
> > from NFT_CL_RULES.
> 
> Please, refresh me: Why do we need this "fake cache" in first place?

In short: It is a middle-ground between needlessly fetching a full cache
and hitting ENOENT because we may not delete a table that doesn't exist.

Long version:

A) Full cache is not needed for iptables-nft-restore without --noflush.
   It is supposed to drop whatever is there and push the rule set it is
   fed with. Yet it shall only affect its "own" tables, so simple 'flush
   ruleset' at start of transaction is not OK.

B) Simple 'delete table' at each '*table' line may cause ENOENT if table
   does not exist, so list of existing tables must be fetched from
   kernel. Since that may change, the whole nft_rebuild_cache() thing
   was created.

At NFWS we discussed introducing 'create'/'destroy' commands as
alternatives to 'add'/'delete': the new commands would cause errors if
the object already exists or is missing, respectively, and 'add'/'delete'
would be changed to not do that. With this in place,
iptables-nft-restore could get by without a cache at all. Another option
would be to do a sequence of add/delete/add for each table line, which
works because the 'add' command is accepted even if the table already
exists.
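
Why the add/delete/add sequence needs no knowledge of prior state can be
shown with a toy model. This is a sketch under assumptions: struct
toy_table and the toy_* functions are invented for illustration and do
not correspond to any nftables or libnftnl API; they only encode the two
properties stated above ('add' accepts an existing table, 'delete' fails
with ENOENT on a missing one).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_table {
	bool exists;
	int n_rules;
};

/* 'add' is accepted even if the table already exists. */
static int toy_add(struct toy_table *t)
{
	assert(t);
	t->exists = true;
	return 0;
}

/* 'delete' drops the table and its contents, ENOENT if it is missing. */
static int toy_delete(struct toy_table *t)
{
	assert(t);
	if (!t->exists)
		return -ENOENT;
	t->exists = false;
	t->n_rules = 0;
	return 0;
}

/* add/delete/add: ends with an empty table whatever the state before. */
static int toy_recreate(struct toy_table *t)
{
	int ret;

	ret = toy_add(t);	/* guarantees the following delete succeeds */
	if (ret < 0)
		return ret;
	ret = toy_delete(t);	/* drops any previous contents */
	if (ret < 0)
		return ret;
	return toy_add(t);	/* fresh, empty table */
}
```

A bare toy_delete() on a missing table returns -ENOENT, whereas
toy_recreate() succeeds for both a missing and a populated table.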

Cheers, Phil


* Re: [iptables PATCH 4/4] nft: cache: Review flush_cache()
  2020-03-02 19:22   ` Pablo Neira Ayuso
@ 2020-03-03  1:22     ` Phil Sutter
  0 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2020-03-03  1:22 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Mon, Mar 02, 2020 at 08:22:08PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Mar 02, 2020 at 06:53:58PM +0100, Phil Sutter wrote:
> > While fixing for iptables-nft-restore under stress, I managed to hit
> > NULL-pointer deref in flush_cache(). Given that nftnl_*_list_free()
> > functions are not NULL-pointer tolerant, better make sure such are not
> > passed by accident.
> 
> Could you explain what sequence is triggering the NULL-pointer
> dereference?

I don't think it is possible to trigger with current upstream code. I
hit it while trying to find a fix for the bug described in patch 1, but
it was different code. So technically, this fixes a problem that
doesn't exist. If you therefore consider this change worthless, I'm
absolutely fine with dropping it. My motivation to submit it was that it
makes flush_cache() behave sanely even in odd circumstances.
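
The guarding pattern can be sketched in isolation. This is an
illustration under assumptions: toy_list_free() merely stands in for a
free routine that is not NULL-pointer tolerant, and all toy_* names are
invented here rather than taken from libnftnl.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_list {
	int n_items;
};

struct toy_table_cache {
	struct toy_list *chains;
	struct toy_list *sets;
};

static int toy_frees;	/* counts actual frees, for demonstration only */

/* Stand-in for a free routine that must not be passed NULL. */
static void toy_list_free(struct toy_list *l)
{
	assert(l != NULL);
	l->n_items = 0;
	free(l);
	toy_frees++;
}

/* Guard each pointer independently, so a missing chain list neither
 * crashes nor prevents the set list from being freed. */
static void toy_flush_table(struct toy_table_cache *c)
{
	if (c->chains) {
		toy_list_free(c->chains);
		c->chains = NULL;
	}
	if (c->sets) {
		toy_list_free(c->sets);
		c->sets = NULL;
	}
}
```

Resetting each pointer to NULL right after freeing makes a second
toy_flush_table() call on the same structure a no-op.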

Cheers, Phil


* Re: [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress
  2020-03-03  1:02     ` Phil Sutter
@ 2020-03-03 20:55       ` Pablo Neira Ayuso
  2020-03-04  2:13         ` Phil Sutter
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-03 20:55 UTC
  To: Phil Sutter, netfilter-devel

On Tue, Mar 03, 2020 at 02:02:52AM +0100, Phil Sutter wrote:
> Hi Pablo,
> 
> On Mon, Mar 02, 2020 at 08:19:30PM +0100, Pablo Neira Ayuso wrote:
> > On Mon, Mar 02, 2020 at 06:53:55PM +0100, Phil Sutter wrote:
> > > iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
> > > COMMIT line in input. When restoring a dump containing multiple large
> > > tables, chances are nft_rebuild_cache() has to run multiple times.

It is true that chances are this code runs multiple times since the
new fine-grain caching logic is in place.

> > Then, fix nft_rebuild_cache() please.
> 
> This is not the right place to fix the problem: nft_rebuild_cache()
> simply rebuilds the cache, switching to a secondary instance if not done
> so before to avoid freeing objects referenced from batch jobs.
> 
> When creating batch jobs (e.g., adding a rule or chain), code is not
> aware of which cache instance is currently in use. It will just add
> those objects to nft_handle->cache pointer.
> 
> It is the job of nft_release_cache() to return things back to normal
> after each COMMIT line, which includes restoring nft_handle->cache
> pointer to point at first cache instance.
> 
> If you see a flaw in my reasoning, I'm all ears. Also, if you see a
> better solution, please elaborate - IMO, nft_release_cache() should undo
> what nft_rebuild_cache() may have done. From nft_action() perspective,
> they are related.

Would this patch still work after this series is applied?

https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=151404

That is working and passing tests. It is just missing the code to
restore the fine-grain dumping, which should be easy to add.

That logic will really reduce the chances of exercising all this cache
dump / cache cancel. Bugs in this cache consistency code are usually
not that easy to trigger and usually hard to fix.

I just think it would be a pity if that work ends up in the trash can.


* Re: [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress
  2020-03-03 20:55       ` Pablo Neira Ayuso
@ 2020-03-04  2:13         ` Phil Sutter
  2020-03-04 17:02           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Phil Sutter @ 2020-03-04  2:13 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

On Tue, Mar 03, 2020 at 09:55:54PM +0100, Pablo Neira Ayuso wrote:
> On Tue, Mar 03, 2020 at 02:02:52AM +0100, Phil Sutter wrote:
> > Hi Pablo,
> > 
> > On Mon, Mar 02, 2020 at 08:19:30PM +0100, Pablo Neira Ayuso wrote:
> > > On Mon, Mar 02, 2020 at 06:53:55PM +0100, Phil Sutter wrote:
> > > > iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
> > > > COMMIT line in input. When restoring a dump containing multiple large
> > > > tables, chances are nft_rebuild_cache() has to run multiple times.
> 
> It is true that chances are this code runs multiple times since the
> new fine-grain caching logic is in place.

AFAICT, this is not related to granularity of caching logic. The crux is
that your fix of Florian's concurrency fix in commit f6ad231d698c7
("nft: keep original cache in case of ERESTART") ignores the fact that
cache may have to be rebuilt multiple times. I wasn't aware of it
either, but knowing that each COMMIT line causes a COMMIT internally
makes it obvious. Your patch adds code to increment cache_index but none
to reset it to zero.

> > > Then, fix nft_rebuild_cache() please.
> > 
> > This is not the right place to fix the problem: nft_rebuild_cache()
> > simply rebuilds the cache, switching to a secondary instance if not done
> > so before to avoid freeing objects referenced from batch jobs.
> > 
> > When creating batch jobs (e.g., adding a rule or chain), code is not
> > aware of which cache instance is currently in use. It will just add
> > those objects to nft_handle->cache pointer.
> > 
> > It is the job of nft_release_cache() to return things back to normal
> > after each COMMIT line, which includes restoring nft_handle->cache
> > pointer to point at first cache instance.
> > 
> > If you see a flaw in my reasoning, I'm all ears. Also, if you see a
> > better solution, please elaborate - IMO, nft_release_cache() should undo
> > what nft_rebuild_cache() may have done. From nft_action() perspective,
> > they are related.
> 
> Would this patch still work after this series is applied?
> 
> https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=151404
> 
> That is working and passing tests. It is just missing the code to
> restore the fine-grain dumping, which should be easy to add.
> 
> That logic will really reduce the chances of exercising all this cache
> dump / cache cancel. Bugs in this cache consistency code are usually
> not that easy to trigger and usually hard to fix.
> 
> I just think it would be a pity if that work ends up in the trash can.

I didn't review those patches yet, but from a quick glance it doesn't
seem to touch the problematic code around __nft_flush_cache(). Let's
make a deal: You accept my fix for the existing cache logic and I'll in
return fix your series if necessary and at least find out what needs to
be done so it doesn't cause a performance regression.

I am not vetoing or sabotaging your approach of separating caching
from parsing, but completing your series with regard to performance as
an alternative (assuming it fixes the existing bug at all) is not
feasible. Therefore, please let's go with a fix first and then commit to
the cache logic rewrite as illustrated in your patches.

Cheers, Phil


* Re: [iptables PATCH 1/4] nft: cache: Fix nft_release_cache() under stress
  2020-03-04  2:13         ` Phil Sutter
@ 2020-03-04 17:02           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2020-03-04 17:02 UTC
  To: Phil Sutter, netfilter-devel

Hi Phil,

On Wed, Mar 04, 2020 at 03:13:34AM +0100, Phil Sutter wrote:
> Hi Pablo,
> 
> On Tue, Mar 03, 2020 at 09:55:54PM +0100, Pablo Neira Ayuso wrote:
> > On Tue, Mar 03, 2020 at 02:02:52AM +0100, Phil Sutter wrote:
> > > Hi Pablo,
> > > 
> > > On Mon, Mar 02, 2020 at 08:19:30PM +0100, Pablo Neira Ayuso wrote:
> > > > On Mon, Mar 02, 2020 at 06:53:55PM +0100, Phil Sutter wrote:
> > > > > iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
> > > > > COMMIT line in input. When restoring a dump containing multiple large
> > > > > tables, chances are nft_rebuild_cache() has to run multiple times.
> > 
> > It is true that chances are this code runs multiple times since the
> > new fine-grain caching logic is in place.
> 
> AFAICT, this is not related to granularity of caching logic. The crux is
> that your fix of Florian's concurrency fix in commit f6ad231d698c7
> ("nft: keep original cache in case of ERESTART") ignores the fact that
> cache may have to be rebuilt multiple times. I wasn't aware of it
> either, but knowing that each COMMIT line causes a COMMIT internally
> makes it obvious. Your patch adds code to increment cache_index but none
> to reset it to zero.

Thanks for explaining.

[...]
> > Would this patch still work after this series is applied?
> > 
> > https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=151404
> > 
> > That is working and passing tests. It is just missing the code to
> > restore the fine-grain dumping, which should be easy to add.
> > 
> > That logic will really reduce the chances of exercising all this cache
> > dump / cache cancel. Bugs in this cache consistency code are usually
> > not that easy to trigger and usually hard to fix.
> > 
> > I just think it would be a pity if that work ends up in the trash can.
> 
> I didn't review those patches yet, but from a quick glance it doesn't
> seem to touch the problematic code around __nft_flush_cache(). Let's
> make a deal: You accept my fix for the existing cache logic and I'll in
> return fix your series if necessary and at least find out what needs to
> be done so it doesn't cause a performance regression.

OK, deal :-)

