All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
@ 2022-05-18 10:08 Pablo Neira Ayuso
  2022-05-18 10:51 ` Phil Sutter
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-18 10:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: phil, fw

Either userspace or kernelspace need to pre-fetch keys inconditionally
before comparisons for this to work. Otherwise, register tracking data
is misleading and it might result in reducing expressions which are not
yet registers.

First expression is guaranteed to be evaluated always, therefore, keep
tracking registers and restrict reduction to first expression.

Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
@Phil, you mentioned about a way to simplify this patch, I don't see how,
just let me know.

 net/netfilter/nf_tables_api.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 16c3a39689f4..2c82fecc4094 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8362,6 +8362,7 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha
 	void *data, *data_boundary;
 	struct nft_rule_dp *prule;
 	struct nft_rule *rule;
+	bool reduce;
 
 	/* already handled or inactive chain? */
 	if (chain->blob_next || !nft_is_active_next(net, chain))
@@ -8398,12 +8399,18 @@ static int nf_tables_commit_chain_prepare(struct net *net, struct nft_chain *cha
 
 		size = 0;
 		track.last = nft_expr_last(rule);
+		reduce = true;
 		nft_rule_for_each_expr(expr, last, rule) {
 			track.cur = expr;
 
 			if (nft_expr_reduce(&track, expr)) {
-				expr = track.cur;
-				continue;
+				if (reduce) {
+					reduce = false;
+					expr = track.cur;
+					continue;
+				}
+			} else if (reduce) {
+				reduce = false;
 			}
 
 			if (WARN_ON_ONCE(data + expr->ops->size > data_boundary))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 10:08 [PATCH] netfilter: nf_tables: restrict expression reduction to first expression Pablo Neira Ayuso
@ 2022-05-18 10:51 ` Phil Sutter
  2022-05-18 11:01   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Phil Sutter @ 2022-05-18 10:51 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw

Hi,

On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> Either userspace or kernelspace need to pre-fetch keys inconditionally
> before comparisons for this to work. Otherwise, register tracking data
> is misleading and it might result in reducing expressions which are not
> yet registers.
> 
> First expression is guaranteed to be evaluated always, therefore, keep
> tracking registers and restrict reduction to first expression.
> 
> Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> @Phil, you mentioned about a way to simplify this patch, I don't see how,
> just let me know.

Not a big one. Instead of:

|	if (nft_expr_reduce(&track, expr)) {
|		if (reduce) {
|			reduce = false;
|			expr = track.cur;
|			continue;
|		}
|	} else if (reduce) {
|		reduce = false;
|	}

One could do:

|	if (nft_expr_reduce(&track, expr) && reduce) {
|		reduce = false;
|		expr = track.cur;
|		continue;
|	}
|	reduce = false;

Regarding later pre-fetching, one should distinguish between expressions
that (may) set verdict register and those that don't. There are pitfalls
though, e.g. error conditions handled that way.

Maybe introduce a new nft_expr_type field and set reduce like so:

| reduce = reduce && expr->ops->type->reduce;

Cheers, Phil

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 10:51 ` Phil Sutter
@ 2022-05-18 11:01   ` Pablo Neira Ayuso
  2022-05-18 11:40     ` Phil Sutter
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-18 11:01 UTC (permalink / raw)
  To: Phil Sutter, netfilter-devel, fw

On Wed, May 18, 2022 at 12:51:00PM +0200, Phil Sutter wrote:
> Hi,
> 
> On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> > Either userspace or kernelspace need to pre-fetch keys inconditionally
> > before comparisons for this to work. Otherwise, register tracking data
> > is misleading and it might result in reducing expressions which are not
> > yet registers.
> > 
> > First expression is guaranteed to be evaluated always, therefore, keep
> > tracking registers and restrict reduction to first expression.
> > 
> > Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> > @Phil, you mentioned about a way to simplify this patch, I don't see how,
> > just let me know.
> 
> Not a big one. Instead of:
> 
> |	if (nft_expr_reduce(&track, expr)) {
> |		if (reduce) {
> |			reduce = false;
> |			expr = track.cur;
> |			continue;
> |		}
> |	} else if (reduce) {
> |		reduce = false;
> |	}
> 
> One could do:
> 
> |	if (nft_expr_reduce(&track, expr) && reduce) {
> |		reduce = false;
> |		expr = track.cur;
> |		continue;
> |	}
> |	reduce = false;

I'll send v2 using this idiom.

> Regarding later pre-fetching, one should distinguish between expressions
> that (may) set verdict register and those that don't. There are pitfalls
> though, e.g. error conditions handled that way.
> 
> Maybe introduce a new nft_expr_type field and set reduce like so:
> 
> | reduce = reduce && expr->ops->type->reduce;

Could you elaborate?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 11:01   ` Pablo Neira Ayuso
@ 2022-05-18 11:40     ` Phil Sutter
  2022-05-18 11:48       ` Florian Westphal
  2022-05-18 12:21       ` Pablo Neira Ayuso
  0 siblings, 2 replies; 11+ messages in thread
From: Phil Sutter @ 2022-05-18 11:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw

On Wed, May 18, 2022 at 01:01:50PM +0200, Pablo Neira Ayuso wrote:
> On Wed, May 18, 2022 at 12:51:00PM +0200, Phil Sutter wrote:
> > Hi,
> > 
> > On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> > > Either userspace or kernelspace need to pre-fetch keys inconditionally
> > > before comparisons for this to work. Otherwise, register tracking data
> > > is misleading and it might result in reducing expressions which are not
> > > yet registers.
> > > 
> > > First expression is guaranteed to be evaluated always, therefore, keep
> > > tracking registers and restrict reduction to first expression.
> > > 
> > > Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > ---
> > > @Phil, you mentioned about a way to simplify this patch, I don't see how,
> > > just let me know.
> > 
> > Not a big one. Instead of:
> > 
> > |	if (nft_expr_reduce(&track, expr)) {
> > |		if (reduce) {
> > |			reduce = false;
> > |			expr = track.cur;
> > |			continue;
> > |		}
> > |	} else if (reduce) {
> > |		reduce = false;
> > |	}
> > 
> > One could do:
> > 
> > |	if (nft_expr_reduce(&track, expr) && reduce) {
> > |		reduce = false;
> > |		expr = track.cur;
> > |		continue;
> > |	}
> > |	reduce = false;
> 
> I'll send v2 using this idiom.
> 
> > Regarding later pre-fetching, one should distinguish between expressions
> > that (may) set verdict register and those that don't. There are pitfalls
> > though, e.g. error conditions handled that way.
> > 
> > Maybe introduce a new nft_expr_type field and set reduce like so:
> > 
> > | reduce = reduce && expr->ops->type->reduce;
> 
> Could you elaborate?

Well, an expression which may set verdict register to NFT_BREAK should
prevent reduction of later expressions in same rule as it may stop rule
evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
NULL. The optimizer must not assume later expressions are evaluated.

A first step might be said nft_expr_type field indicating a given
expression might stop expression evaluation. Therefore:

| reduce = reduce && expr->ops->type->reduce;

would continue expression reduction if not already stopped and the
current expression doesn't end it.

Taking nft_meta as example again:

* Behaviour changes based on nft_expr_type::select_ops result
* Some keys are guaranteed to not stop expression evaluation:
  NFT_META_LEN for instance will always just fetch skb->len. So
  introduce a callback instead:

| bool nft_expr_ops::may_break(const struct nft_expr *expr);

Then "ask" the expression whether it may change verdict register:

| reduce = reduce && expr->ops->may_break(expr);

With nft_meta_get_ops, we'd have:

| bool nft_meta_get_may_break(const struct nft_expr *expr)
| {
| 	switch (nft_expr_priv(expr)->key) {
| 	case NFT_META_LEN:
| 	case NFT_META_PROTOCOL::
| 	[...]
| 		return false;
| 	case NFT_META_IFTYPE:
| 	[...]
| 		return true;
| 	}
| }

Another thing about your proposed patch: Expressions may update
registers even if not reduced. Could that upset later reduction
decision? E.g.:

| ip saddr 1.0.0.1 ip daddr 2.0.0.2 accept
| ip daddr 3.0.0.3 accept

Code no longer allows the first rule's 'ip daddr' expression to be
reduced (no matter what's in registers already), but it's existence
causes reduction of the second rule's 'ip daddr' expression, right?

Cheers, Phil

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 11:40     ` Phil Sutter
@ 2022-05-18 11:48       ` Florian Westphal
  2022-05-18 12:26         ` Pablo Neira Ayuso
  2022-05-18 12:21       ` Pablo Neira Ayuso
  1 sibling, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2022-05-18 11:48 UTC (permalink / raw)
  To: Phil Sutter, Pablo Neira Ayuso, netfilter-devel, fw

Phil Sutter <phil@nwl.cc> wrote:
> > > | reduce = reduce && expr->ops->type->reduce;
> > 
> > Could you elaborate?
> 
> Well, an expression which may set verdict register to NFT_BREAK should
> prevent reduction of later expressions in same rule as it may stop rule
> evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
> also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
> NULL. The optimizer must not assume later expressions are evaluated.

This all seems fragile to me, with huge potential to add subtle bugs
that will be hard to track down.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 11:40     ` Phil Sutter
  2022-05-18 11:48       ` Florian Westphal
@ 2022-05-18 12:21       ` Pablo Neira Ayuso
  2022-05-18 12:33         ` Pablo Neira Ayuso
  2022-05-18 12:43         ` Phil Sutter
  1 sibling, 2 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-18 12:21 UTC (permalink / raw)
  To: Phil Sutter, netfilter-devel, fw

On Wed, May 18, 2022 at 01:40:21PM +0200, Phil Sutter wrote:
> On Wed, May 18, 2022 at 01:01:50PM +0200, Pablo Neira Ayuso wrote:
> > On Wed, May 18, 2022 at 12:51:00PM +0200, Phil Sutter wrote:
> > > Hi,
> > > 
> > > On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> > > > Either userspace or kernelspace need to pre-fetch keys inconditionally
> > > > before comparisons for this to work. Otherwise, register tracking data
> > > > is misleading and it might result in reducing expressions which are not
> > > > yet registers.
> > > > 
> > > > First expression is guaranteed to be evaluated always, therefore, keep
> > > > tracking registers and restrict reduction to first expression.
> > > > 
> > > > Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > > ---
> > > > @Phil, you mentioned about a way to simplify this patch, I don't see how,
> > > > just let me know.
> > > 
> > > Not a big one. Instead of:
> > > 
> > > |	if (nft_expr_reduce(&track, expr)) {
> > > |		if (reduce) {
> > > |			reduce = false;
> > > |			expr = track.cur;
> > > |			continue;
> > > |		}
> > > |	} else if (reduce) {
> > > |		reduce = false;
> > > |	}
> > > 
> > > One could do:
> > > 
> > > |	if (nft_expr_reduce(&track, expr) && reduce) {
> > > |		reduce = false;
> > > |		expr = track.cur;
> > > |		continue;
> > > |	}
> > > |	reduce = false;
> > 
> > I'll send v2 using this idiom.
> > 
> > > Regarding later pre-fetching, one should distinguish between expressions
> > > that (may) set verdict register and those that don't. There are pitfalls
> > > though, e.g. error conditions handled that way.
> > > 
> > > Maybe introduce a new nft_expr_type field and set reduce like so:
> > > 
> > > | reduce = reduce && expr->ops->type->reduce;
> > 
> > Could you elaborate?
> 
> Well, an expression which may set verdict register to NFT_BREAK should
> prevent reduction of later expressions in same rule as it may stop rule
> evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
> also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
> NULL. The optimizer must not assume later expressions are evaluated.

How many other expression are breaking when fetching the key?

> A first step might be said nft_expr_type field indicating a given
> expression might stop expression evaluation. Therefore:
> 
> | reduce = reduce && expr->ops->type->reduce;
> 
> would continue expression reduction if not already stopped and the
> current expression doesn't end it.
> 
> Taking nft_meta as example again:
> 
> * Behaviour changes based on nft_expr_type::select_ops result
> * Some keys are guaranteed to not stop expression evaluation:
>   NFT_META_LEN for instance will always just fetch skb->len. So
>   introduce a callback instead:
>
> | bool nft_expr_ops::may_break(const struct nft_expr *expr);
>
> Then "ask" the expression whether it may change verdict register:
> 
> | reduce = reduce && expr->ops->may_break(expr);
> 
> With nft_meta_get_ops, we'd have:
> 
> | bool nft_meta_get_may_break(const struct nft_expr *expr)
> | {
> | 	switch (nft_expr_priv(expr)->key) {
> | 	case NFT_META_LEN:
> | 	case NFT_META_PROTOCOL::
> | 	[...]
> | 		return false;
> | 	case NFT_META_IFTYPE:
> | 	[...]
> | 		return true;
> | 	}
> | }

And simply remove that NFT_BREAK and set a value that will not ever
match via nft_cmp?

> Another thing about your proposed patch: Expressions may update
> registers even if not reduced. Could that upset later reduction
> decision? E.g.:
> 
> | ip saddr 1.0.0.1 ip daddr 2.0.0.2 accept
> | ip daddr 3.0.0.3 accept
> 
> Code no longer allows the first rule's 'ip daddr' expression to be
> reduced (no matter what's in registers already), but it's existence
> causes reduction of the second rule's 'ip daddr' expression, right?

We cannot make assumptions on ip daddr because there is a cmp right
before (to test for ip saddr 1.0.0.1), unless keys are inconditionally
prefetched.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 11:48       ` Florian Westphal
@ 2022-05-18 12:26         ` Pablo Neira Ayuso
  2022-05-18 12:38           ` Florian Westphal
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-18 12:26 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Phil Sutter, netfilter-devel

On Wed, May 18, 2022 at 01:48:07PM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> > > > | reduce = reduce && expr->ops->type->reduce;
> > > 
> > > Could you elaborate?
> > 
> > Well, an expression which may set verdict register to NFT_BREAK should
> > prevent reduction of later expressions in same rule as it may stop rule
> > evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
> > also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
> > NULL. The optimizer must not assume later expressions are evaluated.
> 
> This all seems fragile to me, with huge potential to add subtle bugs
> that will be hard to track down.

We can expose flags to indicate that an expression is reduced and
expressions that are prefetched.

New test infrastructure will help to catch bugs, more selftests and
userspace validation of bytecode through exposed flags.

It would be good not to re-fetch data into register that is already
there.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 12:21       ` Pablo Neira Ayuso
@ 2022-05-18 12:33         ` Pablo Neira Ayuso
  2022-05-18 12:43         ` Phil Sutter
  1 sibling, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-18 12:33 UTC (permalink / raw)
  To: Phil Sutter, netfilter-devel, fw

On Wed, May 18, 2022 at 02:21:59PM +0200, Pablo Neira Ayuso wrote:
> On Wed, May 18, 2022 at 01:40:21PM +0200, Phil Sutter wrote:
> > On Wed, May 18, 2022 at 01:01:50PM +0200, Pablo Neira Ayuso wrote:
> > > On Wed, May 18, 2022 at 12:51:00PM +0200, Phil Sutter wrote:
> > > > Hi,
> > > > 
> > > > On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> > > > > Either userspace or kernelspace need to pre-fetch keys inconditionally
> > > > > before comparisons for this to work. Otherwise, register tracking data
> > > > > is misleading and it might result in reducing expressions which are not
> > > > > yet registers.
> > > > > 
> > > > > First expression is guaranteed to be evaluated always, therefore, keep
> > > > > tracking registers and restrict reduction to first expression.
> > > > > 
> > > > > Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > > > ---
> > > > > @Phil, you mentioned about a way to simplify this patch, I don't see how,
> > > > > just let me know.
> > > > 
> > > > Not a big one. Instead of:
> > > > 
> > > > |	if (nft_expr_reduce(&track, expr)) {
> > > > |		if (reduce) {
> > > > |			reduce = false;
> > > > |			expr = track.cur;
> > > > |			continue;
> > > > |		}
> > > > |	} else if (reduce) {
> > > > |		reduce = false;
> > > > |	}
> > > > 
> > > > One could do:
> > > > 
> > > > |	if (nft_expr_reduce(&track, expr) && reduce) {
> > > > |		reduce = false;
> > > > |		expr = track.cur;
> > > > |		continue;
> > > > |	}
> > > > |	reduce = false;
> > > 
> > > I'll send v2 using this idiom.
> > > 
> > > > Regarding later pre-fetching, one should distinguish between expressions
> > > > that (may) set verdict register and those that don't. There are pitfalls
> > > > though, e.g. error conditions handled that way.
> > > > 
> > > > Maybe introduce a new nft_expr_type field and set reduce like so:
> > > > 
> > > > | reduce = reduce && expr->ops->type->reduce;
> > > 
> > > Could you elaborate?
> > 
> > Well, an expression which may set verdict register to NFT_BREAK should
> > prevent reduction of later expressions in same rule as it may stop rule
> > evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
> > also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
> > NULL. The optimizer must not assume later expressions are evaluated.
> 
> How many other expression are breaking when fetching the key?
> 
> > A first step might be said nft_expr_type field indicating a given
> > expression might stop expression evaluation. Therefore:
> > 
> > | reduce = reduce && expr->ops->type->reduce;
> > 
> > would continue expression reduction if not already stopped and the
> > current expression doesn't end it.
> > 
> > Taking nft_meta as example again:
> > 
> > * Behaviour changes based on nft_expr_type::select_ops result
> > * Some keys are guaranteed to not stop expression evaluation:
> >   NFT_META_LEN for instance will always just fetch skb->len. So
> >   introduce a callback instead:
> >
> > | bool nft_expr_ops::may_break(const struct nft_expr *expr);
> >
> > Then "ask" the expression whether it may change verdict register:
> > 
> > | reduce = reduce && expr->ops->may_break(expr);
> > 
> > With nft_meta_get_ops, we'd have:
> > 
> > | bool nft_meta_get_may_break(const struct nft_expr *expr)
> > | {
> > | 	switch (nft_expr_priv(expr)->key) {
> > | 	case NFT_META_LEN:
> > | 	case NFT_META_PROTOCOL::
> > | 	[...]
> > | 		return false;
> > | 	case NFT_META_IFTYPE:
> > | 	[...]
> > | 		return true;
> > | 	}
> > | }
> 
> And simply remove that NFT_BREAK and set a value that will not ever
> match via nft_cmp?

Canceling tracking of keys that might break is probably a more simple
solution.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 12:26         ` Pablo Neira Ayuso
@ 2022-05-18 12:38           ` Florian Westphal
  2022-05-18 12:49             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2022-05-18 12:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, Phil Sutter, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > This all seems fragile to me, with huge potential to add subtle bugs
> > that will be hard to track down.
> 
> We can expose flags to indicate that an expression is reduced and
> expressions that are prefetched.
> 
> New test infrastructure will help to catch bugs, more selftests and
> userspace validation of bytecode through exposed flags.
> 
> It would be good not to re-fetch data into register that is already
> there.

I wonder if we should explore doing this from userspace only, i.e.
provide hints to kernel which expressions should be dropped in a given
chain.

Thats more transparent and would permit to reshuffle expressions,
e.g. first add all 'load instructions' and then do the comparisions
register opererations.

Kind of reverse approach to what you and Phil are doing, instead of
eliding expressions in the data path representation based on in-kernel
logic and a debug infra that annotates 'soft off' expressions, annotate
them in userspace and then tell kernel what it can discard.

Downside is that userspace would have to delete+re-add entire chain to
keep the 'elide' as-is.

With proposed scheme, we will have to patch kernel and then tell users
that they must upgrade kernel or risk that their ruelset is incorrect.

With userspace approach, we could slowly extend nft and add explicit
optimization flags to the commandline tool, with default of re-fetch.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 12:21       ` Pablo Neira Ayuso
  2022-05-18 12:33         ` Pablo Neira Ayuso
@ 2022-05-18 12:43         ` Phil Sutter
  1 sibling, 0 replies; 11+ messages in thread
From: Phil Sutter @ 2022-05-18 12:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw

On Wed, May 18, 2022 at 02:21:57PM +0200, Pablo Neira Ayuso wrote:
> On Wed, May 18, 2022 at 01:40:21PM +0200, Phil Sutter wrote:
> > On Wed, May 18, 2022 at 01:01:50PM +0200, Pablo Neira Ayuso wrote:
> > > On Wed, May 18, 2022 at 12:51:00PM +0200, Phil Sutter wrote:
> > > > Hi,
> > > > 
> > > > On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> > > > > Either userspace or kernelspace need to pre-fetch keys inconditionally
> > > > > before comparisons for this to work. Otherwise, register tracking data
> > > > > is misleading and it might result in reducing expressions which are not
> > > > > yet registers.
> > > > > 
> > > > > First expression is guaranteed to be evaluated always, therefore, keep
> > > > > tracking registers and restrict reduction to first expression.
> > > > > 
> > > > > Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > > > > ---
> > > > > @Phil, you mentioned about a way to simplify this patch, I don't see how,
> > > > > just let me know.
> > > > 
> > > > Not a big one. Instead of:
> > > > 
> > > > |	if (nft_expr_reduce(&track, expr)) {
> > > > |		if (reduce) {
> > > > |			reduce = false;
> > > > |			expr = track.cur;
> > > > |			continue;
> > > > |		}
> > > > |	} else if (reduce) {
> > > > |		reduce = false;
> > > > |	}
> > > > 
> > > > One could do:
> > > > 
> > > > |	if (nft_expr_reduce(&track, expr) && reduce) {
> > > > |		reduce = false;
> > > > |		expr = track.cur;
> > > > |		continue;
> > > > |	}
> > > > |	reduce = false;
> > > 
> > > I'll send v2 using this idiom.
> > > 
> > > > Regarding later pre-fetching, one should distinguish between expressions
> > > > that (may) set verdict register and those that don't. There are pitfalls
> > > > though, e.g. error conditions handled that way.
> > > > 
> > > > Maybe introduce a new nft_expr_type field and set reduce like so:
> > > > 
> > > > | reduce = reduce && expr->ops->type->reduce;
> > > 
> > > Could you elaborate?
> > 
> > Well, an expression which may set verdict register to NFT_BREAK should
> > prevent reduction of later expressions in same rule as it may stop rule
> > evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
> > also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
> > NULL. The optimizer must not assume later expressions are evaluated.
> 
> How many other expression are breaking when fetching the key?

* nft_ct
* nft_exthdr
* nft_fib
* nft_flow_offload
* nft_fwd_netdev
* nft_osf
* nft_payload
* nft_rt
* nft_socket
* nft_synproxy
* nft_tproxy
* nft_tunnel
* nft_xfrm

> > A first step might be said nft_expr_type field indicating a given
> > expression might stop expression evaluation. Therefore:
> > 
> > | reduce = reduce && expr->ops->type->reduce;
> > 
> > would continue expression reduction if not already stopped and the
> > current expression doesn't end it.
> > 
> > Taking nft_meta as example again:
> > 
> > * Behaviour changes based on nft_expr_type::select_ops result
> > * Some keys are guaranteed to not stop expression evaluation:
> >   NFT_META_LEN for instance will always just fetch skb->len. So
> >   introduce a callback instead:
> >
> > | bool nft_expr_ops::may_break(const struct nft_expr *expr);
> >
> > Then "ask" the expression whether it may change verdict register:
> > 
> > | reduce = reduce && expr->ops->may_break(expr);
> > 
> > With nft_meta_get_ops, we'd have:
> > 
> > | bool nft_meta_get_may_break(const struct nft_expr *expr)
> > | {
> > | 	switch (nft_expr_priv(expr)->key) {
> > | 	case NFT_META_LEN:
> > | 	case NFT_META_PROTOCOL::
> > | 	[...]
> > | 		return false;
> > | 	case NFT_META_IFTYPE:
> > | 	[...]
> > | 		return true;
> > | 	}
> > | }
> 
> And simply remove that NFT_BREAK and set a value that will not ever
> match via nft_cmp?

If possible, yes. If given interface is not available, we store 0 or an
empty string for ifindex or ifname for instance. There might be types
without unreachable values, though. (ifgroup maybe?)

> > Another thing about your proposed patch: Expressions may update
> > registers even if not reduced. Could that upset later reduction
> > decision? E.g.:
> > 
> > | ip saddr 1.0.0.1 ip daddr 2.0.0.2 accept
> > | ip daddr 3.0.0.3 accept
> > 
> > Code no longer allows the first rule's 'ip daddr' expression to be
> > reduced (no matter what's in registers already), but it's existence
> > causes reduction of the second rule's 'ip daddr' expression, right?
> 
> We cannot make assumptions on ip daddr because there is a cmp right
> before (to test for ip saddr 1.0.0.1), unless keys are inconditionally
> prefetched.

That's right, and I fear calling nft_expr_reduce() might cause that even
if not evicting the expression.

Cheers, Phil

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression
  2022-05-18 12:38           ` Florian Westphal
@ 2022-05-18 12:49             ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-18 12:49 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Phil Sutter, netfilter-devel

On Wed, May 18, 2022 at 02:38:14PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > This all seems fragile to me, with huge potential to add subtle bugs
> > > that will be hard to track down.
> > 
> > We can expose flags to indicate that an expression is reduced and
> > expressions that are prefetched.
> > 
> > New test infrastructure will help to catch bugs, more selftests and
> > userspace validation of bytecode through exposed flags.
> > 
> > It would be good not to re-fetch data into register that is already
> > there.
> 
> I wonder if we should explore doing this from userspace only, i.e.
> provide hints to kernel which expressions should be dropped in a given
> chain.
> 
> Thats more transparent and would permit to reshuffle expressions,
> e.g. first add all 'load instructions' and then do the comparisions
> register opererations.
> 
> Kind of reverse approach to what you and Phil are doing, instead of
> eliding expressions in the data path representation based on in-kernel
> logic and a debug infra that annotates 'soft off' expressions, annotate
> them in userspace and then tell kernel what it can discard.
> 
> Downside is that userspace would have to delete+re-add entire chain to
> keep the 'elide' as-is.

Problem is incremental ruleset updates, we'd go back to iptables way
(dump ruleset, rework, reload).

> With proposed scheme, we will have to patch kernel and then tell users
> that they must upgrade kernel or risk that their ruelset is incorrect.
> 
> With userspace approach, we could slowly extend nft and add explicit
> optimization flags to the commandline tool, with default of re-fetch.

I would revisit to enable expression reduction for keys that do not
depend on datapath data by canceling tracking in such case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-18 12:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-18 10:08 [PATCH] netfilter: nf_tables: restrict expression reduction to first expression Pablo Neira Ayuso
2022-05-18 10:51 ` Phil Sutter
2022-05-18 11:01   ` Pablo Neira Ayuso
2022-05-18 11:40     ` Phil Sutter
2022-05-18 11:48       ` Florian Westphal
2022-05-18 12:26         ` Pablo Neira Ayuso
2022-05-18 12:38           ` Florian Westphal
2022-05-18 12:49             ` Pablo Neira Ayuso
2022-05-18 12:21       ` Pablo Neira Ayuso
2022-05-18 12:33         ` Pablo Neira Ayuso
2022-05-18 12:43         ` Phil Sutter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.