All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf
Date: Mon, 23 May 2022 16:59:30 +0200	[thread overview]
Message-ID: <YouhUq09zfcflOnz@salvia> (raw)
In-Reply-To: <20220520174524.439b5fa2@elisabeth>

On Fri, May 20, 2022 at 05:45:24PM +0200, Stefano Brivio wrote:
> On Tue, 17 May 2022 14:57:09 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Mon, 16 May 2022 20:16:53 +0200
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > 
> > > Hi Stefano,
> > > 
> > > On Thu, May 12, 2022 at 08:34:21PM +0200, Stefano Brivio wrote:  
> > > > In the overlap detection performed as part of the insertion operation,
> > > > we have to skip nodes that are not active in the current generation or
> > > > expired. This was done by adding several conditions in overlap decision
> > > > clauses, which made it hard to check for correctness, and debug the
> > > > actual issue this patch is fixing.
> > > > 
> > > > Consolidate these conditions into a single clause, so that we can
> > > > proceed with debugging and fixing the following issue.
> > > > 
> > > > Case b3. described in comments covers the insertion of a start
> > > > element after an existing end, as a leaf. If all the descendants of
> > > > a given element are inactive, functionally, for the purposes of
> > > > overlap detection, we still have to treat this as a leaf, but we don't.
> > > > 
> > > > If, in the same transaction, we remove a start element to the right,
> > > > remove an end element to the left, and add a start element to the right
> > > > of an existing, active end element, we don't hit case b3. For example:
> > > > 
> > > > - existing intervals: 1-2, 3-5, 6-7
> > > > - transaction: delete 3-5, insert 4-5
> > > > 
> > > > node traversal might happen as follows:
> > > > - 2 (active end)
> > > > - 5 (inactive end)
> > > > - 3 (inactive start)
> > > > 
> > > > when we add 4 as start element, we should simply consider 2 as
> > > > preceding end, and consider it as a leaf, because interval 3-5 has been
> > > > deleted. If we don't, we'll report an overlap because we forgot about
> > > > this.
> > > > 
> > > > Add an additional flag which is set as we find an active end, and reset
> > > > it if we find an active start (to the left). If we finish the traversal
> > > > with this flag set, it means we need to functionally consider the
> > > > previous active end as a leaf, and allow insertion instead of reporting
> > > > overlap.    
> > > 
> > > I can still trigger EEXIST with deletion of existing interval. It
> > > became harder to reproduce after this patch.
> > > 
> > > After hitting EEXIST, if I do:
> > > 
> > >         echo "flush ruleset" > test.nft
> > >         nft list ruleset >> test.nft
> > > 
> > > to dump the existing ruleset, then I run the delete element command
> > > again to remove the interval and it works. Before this patch I could
> > > reproduce it by reloading an existing ruleset dump.
> > > 
> > > I'm running the script that I'm attaching manually, just one manual
> > > invocation after another.  
> > 
> > Ouch, sorry for that.
> > 
> > It looks like there's another case where inactive elements still affect
> > overlap detection in an unexpected way... at least with the structure
> > of this patch it should be easier to find, I'm looking into that now.
> 
> ...what a mess. I could figure that part out (it was a case symmetric
> to what this patch fixed, in this case resolving to case b5.) but
> there's then another case (found by triggering a specific tree topology
> with 0044interval_overlap_1) where we first add a start element, then
> fail to add the end element because the start element is completely
> "hidden" in the tree by inactive nodes.
> 
> I tried to solve that with some backtracking, but that looks also
> fragile. If I clean up the tree before insertion, instead, that will
> only deal with expired nodes, not inactive nodes -- I can't erase
> non-expired, inactive nodes because the API expects to find them at
> some later point and call nft_rbtree_remove() on them.
> 
> Now I'm trying another approach that looks more robust: instead of
> descending the tree to find overlaps, just going through it in the same
> way nft_rbtree_gc() does (linearly, node by node), marking the
> value-wise closest points from left and right _valid_ nodes, and
> applying the same reasoning. I need a bit more time for this, but it
> looks way simpler. Insertion itself would keep working as it does now.
> 
> In hindsight, it looks like it was a terrible idea to try to fix this
> implementation. I really underestimated how bad this is. Functionally
> speaking, it's not a red-black tree because:
> 
> - we can't use it as a sorted binary tree, given that some elements
>   "don't matter" for some operations, or have another colour. We might
>   try to think of it as some other structure and rebuild from there
>   useful properties of sorted binary trees, but I'm not sure a
>   "red-brown-black" tree would have any other use making it worth of
>   any further research
> 
> - end elements being represented as their value plus one also break
>   assumptions of sorted trees (this is the issue I'm actually facing
>   with backtracking)
> 
> - left subtrees store keys greater than right subtrees, but this
>   looks consistent so it's just added fun and could be fixed
>   trivially (it's all reversed)
> 
> By the way, I think we _should_ have similar issues in the lookup
> function. Given that it's possible to build a tree with a subtree of at
> least three levels with entirely non-valid nodes, I guess we can hide a
> valid interval from the lookup too. It's just harder to test.

Thanks for the detailed report.

Another possibility? Maintain two trees, one for the existing
generation (this is read-only) and another for the next generation
(insertion/removals are performed on it), then swap them when commit
happens? pipapo has similar requirements, currently it is relying on a
workqueue to make some postprocess after the commit phase. At the
expense of consuming more memory.

> In the perspective of getting rid of it, I think we need:
> 
> 1. some "introductory" documentation for nft_set_pipapo -- I just
>    got back to it (drawing some diagrams first...)
> 
> 2. to understand if the performance gap in the few (maybe not
>    reasonable) cases where nft_set_rbtree matches faster than
>    nft_set_pipapo is acceptable. Summary:
>      https://lore.kernel.org/netfilter-devel/be7d4e51603633e7b88e4b0dde54b25a3e5a018b.1583598508.git.sbrivio@redhat.com/

IIRC pipapo was not too far behind from rbtree for a few scenarios.

> 3. a solution for https://bugzilla.netfilter.org/show_bug.cgi?id=1583,
>    it's an atomicity issue which has little to do with nft_set_pipapo
>    structures themselves, but I couldn't figure out the exact issue
>    yet. I'm struggling to find the time for it, so if somebody wants to
>    give it a try, I'd be more than happy to reassign it...

OK, a different problem, related to pipapo.

> Anyway, I'll post a different patch for nft_set_rbtree soon.

Thanks.

  reply	other threads:[~2022-05-23 14:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 18:34 [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf Stefano Brivio
2022-05-16 18:16 ` Pablo Neira Ayuso
2022-05-17 12:57   ` Stefano Brivio
2022-05-20 15:45     ` Stefano Brivio
2022-05-23 14:59       ` Pablo Neira Ayuso [this message]
2022-05-25 12:15         ` Stefano Brivio
2022-06-01 11:15           ` Pablo Neira Ayuso
2022-06-03 13:04             ` Stefano Brivio
2022-06-06  9:01               ` Pablo Neira Ayuso
2022-06-14  9:58                 ` Stefano Brivio
2022-06-16  9:08                   ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YouhUq09zfcflOnz@salvia \
    --to=pablo@netfilter.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.