All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Phil Sutter <phil@nwl.cc>,
	netfilter-devel@vger.kernel.org, Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
Date: Tue, 25 Feb 2020 13:39:34 +0100	[thread overview]
Message-ID: <20200225123934.p3vru3tmbsjj2o7y@salvia> (raw)
In-Reply-To: <20200223222258.2bb7516a@redhat.com>

Hi,

On Sun, Feb 23, 2020 at 10:22:58PM +0100, Stefano Brivio wrote:
> Hi Phil,
> 
> On Sat, 22 Feb 2020 02:19:33 +0100
> Phil Sutter <phil@nwl.cc> wrote:
> 
> > Hi Stefano,
> > 
> > On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:
> > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > Phil Sutter <phil@nwl.cc> wrote:
> > >   
> > > > Hi Stefano,
> > > > 
> > > > On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:  
> > > > > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > > > > add/flush/add operations, and patch 2/2 introduces a test case
> > > > > covering that.    
> > > > 
> > > > This fixes my test case, thanks!
> > > > 
> > > > I found another problem, but it's maybe on user space side (and not a
> > > > crash this time ;):
> > > > 
> > > > | # nft add table t
> > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > | # nft list ruleset
> > > > | table ip t {
> > > > | 	set s {
> > > > | 		type inet_service . inet_service
> > > > | 		flags interval
> > > > | 		elements = { 20-30 . 40 }
> > > > | 	}
> > > > | }
> > > > 
> > > > As you see, the second element disappears. It happens only if ranges
> > > > overlap and non-range parts are identical.
> > > >
> > > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > > for concatenated ranges, although I *think* range merging happens only
> > > > there. So user space should cover for that already?!  
> > > 
> > > Yes. I didn't consider the need for this kind of specification, given
> > > that you can obtain the same result by simply adding two elements:
> > > separate, partially overlapping elements can be inserted (which is, if I
> > > recall correctly, not the case for rbtree).
> > > 
> > > If I recall correctly, we had a short discussion with Florian about
> > > this, but I don't remember the conclusion.
> > > 
> > > However, I see the ugliness, and how this breaks probably legitimate
> > > expectations. I guess we could call set_to_intervals() in this case,
> > > that function might need some minor adjustments.
> > > 
> > > An alternative, and I'm not sure which one is the most desirable, would
> > > be to refuse that kind of insertion.  
> > 
> > I don't think having concatenated ranges not merge even if possible is a
> > problem. It's just a "nice feature" with some controversial aspects.
> > 
> > The bug I'm reporting is that Above 'add element' command passes two
> > elements to nftables but only the first one makes it into the set. If
> > overlapping elements are fine in pipapo, they should both be there. If
> > not (or otherwise unwanted), we better error out instead of silently
> > dropping the second one.
> 
> Indeed, I agree there's a blatant bug, I was just wondering how to
> solve it.

With hashtable and bitmap, adding an element that already exists is
fine:

nft add element x y { 1.1.1.1 }
nft add element x y { 22 }
nft add element x z { 1.1.1.1-2.2.2.2 }
nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }

then, through 'create':

nft create element x y { 1.1.1.1 }
nft create element x y { 22 }
nft create element x z { 1.1.1.1-2.2.2.2 }
nft create element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }

these hit EEXIST, all good.

In pipapo, the following is silently ignored:

nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
                                    ^
(just added a slightly large range). I tried _without_ these patches.

In rbtree, if you try to add an interval that already exists:

# nft add element x z { 1.1.0.0-1.1.2.254 }
Error: interval overlaps with an existing one
add element x z { 1.1.0.0-1.1.2.254 }
                  ^^^^^^^^^^^^^^^^^
Error: Could not process rule: File exists
add element x z { 1.1.0.0-1.1.2.254 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This complains as an overlap.

I think it's better to not extend userspace to dump the element cache
for add/create, this slows down things for incremental updates
(there's a ticket on bugzilla regarding this problem on the rbtree
IIRC). Better if pipapo can handle this from the kernel.

There is a automerge feature in the rbtree userspace that has been
controversial. This was initially turned on by default, users were
reporting that this was confusing. We can probably introduce a kernel
flag to turn on this automerge feature so pipapo knows user is asking
to not bail out with EEXIST, instead just merge? Not sure how hard is
to implement merging.

> I found out from notes with an old discussion with Florian what the
> problem really was: for concatenated ranges, we might have stuff like:
> 
> 	'{ 20-30 . 40-50, 25-35 . 45-50 }'

I think the second element should hit EEXIST.

> And the only sane way to handle this is as two separate elements. Also
> note that I gave a rather confusing description of the behaviour with
> "partially overlapping elements": what can partially overlap are ranges
> within one field, but there can't be an overlap (even partial) over the
> whole concatenation, because that creates ambiguity. That is,
> 
> 	'{ 20-30 . 40, 25-35 . 40 }'
> 
> the "second element" here is not allowed, while:
> 
> 	'{ 20-30 . 40, 25-35 . 41 }'
> 
> these elements both are.
> 
> Now, this turns into a question for Pablo. I started digging a bit
> further, and I think that both userspace and nft_pipapo_insert()
> observe a reasonable behaviour here: nft passes those as two elements,
> nft_pipapo_insert() rejects the second one with -EEXIST.
> 
> However, in nft_add_set_elem(), we hit this:
> 
> 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
> 	if (err) {
> 		if (err == -EEXIST) {
> 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> 			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
> 				err = -EBUSY;
> 				goto err_element_clash;
> 			}
> 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
> 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
> 			     memcmp(nft_set_ext_data(ext),
> 				    nft_set_ext_data(ext2), set->dlen) != 0) ||
> 			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> 			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> 				err = -EBUSY;
> 			else if (!(nlmsg_flags & NLM_F_EXCL))
> 				err = 0;
> 		}
> 		goto err_element_clash;
> 	}
> 
> and, in particular, as there's no "data" or "objref" extension
> associated with the elements, we hit the:
> 
> 	if (!(nlmsg_flags & NLM_F_EXCL))
> 
> clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
> NLM_F_EXCL flag in set element insertion"). The error is reset, and we
> return success, but the set back-end indicated a problem.
> 
> Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
> already present, even though I'd give RFC 3549 a different
> interpretation (we won't replace the entry, but we should still report
> the error IMHO), I see why we might return success in this case.
>
> The additional problem with concatenated ranges here is that the entry
> might be conflicting (overlapping over the whole concatenation), but
> not be the same as the user wants to insert. I think -EEXIST is the
> code that still fits best in this case, so... do you see a better
> alternative than changing nft_pipapo_insert() to return, say, -EINVAL?

Please, no -EINVAL, it's very overloaded and I think we should only
use this for missing netlink attributes / malformed netlink message.

If I understood your reasoning, I agree -EEXIST for an overlapping
element is fine, even if NLM_F_EXCL is not set.

  reply	other threads:[~2020-02-25 12:39 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-21  2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
2020-02-21  2:04 ` [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() Stefano Brivio
2020-02-21  2:04 ` [PATCH nf 2/2] selftests: nft_concat_range: Add test for reported add/flush/add issue Stefano Brivio
2020-02-21 21:17 ` [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Phil Sutter
2020-02-21 22:22   ` Stefano Brivio
2020-02-22  1:19     ` Phil Sutter
2020-02-23 21:22       ` Stefano Brivio
2020-02-25 12:39         ` Pablo Neira Ayuso [this message]
2020-02-25 12:45           ` Stefano Brivio
2020-02-25 13:13           ` Stefano Brivio
2020-02-25 13:42             ` Pablo Neira Ayuso
2020-02-25 14:34               ` Stefano Brivio
2020-02-25 18:48                 ` Phil Sutter
2020-02-25 19:33                   ` Stefano Brivio
2020-02-25 20:21                 ` Pablo Neira Ayuso
2020-02-25 20:38                   ` Stefano Brivio
2020-02-25 20:58                     ` Pablo Neira Ayuso
2020-02-26 10:58                       ` Pablo Neira Ayuso
2020-02-26 11:01                         ` Pablo Neira Ayuso
2020-02-26 11:02                         ` Stefano Brivio
2020-02-26 11:29                           ` Pablo Neira Ayuso
2020-02-26 11:36                             ` Stefano Brivio
2020-02-26 11:53                               ` Pablo Neira Ayuso
2020-02-26 10:59                       ` Stefano Brivio
2020-02-26 11:10                         ` Pablo Neira Ayuso
2020-02-26 11:19                           ` Stefano Brivio
2020-02-26 11:34                             ` Pablo Neira Ayuso
2020-02-26 11:39                               ` Stefano Brivio
2020-02-26 11:54                                 ` Stefano Brivio
2020-02-26 12:10                                   ` Pablo Neira Ayuso
2020-02-26 13:33 ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200225123934.p3vru3tmbsjj2o7y@salvia \
    --to=pablo@netfilter.org \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=phil@nwl.cc \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.