All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
@ 2020-02-21  2:04 Stefano Brivio
  2020-02-21  2:04 ` [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() Stefano Brivio
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Stefano Brivio @ 2020-02-21  2:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Phil Sutter

Patch 1/2 fixes the issue recently reported by Phil on a sequence of
add/flush/add operations, and patch 2/2 introduces a test case
covering that.

Stefano Brivio (2):
  nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()
  selftests: nft_concat_range: Add test for reported add/flush/add issue

 net/netfilter/nft_set_pipapo.c                |  6 ++-
 .../selftests/netfilter/nft_concat_range.sh   | 43 +++++++++++++++++--
 2 files changed, 43 insertions(+), 6 deletions(-)

-- 
2.25.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()
  2020-02-21  2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
@ 2020-02-21  2:04 ` Stefano Brivio
  2020-02-21  2:04 ` [PATCH nf 2/2] selftests: nft_concat_range: Add test for reported add/flush/add issue Stefano Brivio
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Stefano Brivio @ 2020-02-21  2:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Phil Sutter

Phil reports that adding elements, flushing and re-adding them
right away:

  nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
  nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
  nft flush set t s
  nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'

triggers, almost reliably, a crash like this one:

  [   71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
  [   71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192
  [   71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
  [   71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
  [   71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
  [   71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
  [   71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
  [   71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
  [   71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
  [   71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
  [   71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
  [   71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
  [   71.334596] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
  [   71.335780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
  [   71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   71.339718] Call Trace:
  [   71.340093]  nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
  [   71.340973]  nft_set_destroy+0x20/0x50 [nf_tables]
  [   71.341879]  nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
  [   71.342916]  process_one_work+0x1d5/0x3c0
  [   71.343601]  worker_thread+0x4a/0x3c0
  [   71.344229]  kthread+0xfb/0x130
  [   71.344780]  ? process_one_work+0x3c0/0x3c0
  [   71.345477]  ? kthread_park+0x90/0x90
  [   71.346129]  ret_from_fork+0x35/0x40
  [   71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
  [   71.348153] ---[ end trace 2eaa8149ca759bcc ]---
  [   71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
  [   71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
  [   71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
  [   71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
  [   71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
  [   71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
  [   71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
  [   71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
  [   71.350025] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
  [   71.350026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
  [   71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   71.350030] Kernel panic - not syncing: Fatal exception
  [   71.350412] Kernel Offset: disabled
  [   71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---

which is caused by dangling elements that have been deactivated, but
never removed.

On a flush operation, nft_pipapo_walk() walks through all the elements
in the mapping table, which are then deactivated by nft_flush_set(),
one by one, and added to the commit list for removal. Element data is
then freed.

On transaction commit, nft_pipapo_remove() is called, and failed to
remove these elements, leading to the stale references in the mapping.
The first symptom of this, revealed by KASan, is a one-byte
use-after-free in subsequent calls to nft_pipapo_walk(), which is
usually not enough to trigger a panic. When stale elements are used
more heavily, though, such as double-free via nft_pipapo_destroy()
as in Phil's case, the problem becomes more noticeable.

The issue comes from that fact that, on a flush operation,
nft_pipapo_remove() won't get the actual key data via elem->key,
elements to be deleted upon commit won't be found by the lookup via
pipapo_get(), and removal will be skipped. Key data should be fetched
via nft_set_ext_key(), instead.

Reported-by: Phil Sutter <phil@nwl.cc>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index feac8553f6d9..4fc0c924ed5d 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1766,11 +1766,13 @@ static bool pipapo_match_field(struct nft_pipapo_field *f,
 static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
 			      const struct nft_set_elem *elem)
 {
-	const u8 *data = (const u8 *)elem->key.val.data;
 	struct nft_pipapo *priv = nft_set_priv(set);
 	struct nft_pipapo_match *m = priv->clone;
+	struct nft_pipapo_elem *e = elem->priv;
 	int rules_f0, first_rule = 0;
-	struct nft_pipapo_elem *e;
+	const u8 *data;
+
+	data = (const u8 *)nft_set_ext_key(&e->ext);
 
 	e = pipapo_get(net, set, data, 0);
 	if (IS_ERR(e))
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH nf 2/2] selftests: nft_concat_range: Add test for reported add/flush/add issue
  2020-02-21  2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
  2020-02-21  2:04 ` [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() Stefano Brivio
@ 2020-02-21  2:04 ` Stefano Brivio
  2020-02-21 21:17 ` [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Phil Sutter
  2020-02-26 13:33 ` Pablo Neira Ayuso
  3 siblings, 0 replies; 31+ messages in thread
From: Stefano Brivio @ 2020-02-21  2:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Phil Sutter

Add a specific test for the crash reported by Phil Sutter and addressed
in the previous patch. The test cases that, in my intention, should
have covered these cases, that is, the ones from the 'concurrency'
section, don't run these sequences tightly enough and spectacularly
failed to catch this.

While at it, define a convenient way to add these kind of tests, by
adding a "reported issues" test section.

It's more convenient, for this particular test, to execute the set
setup in its own function. However, future test cases like this one
might need to call setup functions, and will typically need no tools
other than nft, so allow for this in check_tools().

The original form of the reproducer used here was provided by Phil.

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 .../selftests/netfilter/nft_concat_range.sh   | 43 +++++++++++++++++--
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/netfilter/nft_concat_range.sh b/tools/testing/selftests/netfilter/nft_concat_range.sh
index aca21dde102a..dc810e32c59e 100755
--- a/tools/testing/selftests/netfilter/nft_concat_range.sh
+++ b/tools/testing/selftests/netfilter/nft_concat_range.sh
@@ -13,11 +13,12 @@
 KSELFTEST_SKIP=4
 
 # Available test groups:
+# - reported_issues: check for issues that were reported in the past
 # - correctness: check that packets match given entries, and only those
 # - concurrency: attempt races between insertion, deletion and lookup
 # - timeout: check that packets match entries until they expire
 # - performance: estimate matching rate, compare with rbtree and hash baselines
-TESTS="correctness concurrency timeout"
+TESTS="reported_issues correctness concurrency timeout"
 [ "${quicktest}" != "1" ] && TESTS="${TESTS} performance"
 
 # Set types, defined by TYPE_ variables below
@@ -25,6 +26,9 @@ TYPES="net_port port_net net6_port port_proto net6_port_mac net6_port_mac_proto
        net_port_net net_mac net_mac_icmp net6_mac_icmp net6_port_net6_port
        net_port_mac_proto_net"
 
+# Reported bugs, also described by TYPE_ variables below
+BUGS="flush_remove_add"
+
 # List of possible paths to pktgen script from kernel tree for performance tests
 PKTGEN_SCRIPT_PATHS="
 	../../../samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
@@ -327,6 +331,12 @@ flood_spec	ip daddr . tcp dport . meta l4proto . ip saddr
 perf_duration	0
 "
 
+# Definition of tests for bugs reported in the past:
+# display	display text for test report
+TYPE_flush_remove_add="
+display		Add two elements, flush, re-add
+"
+
 # Set template for all tests, types and rules are filled in depending on test
 set_template='
 flush ruleset
@@ -440,6 +450,8 @@ setup_set() {
 
 # Check that at least one of the needed tools is available
 check_tools() {
+	[ -z "${tools}" ] && return 0
+
 	__tools=
 	for tool in ${tools}; do
 		if [ "${tool}" = "nc" ] && [ "${proto}" = "udp6" ] && \
@@ -1430,6 +1442,23 @@ test_performance() {
 	kill "${perf_pid}"
 }
 
+test_bug_flush_remove_add() {
+	set_cmd='{ set s { type ipv4_addr . inet_service; flags interval; }; }'
+	elem1='{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
+	elem2='{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'
+	for i in `seq 1 100`; do
+		nft add table t ${set_cmd}	|| return ${KSELFTEST_SKIP}
+		nft add element t s ${elem1}	2>/dev/null || return 1
+		nft flush set t s		2>/dev/null || return 1
+		nft add element t s ${elem2}	2>/dev/null || return 1
+	done
+	nft flush ruleset
+}
+
+test_reported_issues() {
+	eval test_bug_"${subtest}"
+}
+
 # Run everything in a separate network namespace
 [ "${1}" != "run" ] && { unshare -n "${0}" run; exit $?; }
 tmp="$(mktemp)"
@@ -1438,9 +1467,15 @@ trap cleanup EXIT
 # Entry point for test runs
 passed=0
 for name in ${TESTS}; do
-	printf "TEST: %s\n" "${name}"
-	for type in ${TYPES}; do
-		eval desc=\$TYPE_"${type}"
+	printf "TEST: %s\n" "$(echo ${name} | tr '_' ' ')"
+	if [ "${name}" = "reported_issues" ]; then
+		SUBTESTS="${BUGS}"
+	else
+		SUBTESTS="${TYPES}"
+	fi
+
+	for subtest in ${SUBTESTS}; do
+		eval desc=\$TYPE_"${subtest}"
 		IFS='
 '
 		for __line in ${desc}; do
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-21  2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
  2020-02-21  2:04 ` [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() Stefano Brivio
  2020-02-21  2:04 ` [PATCH nf 2/2] selftests: nft_concat_range: Add test for reported add/flush/add issue Stefano Brivio
@ 2020-02-21 21:17 ` Phil Sutter
  2020-02-21 22:22   ` Stefano Brivio
  2020-02-26 13:33 ` Pablo Neira Ayuso
  3 siblings, 1 reply; 31+ messages in thread
From: Phil Sutter @ 2020-02-21 21:17 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Pablo Neira Ayuso, netfilter-devel

Hi Stefano,

On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:
> Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> add/flush/add operations, and patch 2/2 introduces a test case
> covering that.

This fixes my test case, thanks!

I found another problem, but it's maybe on user space side (and not a
crash this time ;):

| # nft add table t
| # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
| # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
| # nft list ruleset
| table ip t {
| 	set s {
| 		type inet_service . inet_service
| 		flags interval
| 		elements = { 20-30 . 40 }
| 	}
| }

As you see, the second element disappears. It happens only if ranges
overlap and non-range parts are identical.

Looking at do_add_setelems(), set_to_intervals() should not be called
for concatenated ranges, although I *think* range merging happens only
there. So user space should cover for that already?! Still, it doesn't
work.

Cheers, Phil

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-21 21:17 ` [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Phil Sutter
@ 2020-02-21 22:22   ` Stefano Brivio
  2020-02-22  1:19     ` Phil Sutter
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-21 22:22 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Pablo Neira Ayuso, netfilter-devel, Florian Westphal

Hi Phil,

On Fri, 21 Feb 2020 22:17:04 +0100
Phil Sutter <phil@nwl.cc> wrote:

> Hi Stefano,
> 
> On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:
> > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > add/flush/add operations, and patch 2/2 introduces a test case
> > covering that.  
> 
> This fixes my test case, thanks!
> 
> I found another problem, but it's maybe on user space side (and not a
> crash this time ;):
> 
> | # nft add table t
> | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> | # nft list ruleset
> | table ip t {
> | 	set s {
> | 		type inet_service . inet_service
> | 		flags interval
> | 		elements = { 20-30 . 40 }
> | 	}
> | }
> 
> As you see, the second element disappears. It happens only if ranges
> overlap and non-range parts are identical.
>
> Looking at do_add_setelems(), set_to_intervals() should not be called
> for concatenated ranges, although I *think* range merging happens only
> there. So user space should cover for that already?!

Yes. I didn't consider the need for this kind of specification, given
that you can obtain the same result by simply adding two elements:
separate, partially overlapping elements can be inserted (which is, if I
recall correctly, not the case for rbtree).

If I recall correctly, we had a short discussion with Florian about
this, but I don't remember the conclusion.

However, I see the ugliness, and how this breaks probably legitimate
expectations. I guess we could call set_to_intervals() in this case,
that function might need some minor adjustments.

An alternative, and I'm not sure which one is the most desirable, would
be to refuse that kind of insertion.

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-21 22:22   ` Stefano Brivio
@ 2020-02-22  1:19     ` Phil Sutter
  2020-02-23 21:22       ` Stefano Brivio
  0 siblings, 1 reply; 31+ messages in thread
From: Phil Sutter @ 2020-02-22  1:19 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Pablo Neira Ayuso, netfilter-devel, Florian Westphal

Hi Stefano,

On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:
> On Fri, 21 Feb 2020 22:17:04 +0100
> Phil Sutter <phil@nwl.cc> wrote:
> 
> > Hi Stefano,
> > 
> > On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:
> > > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > > add/flush/add operations, and patch 2/2 introduces a test case
> > > covering that.  
> > 
> > This fixes my test case, thanks!
> > 
> > I found another problem, but it's maybe on user space side (and not a
> > crash this time ;):
> > 
> > | # nft add table t
> > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > | # nft list ruleset
> > | table ip t {
> > | 	set s {
> > | 		type inet_service . inet_service
> > | 		flags interval
> > | 		elements = { 20-30 . 40 }
> > | 	}
> > | }
> > 
> > As you see, the second element disappears. It happens only if ranges
> > overlap and non-range parts are identical.
> >
> > Looking at do_add_setelems(), set_to_intervals() should not be called
> > for concatenated ranges, although I *think* range merging happens only
> > there. So user space should cover for that already?!
> 
> Yes. I didn't consider the need for this kind of specification, given
> that you can obtain the same result by simply adding two elements:
> separate, partially overlapping elements can be inserted (which is, if I
> recall correctly, not the case for rbtree).
> 
> If I recall correctly, we had a short discussion with Florian about
> this, but I don't remember the conclusion.
> 
> However, I see the ugliness, and how this breaks probably legitimate
> expectations. I guess we could call set_to_intervals() in this case,
> that function might need some minor adjustments.
> 
> An alternative, and I'm not sure which one is the most desirable, would
> be to refuse that kind of insertion.

I don't think having concatenated ranges not merge even if possible is a
problem. It's just a "nice feature" with some controversial aspects.

The bug I'm reporting is that Above 'add element' command passes two
elements to nftables but only the first one makes it into the set. If
overlapping elements are fine in pipapo, they should both be there. If
not (or otherwise unwanted), we better error out instead of silently
dropping the second one.

Cheers, Phil

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-22  1:19     ` Phil Sutter
@ 2020-02-23 21:22       ` Stefano Brivio
  2020-02-25 12:39         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-23 21:22 UTC (permalink / raw)
  To: Phil Sutter, Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal

Hi Phil,

On Sat, 22 Feb 2020 02:19:33 +0100
Phil Sutter <phil@nwl.cc> wrote:

> Hi Stefano,
> 
> On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:
> > On Fri, 21 Feb 2020 22:17:04 +0100
> > Phil Sutter <phil@nwl.cc> wrote:
> >   
> > > Hi Stefano,
> > > 
> > > On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:  
> > > > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > > > add/flush/add operations, and patch 2/2 introduces a test case
> > > > covering that.    
> > > 
> > > This fixes my test case, thanks!
> > > 
> > > I found another problem, but it's maybe on user space side (and not a
> > > crash this time ;):
> > > 
> > > | # nft add table t
> > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > | # nft list ruleset
> > > | table ip t {
> > > | 	set s {
> > > | 		type inet_service . inet_service
> > > | 		flags interval
> > > | 		elements = { 20-30 . 40 }
> > > | 	}
> > > | }
> > > 
> > > As you see, the second element disappears. It happens only if ranges
> > > overlap and non-range parts are identical.
> > >
> > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > for concatenated ranges, although I *think* range merging happens only
> > > there. So user space should cover for that already?!  
> > 
> > Yes. I didn't consider the need for this kind of specification, given
> > that you can obtain the same result by simply adding two elements:
> > separate, partially overlapping elements can be inserted (which is, if I
> > recall correctly, not the case for rbtree).
> > 
> > If I recall correctly, we had a short discussion with Florian about
> > this, but I don't remember the conclusion.
> > 
> > However, I see the ugliness, and how this breaks probably legitimate
> > expectations. I guess we could call set_to_intervals() in this case,
> > that function might need some minor adjustments.
> > 
> > An alternative, and I'm not sure which one is the most desirable, would
> > be to refuse that kind of insertion.  
> 
> I don't think having concatenated ranges not merge even if possible is a
> problem. It's just a "nice feature" with some controversial aspects.
> 
> The bug I'm reporting is that Above 'add element' command passes two
> elements to nftables but only the first one makes it into the set. If
> overlapping elements are fine in pipapo, they should both be there. If
> not (or otherwise unwanted), we better error out instead of silently
> dropping the second one.

Indeed, I agree there's a blatant bug, I was just wondering how to
solve it.

I found out from notes with an old discussion with Florian what the
problem really was: for concatenated ranges, we might have stuff like:

	'{ 20-30 . 40-50, 25-35 . 45-50 }'

And the only sane way to handle this is as two separate elements. Also
note that I gave a rather confusing description of the behaviour with
"partially overlapping elements": what can partially overlap are ranges
within one field, but there can't be an overlap (even partial) over the
whole concatenation, because that creates ambiguity. That is,

	'{ 20-30 . 40, 25-35 . 40 }'

the "second element" here is not allowed, while:

	'{ 20-30 . 40, 25-35 . 41 }'

these elements both are.

Now, this turns into a question for Pablo. I started digging a bit
further, and I think that both userspace and nft_pipapo_insert()
observe a reasonable behaviour here: nft passes those as two elements,
nft_pipapo_insert() rejects the second one with -EEXIST.

However, in nft_add_set_elem(), we hit this:

	err = set->ops->insert(ctx->net, set, &elem, &ext2);
	if (err) {
		if (err == -EEXIST) {
			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
				err = -EBUSY;
				goto err_element_clash;
			}
			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
			     memcmp(nft_set_ext_data(ext),
				    nft_set_ext_data(ext2), set->dlen) != 0) ||
			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
				err = -EBUSY;
			else if (!(nlmsg_flags & NLM_F_EXCL))
				err = 0;
		}
		goto err_element_clash;
	}

and, in particular, as there's no "data" or "objref" extension
associated with the elements, we hit the:

	if (!(nlmsg_flags & NLM_F_EXCL))

clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
NLM_F_EXCL flag in set element insertion"). The error is reset, and we
return success, but the set back-end indicated a problem.

Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
already present, even though I'd give RFC 3549 a different
interpretation (we won't replace the entry, but we should still report
the error IMHO), I see why we might return success in this case.

The additional problem with concatenated ranges here is that the entry
might be conflicting (overlapping over the whole concatenation), but
not be the same as the user wants to insert. I think -EEXIST is the
code that still fits best in this case, so... do you see a better
alternative than changing nft_pipapo_insert() to return, say, -EINVAL?

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-23 21:22       ` Stefano Brivio
@ 2020-02-25 12:39         ` Pablo Neira Ayuso
  2020-02-25 12:45           ` Stefano Brivio
  2020-02-25 13:13           ` Stefano Brivio
  0 siblings, 2 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-25 12:39 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

Hi,

On Sun, Feb 23, 2020 at 10:22:58PM +0100, Stefano Brivio wrote:
> Hi Phil,
> 
> On Sat, 22 Feb 2020 02:19:33 +0100
> Phil Sutter <phil@nwl.cc> wrote:
> 
> > Hi Stefano,
> > 
> > On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:
> > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > Phil Sutter <phil@nwl.cc> wrote:
> > >   
> > > > Hi Stefano,
> > > > 
> > > > On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:  
> > > > > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > > > > add/flush/add operations, and patch 2/2 introduces a test case
> > > > > covering that.    
> > > > 
> > > > This fixes my test case, thanks!
> > > > 
> > > > I found another problem, but it's maybe on user space side (and not a
> > > > crash this time ;):
> > > > 
> > > > | # nft add table t
> > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > | # nft list ruleset
> > > > | table ip t {
> > > > | 	set s {
> > > > | 		type inet_service . inet_service
> > > > | 		flags interval
> > > > | 		elements = { 20-30 . 40 }
> > > > | 	}
> > > > | }
> > > > 
> > > > As you see, the second element disappears. It happens only if ranges
> > > > overlap and non-range parts are identical.
> > > >
> > > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > > for concatenated ranges, although I *think* range merging happens only
> > > > there. So user space should cover for that already?!  
> > > 
> > > Yes. I didn't consider the need for this kind of specification, given
> > > that you can obtain the same result by simply adding two elements:
> > > separate, partially overlapping elements can be inserted (which is, if I
> > > recall correctly, not the case for rbtree).
> > > 
> > > If I recall correctly, we had a short discussion with Florian about
> > > this, but I don't remember the conclusion.
> > > 
> > > However, I see the ugliness, and how this breaks probably legitimate
> > > expectations. I guess we could call set_to_intervals() in this case,
> > > that function might need some minor adjustments.
> > > 
> > > An alternative, and I'm not sure which one is the most desirable, would
> > > be to refuse that kind of insertion.  
> > 
> > I don't think having concatenated ranges not merge even if possible is a
> > problem. It's just a "nice feature" with some controversial aspects.
> > 
> > The bug I'm reporting is that Above 'add element' command passes two
> > elements to nftables but only the first one makes it into the set. If
> > overlapping elements are fine in pipapo, they should both be there. If
> > not (or otherwise unwanted), we better error out instead of silently
> > dropping the second one.
> 
> Indeed, I agree there's a blatant bug, I was just wondering how to
> solve it.

With hashtable and bitmap, adding an element that already exists is
fine:

nft add element x y { 1.1.1.1 }
nft add element x y { 22 }
nft add element x z { 1.1.1.1-2.2.2.2 }
nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }

then, through 'create':

nft create element x y { 1.1.1.1 }
nft create element x y { 22 }
nft create element x z { 1.1.1.1-2.2.2.2 }
nft create element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }

these hit EEXIST, all good.

In pipapo, the following is silently ignored:

nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
                                    ^
(just added a slightly large range). I tried _without_ these patches.

In rbtree, if you try to add an interval that already exists:

# nft add element x z { 1.1.0.0-1.1.2.254 }
Error: interval overlaps with an existing one
add element x z { 1.1.0.0-1.1.2.254 }
                  ^^^^^^^^^^^^^^^^^
Error: Could not process rule: File exists
add element x z { 1.1.0.0-1.1.2.254 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This complains as an overlap.

I think it's better to not extend userspace to dump the element cache
for add/create, this slows down things for incremental updates
(there's a ticket on bugzilla regarding this problem on the rbtree
IIRC). Better if pipapo can handle this from the kernel.

There is a automerge feature in the rbtree userspace that has been
controversial. This was initially turned on by default, users were
reporting that this was confusing. We can probably introduce a kernel
flag to turn on this automerge feature so pipapo knows user is asking
to not bail out with EEXIST, instead just merge? Not sure how hard is
to implement merging.

> I found out from notes with an old discussion with Florian what the
> problem really was: for concatenated ranges, we might have stuff like:
> 
> 	'{ 20-30 . 40-50, 25-35 . 45-50 }'

I think the second element should hit EEXIST.

> And the only sane way to handle this is as two separate elements. Also
> note that I gave a rather confusing description of the behaviour with
> "partially overlapping elements": what can partially overlap are ranges
> within one field, but there can't be an overlap (even partial) over the
> whole concatenation, because that creates ambiguity. That is,
> 
> 	'{ 20-30 . 40, 25-35 . 40 }'
> 
> the "second element" here is not allowed, while:
> 
> 	'{ 20-30 . 40, 25-35 . 41 }'
> 
> these elements both are.
> 
> Now, this turns into a question for Pablo. I started digging a bit
> further, and I think that both userspace and nft_pipapo_insert()
> observe a reasonable behaviour here: nft passes those as two elements,
> nft_pipapo_insert() rejects the second one with -EEXIST.
> 
> However, in nft_add_set_elem(), we hit this:
> 
> 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
> 	if (err) {
> 		if (err == -EEXIST) {
> 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> 			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
> 				err = -EBUSY;
> 				goto err_element_clash;
> 			}
> 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
> 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
> 			     memcmp(nft_set_ext_data(ext),
> 				    nft_set_ext_data(ext2), set->dlen) != 0) ||
> 			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> 			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> 				err = -EBUSY;
> 			else if (!(nlmsg_flags & NLM_F_EXCL))
> 				err = 0;
> 		}
> 		goto err_element_clash;
> 	}
> 
> and, in particular, as there's no "data" or "objref" extension
> associated with the elements, we hit the:
> 
> 	if (!(nlmsg_flags & NLM_F_EXCL))
> 
> clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
> NLM_F_EXCL flag in set element insertion"). The error is reset, and we
> return success, but the set back-end indicated a problem.
> 
> Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
> already present, even though I'd give RFC 3549 a different
> interpretation (we won't replace the entry, but we should still report
> the error IMHO), I see why we might return success in this case.
>
> The additional problem with concatenated ranges here is that the entry
> might be conflicting (overlapping over the whole concatenation), but
> not be the same as the user wants to insert. I think -EEXIST is the
> code that still fits best in this case, so... do you see a better
> alternative than changing nft_pipapo_insert() to return, say, -EINVAL?

Please, no -EINVAL, it's very overloaded and I think we should only
use this for missing netlink attributes / malformed netlink message.

If I understood your reasoning, I agree -EEXIST for an overlapping
element is fine, even if NLM_F_EXCL is not set.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 12:39         ` Pablo Neira Ayuso
@ 2020-02-25 12:45           ` Stefano Brivio
  2020-02-25 13:13           ` Stefano Brivio
  1 sibling, 0 replies; 31+ messages in thread
From: Stefano Brivio @ 2020-02-25 12:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Tue, 25 Feb 2020 13:39:34 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> (just added a slightly large range). I tried _without_ these patches.

As a side note (I'll answer later), just for clarity: these patches
have nothing to do with this problem.

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 12:39         ` Pablo Neira Ayuso
  2020-02-25 12:45           ` Stefano Brivio
@ 2020-02-25 13:13           ` Stefano Brivio
  2020-02-25 13:42             ` Pablo Neira Ayuso
  1 sibling, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-25 13:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Tue, 25 Feb 2020 13:39:34 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> Hi,
> 
> On Sun, Feb 23, 2020 at 10:22:58PM +0100, Stefano Brivio wrote:
> > Hi Phil,
> > 
> > On Sat, 22 Feb 2020 02:19:33 +0100
> > Phil Sutter <phil@nwl.cc> wrote:
> >   
> > > Hi Stefano,
> > > 
> > > On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:  
> > > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > > Phil Sutter <phil@nwl.cc> wrote:
> > > >     
> > > > > Hi Stefano,
> > > > > 
> > > > > On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:    
> > > > > > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > > > > > add/flush/add operations, and patch 2/2 introduces a test case
> > > > > > covering that.      
> > > > > 
> > > > > This fixes my test case, thanks!
> > > > > 
> > > > > I found another problem, but it's maybe on user space side (and not a
> > > > > crash this time ;):
> > > > > 
> > > > > | # nft add table t
> > > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > > | # nft list ruleset
> > > > > | table ip t {
> > > > > | 	set s {
> > > > > | 		type inet_service . inet_service
> > > > > | 		flags interval
> > > > > | 		elements = { 20-30 . 40 }
> > > > > | 	}
> > > > > | }
> > > > > 
> > > > > As you see, the second element disappears. It happens only if ranges
> > > > > overlap and non-range parts are identical.
> > > > >
> > > > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > > > for concatenated ranges, although I *think* range merging happens only
> > > > > there. So user space should cover for that already?!    
> > > > 
> > > > Yes. I didn't consider the need for this kind of specification, given
> > > > that you can obtain the same result by simply adding two elements:
> > > > separate, partially overlapping elements can be inserted (which is, if I
> > > > recall correctly, not the case for rbtree).
> > > > 
> > > > If I recall correctly, we had a short discussion with Florian about
> > > > this, but I don't remember the conclusion.
> > > > 
> > > > However, I see the ugliness, and how this breaks probably legitimate
> > > > expectations. I guess we could call set_to_intervals() in this case,
> > > > that function might need some minor adjustments.
> > > > 
> > > > An alternative, and I'm not sure which one is the most desirable, would
> > > > be to refuse that kind of insertion.    
> > > 
> > > I don't think having concatenated ranges not merge even if possible is a
> > > problem. It's just a "nice feature" with some controversial aspects.
> > > 
> > > The bug I'm reporting is that Above 'add element' command passes two
> > > elements to nftables but only the first one makes it into the set. If
> > > overlapping elements are fine in pipapo, they should both be there. If
> > > not (or otherwise unwanted), we better error out instead of silently
> > > dropping the second one.  
> > 
> > Indeed, I agree there's a blatant bug, I was just wondering how to
> > solve it.  
> 
> With hashtable and bitmap, adding an element that already exists is
> fine:
> 
> nft add element x y { 1.1.1.1 }
> nft add element x y { 22 }
> nft add element x z { 1.1.1.1-2.2.2.2 }
> nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> 
> then, through 'create':
> 
> nft create element x y { 1.1.1.1 }
> nft create element x y { 22 }
> nft create element x z { 1.1.1.1-2.2.2.2 }
> nft create element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> 
> these hit EEXIST, all good.
> 
> In pipapo, the following is silently ignored:
> 
> nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
>                                     ^
> (just added a slightly large range). I tried _without_ these patches.

I suppose you're doing something like:

nft add table x
nft add set x k '{ type ipv4_addr . ipv4_addr ; flags interval ; }'
nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }

in which case, yes, this is exactly the problem reported by Phil and my
point below: nft_pipapo_insert() reports -EEXIST on the second element,
but it's cleared by nft_add_set_elem().

> In rbtree, if you try to add an interval that already exists:
> 
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> Error: interval overlaps with an existing one
> add element x z { 1.1.0.0-1.1.2.254 }
>                   ^^^^^^^^^^^^^^^^^
> Error: Could not process rule: File exists
> add element x z { 1.1.0.0-1.1.2.254 }
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> This complains as an overlap.

It doesn't for me:

# nft add table x
# nft add set x z '{ type ipv4_addr ; flags interval ; }'
# nft add element x z { 1.1.0.0-1.1.2.254 }
# nft add element x z { 1.1.0.0-1.1.2.254 }
# echo $?
0

that is, -EEXIST from nft_rbtree_insert() is also cleared by
nft_add_set_elem(). Am I trying it in a different way?

> I think it's better to not extend userspace to dump the element cache
> for add/create, this slows down things for incremental updates
> (there's a ticket on bugzilla regarding this problem on the rbtree
> IIRC). Better if pipapo can handle this from the kernel.

It does, the overlapping (entire or partial) is detected and
nft_pipapo_insert() reports -EEXIST.

> There is a automerge feature in the rbtree userspace that has been
> controversial. This was initially turned on by default, users were
> reporting that this was confusing. We can probably introduce a kernel
> flag to turn on this automerge feature so pipapo knows user is asking
> to not bail out with EEXIST, instead just merge? Not sure how hard is
> to implement merging.

It's doable, the problem is that with multiple ranges as different
fields of single elements it might be ambiguous for which fields
merging should be attempted.

That is, in the '{ 20-30 . 40-50, 25-35 . 45-50 }' case below, should I
attempt merging 0, 1 or 2 fields? I think the 'automerge' feature would
be even more confusing here, assuming it can be defined at all.

> > I found out from notes with an old discussion with Florian what the
> > problem really was: for concatenated ranges, we might have stuff like:
> > 
> > 	'{ 20-30 . 40-50, 25-35 . 45-50 }'  
> 
> I think the second element should hit EEXIST.

Yes, I also think so, and nft_pipapo_insert() reports that.

> > And the only sane way to handle this is as two separate elements. Also
> > note that I gave a rather confusing description of the behaviour with
> > "partially overlapping elements": what can partially overlap are ranges
> > within one field, but there can't be an overlap (even partial) over the
> > whole concatenation, because that creates ambiguity. That is,
> > 
> > 	'{ 20-30 . 40, 25-35 . 40 }'
> > 
> > the "second element" here is not allowed, while:
> > 
> > 	'{ 20-30 . 40, 25-35 . 41 }'
> > 
> > these elements both are.
> > 
> > Now, this turns into a question for Pablo. I started digging a bit
> > further, and I think that both userspace and nft_pipapo_insert()
> > observe a reasonable behaviour here: nft passes those as two elements,
> > nft_pipapo_insert() rejects the second one with -EEXIST.
> > 
> > However, in nft_add_set_elem(), we hit this:
> > 
> > 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
> > 	if (err) {
> > 		if (err == -EEXIST) {
> > 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> > 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> > 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> > 			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
> > 				err = -EBUSY;
> > 				goto err_element_clash;
> > 			}
> > 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
> > 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
> > 			     memcmp(nft_set_ext_data(ext),
> > 				    nft_set_ext_data(ext2), set->dlen) != 0) ||
> > 			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> > 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> > 			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> > 				err = -EBUSY;
> > 			else if (!(nlmsg_flags & NLM_F_EXCL))
> > 				err = 0;
> > 		}
> > 		goto err_element_clash;
> > 	}
> > 
> > and, in particular, as there's no "data" or "objref" extension
> > associated with the elements, we hit the:
> > 
> > 	if (!(nlmsg_flags & NLM_F_EXCL))
> > 
> > clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
> > NLM_F_EXCL flag in set element insertion"). The error is reset, and we
> > return success, but the set back-end indicated a problem.
> > 
> > Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
> > already present, even though I'd give RFC 3549 a different
> > interpretation (we won't replace the entry, but we should still report
> > the error IMHO), I see why we might return success in this case.
> >
> > The additional problem with concatenated ranges here is that the entry
> > might be conflicting (overlapping over the whole concatenation), but
> > not be the same as the user wants to insert. I think -EEXIST is the
> > code that still fits best in this case, so... do you see a better
> > alternative than changing nft_pipapo_insert() to return, say, -EINVAL?  
> 
> Please, no -EINVAL, it's very overloaded and I think we should only
> use this for missing netlink attributes / malformed netlink message.
> 
> If I understood your reasoning, I agree -EEXIST for an overlapping
> element is fine, even if NLM_F_EXCL is not set.

So you're suggesting that I change nft_add_set_elem() like this:

@@ -5075,8 +5075,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
                             nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
                             *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
                                err = -EBUSY;
-                       else if (!(nlmsg_flags & NLM_F_EXCL))
-                               err = 0;
                }
                goto err_element_clash;
        }

?

-- 
Stefano



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 13:13           ` Stefano Brivio
@ 2020-02-25 13:42             ` Pablo Neira Ayuso
  2020-02-25 14:34               ` Stefano Brivio
  0 siblings, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-25 13:42 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Tue, Feb 25, 2020 at 02:13:46PM +0100, Stefano Brivio wrote:
> On Tue, 25 Feb 2020 13:39:34 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> > Hi,
> > 
> > On Sun, Feb 23, 2020 at 10:22:58PM +0100, Stefano Brivio wrote:
> > > Hi Phil,
> > > 
> > > On Sat, 22 Feb 2020 02:19:33 +0100
> > > Phil Sutter <phil@nwl.cc> wrote:
> > >   
> > > > Hi Stefano,
> > > > 
> > > > On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:  
> > > > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > > > Phil Sutter <phil@nwl.cc> wrote:
> > > > >     
> > > > > > Hi Stefano,
> > > > > > 
> > > > > > On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:    
> > > > > > > Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> > > > > > > add/flush/add operations, and patch 2/2 introduces a test case
> > > > > > > covering that.      
> > > > > > 
> > > > > > This fixes my test case, thanks!
> > > > > > 
> > > > > > I found another problem, but it's maybe on user space side (and not a
> > > > > > crash this time ;):
> > > > > > 
> > > > > > | # nft add table t
> > > > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > > > | # nft list ruleset
> > > > > > | table ip t {
> > > > > > | 	set s {
> > > > > > | 		type inet_service . inet_service
> > > > > > | 		flags interval
> > > > > > | 		elements = { 20-30 . 40 }
> > > > > > | 	}
> > > > > > | }
> > > > > > 
> > > > > > As you see, the second element disappears. It happens only if ranges
> > > > > > overlap and non-range parts are identical.
> > > > > >
> > > > > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > > > > for concatenated ranges, although I *think* range merging happens only
> > > > > > there. So user space should cover for that already?!    
> > > > > 
> > > > > Yes. I didn't consider the need for this kind of specification, given
> > > > > that you can obtain the same result by simply adding two elements:
> > > > > separate, partially overlapping elements can be inserted (which is, if I
> > > > > recall correctly, not the case for rbtree).
> > > > > 
> > > > > If I recall correctly, we had a short discussion with Florian about
> > > > > this, but I don't remember the conclusion.
> > > > > 
> > > > > However, I see the ugliness, and how this breaks probably legitimate
> > > > > expectations. I guess we could call set_to_intervals() in this case,
> > > > > that function might need some minor adjustments.
> > > > > 
> > > > > An alternative, and I'm not sure which one is the most desirable, would
> > > > > be to refuse that kind of insertion.    
> > > > 
> > > > I don't think having concatenated ranges not merge even if possible is a
> > > > problem. It's just a "nice feature" with some controversial aspects.
> > > > 
> > > > The bug I'm reporting is that Above 'add element' command passes two
> > > > elements to nftables but only the first one makes it into the set. If
> > > > overlapping elements are fine in pipapo, they should both be there. If
> > > > not (or otherwise unwanted), we better error out instead of silently
> > > > dropping the second one.  
> > > 
> > > Indeed, I agree there's a blatant bug, I was just wondering how to
> > > solve it.  
> > 
> > With hashtable and bitmap, adding an element that already exists is
> > fine:
> > 
> > nft add element x y { 1.1.1.1 }
> > nft add element x y { 22 }
> > nft add element x z { 1.1.1.1-2.2.2.2 }
> > nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > 
> > then, through 'create':
> > 
> > nft create element x y { 1.1.1.1 }
> > nft create element x y { 22 }
> > nft create element x z { 1.1.1.1-2.2.2.2 }
> > nft create element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > 
> > these hit EEXIST, all good.
> > 
> > In pipapo, the following is silently ignored:
> > 
> > nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
> >                                     ^
> > (just added a slightly large range). I tried _without_ these patches.
> 
> I suppose you're doing something like:
> 
> nft add table x
> nft add set x k '{ type ipv4_addr . ipv4_addr ; flags interval ; }'
> nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
> 
> in which case, yes, this is exactly the problem reported by Phil and my
> point below: nft_pipapo_insert() reports -EEXIST on the second element,
> but it's cleared by nft_add_set_elem().
> 
> > In rbtree, if you try to add an interval that already exists:
> > 
> > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > Error: interval overlaps with an existing one
> > add element x z { 1.1.0.0-1.1.2.254 }
> >                   ^^^^^^^^^^^^^^^^^
> > Error: Could not process rule: File exists
> > add element x z { 1.1.0.0-1.1.2.254 }
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > This complains as an overlap.
> 
> It doesn't for me:

I'm assuming there's already:

nft create element x z { 1.1.1.1-2.2.2.2 }

to trigger the EEXIST, see below.

> # nft add table x
> # nft add set x z '{ type ipv4_addr ; flags interval ; }'
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # echo $?
> 0
> 
> that is, -EEXIST from nft_rbtree_insert() is also cleared by
> nft_add_set_elem(). Am I trying it in a different way?

No, this is the same behaviour I see here:

# nft add element x z { 1.1.0.0-1.1.2.254 }
# nft add element x z { 1.1.0.0-1.1.2.254 }
# echo $?
0

This is fine. If you use 'create' instead, the second command gets
EEXIST.

But:

# nft add element x z { 1.1.0.0-1.1.2.254 }
# nft add element x z { 1.1.0.0-1.1.2.1 }

Then, the second command hits EEXIST because of the overlap.

> > I think it's better to not extend userspace to dump the element cache
> > for add/create, this slows down things for incremental updates
> > (there's a ticket on bugzilla regarding this problem on the rbtree
> > IIRC). Better if pipapo can handle this from the kernel.
> 
> It does, the overlapping (entire or partial) is detected and
> nft_pipapo_insert() reports -EEXIST.

OK.

> > There is a automerge feature in the rbtree userspace that has been
> > controversial. This was initially turned on by default, users were
> > reporting that this was confusing. We can probably introduce a kernel
> > flag to turn on this automerge feature so pipapo knows user is asking
> > to not bail out with EEXIST, instead just merge? Not sure how hard is
> > to implement merging.
> 
> It's doable, the problem is that with multiple ranges as different
> fields of single elements it might be ambiguous for which fields
> merging should be attempted.
> 
> That is, in the '{ 20-30 . 40-50, 25-35 . 45-50 }' case below, should I
> attempt merging 0, 1 or 2 fields? I think the 'automerge' feature would
> be even more confusing here, assuming it can be defined at all.

I'm not very fond of the automerge feature either. Some people have
been asking for this though. I still don't understand the usecase.

> > > I found out from notes with an old discussion with Florian what the
> > > problem really was: for concatenated ranges, we might have stuff like:
> > > 
> > > 	'{ 20-30 . 40-50, 25-35 . 45-50 }'  
> > 
> > I think the second element should hit EEXIST.
> 
> Yes, I also think so, and nft_pipapo_insert() reports that.
> 
> > > And the only sane way to handle this is as two separate elements. Also
> > > note that I gave a rather confusing description of the behaviour with
> > > "partially overlapping elements": what can partially overlap are ranges
> > > within one field, but there can't be an overlap (even partial) over the
> > > whole concatenation, because that creates ambiguity. That is,
> > > 
> > > 	'{ 20-30 . 40, 25-35 . 40 }'
> > > 
> > > the "second element" here is not allowed, while:
> > > 
> > > 	'{ 20-30 . 40, 25-35 . 41 }'
> > > 
> > > these elements both are.
> > > 
> > > Now, this turns into a question for Pablo. I started digging a bit
> > > further, and I think that both userspace and nft_pipapo_insert()
> > > observe a reasonable behaviour here: nft passes those as two elements,
> > > nft_pipapo_insert() rejects the second one with -EEXIST.
> > > 
> > > However, in nft_add_set_elem(), we hit this:
> > > 
> > > 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
> > > 	if (err) {
> > > 		if (err == -EEXIST) {
> > > 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> > > 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> > > 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> > > 			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
> > > 				err = -EBUSY;
> > > 				goto err_element_clash;
> > > 			}
> > > 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
> > > 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
> > > 			     memcmp(nft_set_ext_data(ext),
> > > 				    nft_set_ext_data(ext2), set->dlen) != 0) ||
> > > 			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> > > 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> > > 			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> > > 				err = -EBUSY;
> > > 			else if (!(nlmsg_flags & NLM_F_EXCL))
> > > 				err = 0;
> > > 		}
> > > 		goto err_element_clash;
> > > 	}
> > > 
> > > and, in particular, as there's no "data" or "objref" extension
> > > associated with the elements, we hit the:
> > > 
> > > 	if (!(nlmsg_flags & NLM_F_EXCL))
> > > 
> > > clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
> > > NLM_F_EXCL flag in set element insertion"). The error is reset, and we
> > > return success, but the set back-end indicated a problem.
> > > 
> > > Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
> > > already present, even though I'd give RFC 3549 a different
> > > interpretation (we won't replace the entry, but we should still report
> > > the error IMHO), I see why we might return success in this case.
> > >
> > > The additional problem with concatenated ranges here is that the entry
> > > might be conflicting (overlapping over the whole concatenation), but
> > > not be the same as the user wants to insert. I think -EEXIST is the
> > > code that still fits best in this case, so... do you see a better
> > > alternative than changing nft_pipapo_insert() to return, say, -EINVAL?  
> > 
> > Please, no -EINVAL, it's very overloaded and I think we should only
> > use this for missing netlink attributes / malformed netlink message.
> > 
> > If I understood your reasoning, I agree -EEXIST for an overlapping
> > element is fine, even if NLM_F_EXCL is not set.
> 
> So you're suggesting that I change nft_add_set_elem() like this:
> 
> @@ -5075,8 +5075,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>                              nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
>                              *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
>                                 err = -EBUSY;
> -                       else if (!(nlmsg_flags & NLM_F_EXCL))
> -                               err = 0;

IIRC, this is disabling the 'nft create element' behaviour. NLM_F_EXCL
is not set for 'nft add element commands'.

Are you observing any inconsistency? I still don't see where the
problem is.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 13:42             ` Pablo Neira Ayuso
@ 2020-02-25 14:34               ` Stefano Brivio
  2020-02-25 18:48                 ` Phil Sutter
  2020-02-25 20:21                 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 31+ messages in thread
From: Stefano Brivio @ 2020-02-25 14:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Tue, 25 Feb 2020 14:42:36 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Tue, Feb 25, 2020 at 02:13:46PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2020 13:39:34 +0100
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >   
> > > Hi,
> > > 
> > > On Sun, Feb 23, 2020 at 10:22:58PM +0100, Stefano Brivio wrote:  
> > > > Hi Phil,
> > > > 
> > > > On Sat, 22 Feb 2020 02:19:33 +0100
> > > > Phil Sutter <phil@nwl.cc> wrote:
> > > >     
> > > > > Hi Stefano,
> > > > > 
> > > > > On Fri, Feb 21, 2020 at 11:22:18PM +0100, Stefano Brivio wrote:    
> > > > > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > > > > Phil Sutter <phil@nwl.cc> wrote:
> > > > > >       
> > > > > > > I found another problem, but it's maybe on user space side (and not a
> > > > > > > crash this time ;):
> > > > > > > 
> > > > > > > | # nft add table t
> > > > > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > > > > | # nft list ruleset
> > > > > > > | table ip t {
> > > > > > > | 	set s {
> > > > > > > | 		type inet_service . inet_service
> > > > > > > | 		flags interval
> > > > > > > | 		elements = { 20-30 . 40 }
> > > > > > > | 	}
> > > > > > > | }
> > > > > > > 
> > > > > > > As you see, the second element disappears. It happens only if ranges
> > > > > > > overlap and non-range parts are identical.
> > > > > > >
> > > > > > > Looking at do_add_setelems(), set_to_intervals() should not be called
> > > > > > > for concatenated ranges, although I *think* range merging happens only
> > > > > > > there. So user space should cover for that already?!      
> > > > > > 
> > > > > > Yes. I didn't consider the need for this kind of specification, given
> > > > > > that you can obtain the same result by simply adding two elements:
> > > > > > separate, partially overlapping elements can be inserted (which is, if I
> > > > > > recall correctly, not the case for rbtree).
> > > > > > 
> > > > > > If I recall correctly, we had a short discussion with Florian about
> > > > > > this, but I don't remember the conclusion.
> > > > > > 
> > > > > > However, I see the ugliness, and how this breaks probably legitimate
> > > > > > expectations. I guess we could call set_to_intervals() in this case,
> > > > > > that function might need some minor adjustments.
> > > > > > 
> > > > > > An alternative, and I'm not sure which one is the most desirable, would
> > > > > > be to refuse that kind of insertion.      
> > > > > 
> > > > > I don't think having concatenated ranges not merge even if possible is a
> > > > > problem. It's just a "nice feature" with some controversial aspects.
> > > > > 
> > > > > The bug I'm reporting is that Above 'add element' command passes two
> > > > > elements to nftables but only the first one makes it into the set. If
> > > > > overlapping elements are fine in pipapo, they should both be there. If
> > > > > not (or otherwise unwanted), we better error out instead of silently
> > > > > dropping the second one.    
> > > > 
> > > > Indeed, I agree there's a blatant bug, I was just wondering how to
> > > > solve it.    
> > > 
> > > With hashtable and bitmap, adding an element that already exists is
> > > fine:
> > > 
> > > nft add element x y { 1.1.1.1 }
> > > nft add element x y { 22 }
> > > nft add element x z { 1.1.1.1-2.2.2.2 }
> > > nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > > 
> > > then, through 'create':
> > > 
> > > nft create element x y { 1.1.1.1 }
> > > nft create element x y { 22 }
> > > nft create element x z { 1.1.1.1-2.2.2.2 }
> > > nft create element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > > 
> > > these hit EEXIST, all good.
> > > 
> > > In pipapo, the following is silently ignored:
> > > 
> > > nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
> > >                                     ^
> > > (just added a slightly large range). I tried _without_ these patches.  
> > 
> > I suppose you're doing something like:
> > 
> > nft add table x
> > nft add set x k '{ type ipv4_addr . ipv4_addr ; flags interval ; }'
> > nft add element x k { 1.1.1.1-2.2.2.2 . 3.3.3.3 }
> > nft add element x k { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
> > 
> > in which case, yes, this is exactly the problem reported by Phil and my
> > point below: nft_pipapo_insert() reports -EEXIST on the second element,
> > but it's cleared by nft_add_set_elem().
> >   
> > > In rbtree, if you try to add an interval that already exists:
> > > 
> > > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > > Error: interval overlaps with an existing one
> > > add element x z { 1.1.0.0-1.1.2.254 }
> > >                   ^^^^^^^^^^^^^^^^^
> > > Error: Could not process rule: File exists
> > > add element x z { 1.1.0.0-1.1.2.254 }
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > 
> > > This complains as an overlap.  
> > 
> > It doesn't for me:  
> 
> I'm assuming there's already:
> 
> nft create element x z { 1.1.1.1-2.2.2.2 }
> 
> to trigger the EEXIST, see below.
> 
> > # nft add table x
> > # nft add set x z '{ type ipv4_addr ; flags interval ; }'
> > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > # nft add element x z { 1.1.0.0-1.1.2.254 }
> > # echo $?
> > 0
> > 
> > that is, -EEXIST from nft_rbtree_insert() is also cleared by
> > nft_add_set_elem(). Am I trying it in a different way?  
> 
> No, this is the same behaviour I see here:
> 
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # echo $?
> 0
> 
> This is fine. If you use 'create' instead, the second command gets
> EEXIST.
> 
> But:
> 
> # nft add element x z { 1.1.0.0-1.1.2.254 }
> # nft add element x z { 1.1.0.0-1.1.2.1 }
> 
> Then, the second command hits EEXIST because of the overlap.

Okay, thanks, I see now. But this is not returned by the kernel: set
back-ends have no way to return -EEXIST on 'add' to userspace. This is
returned by set_overlap() in src/segtree.c, the element is not even
sent to the kernel.

> > > I think it's better to not extend userspace to dump the element cache
> > > for add/create, this slows down things for incremental updates
> > > (there's a ticket on bugzilla regarding this problem on the rbtree
> > > IIRC). Better if pipapo can handle this from the kernel.  
> > 
> > It does, the overlapping (entire or partial) is detected and
> > nft_pipapo_insert() reports -EEXIST.  
> 
> OK.
> 
> > > There is a automerge feature in the rbtree userspace that has been
> > > controversial. This was initially turned on by default, users were
> > > reporting that this was confusing. We can probably introduce a kernel
> > > flag to turn on this automerge feature so pipapo knows user is asking
> > > to not bail out with EEXIST, instead just merge? Not sure how hard is
> > > to implement merging.  
> > 
> > It's doable, the problem is that with multiple ranges as different
> > fields of single elements it might be ambiguous for which fields
> > merging should be attempted.
> > 
> > That is, in the '{ 20-30 . 40-50, 25-35 . 45-50 }' case below, should I
> > attempt merging 0, 1 or 2 fields? I think the 'automerge' feature would
> > be even more confusing here, assuming it can be defined at all.  
> 
> I'm not very fond of the automerge feature either. Some people have
> been asking for this though. I still don't understand the usecase.

I would then avoid touching that unless somebody comes up with a use
case, which in this case it will also need to suggest a specification
of how to handle this for concatenated fields.

> > > > I found out from notes with an old discussion with Florian what the
> > > > problem really was: for concatenated ranges, we might have stuff like:
> > > > 
> > > > 	'{ 20-30 . 40-50, 25-35 . 45-50 }'    
> > > 
> > > I think the second element should hit EEXIST.  
> > 
> > Yes, I also think so, and nft_pipapo_insert() reports that.
> >   
> > > > And the only sane way to handle this is as two separate elements. Also
> > > > note that I gave a rather confusing description of the behaviour with
> > > > "partially overlapping elements": what can partially overlap are ranges
> > > > within one field, but there can't be an overlap (even partial) over the
> > > > whole concatenation, because that creates ambiguity. That is,
> > > > 
> > > > 	'{ 20-30 . 40, 25-35 . 40 }'
> > > > 
> > > > the "second element" here is not allowed, while:
> > > > 
> > > > 	'{ 20-30 . 40, 25-35 . 41 }'
> > > > 
> > > > these elements both are.
> > > > 
> > > > Now, this turns into a question for Pablo. I started digging a bit
> > > > further, and I think that both userspace and nft_pipapo_insert()
> > > > observe a reasonable behaviour here: nft passes those as two elements,
> > > > nft_pipapo_insert() rejects the second one with -EEXIST.
> > > > 
> > > > However, in nft_add_set_elem(), we hit this:
> > > > 
> > > > 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
> > > > 	if (err) {
> > > > 		if (err == -EEXIST) {
> > > > 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> > > > 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> > > > 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> > > > 			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
> > > > 				err = -EBUSY;
> > > > 				goto err_element_clash;
> > > > 			}
> > > > 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
> > > > 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
> > > > 			     memcmp(nft_set_ext_data(ext),
> > > > 				    nft_set_ext_data(ext2), set->dlen) != 0) ||
> > > > 			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> > > > 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> > > > 			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> > > > 				err = -EBUSY;
> > > > 			else if (!(nlmsg_flags & NLM_F_EXCL))
> > > > 				err = 0;
> > > > 		}
> > > > 		goto err_element_clash;
> > > > 	}
> > > > 
> > > > and, in particular, as there's no "data" or "objref" extension
> > > > associated with the elements, we hit the:
> > > > 
> > > > 	if (!(nlmsg_flags & NLM_F_EXCL))
> > > > 
> > > > clause, introduced by commit c016c7e45ddf ("netfilter: nf_tables: honor
> > > > NLM_F_EXCL flag in set element insertion"). The error is reset, and we
> > > > return success, but the set back-end indicated a problem.
> > > > 
> > > > Now, if NLM_F_EXCL is not passed and the entry the user wants to add is
> > > > already present, even though I'd give RFC 3549 a different
> > > > interpretation (we won't replace the entry, but we should still report
> > > > the error IMHO), I see why we might return success in this case.
> > > >
> > > > The additional problem with concatenated ranges here is that the entry
> > > > might be conflicting (overlapping over the whole concatenation), but
> > > > not be the same as the user wants to insert. I think -EEXIST is the
> > > > code that still fits best in this case, so... do you see a better
> > > > alternative than changing nft_pipapo_insert() to return, say, -EINVAL?    
> > > 
> > > Please, no -EINVAL, it's very overloaded and I think we should only
> > > use this for missing netlink attributes / malformed netlink message.
> > > 
> > > If I understood your reasoning, I agree -EEXIST for an overlapping
> > > element is fine, even if NLM_F_EXCL is not set.  
> > 
> > So you're suggesting that I change nft_add_set_elem() like this:
> > 
> > @@ -5075,8 +5075,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
> >                              nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> >                              *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> >                                 err = -EBUSY;
> > -                       else if (!(nlmsg_flags & NLM_F_EXCL))
> > -                               err = 0;  
> 
> IIRC, this is disabling the 'nft create element' behaviour. NLM_F_EXCL
> is not set for 'nft add element commands'.

Right, and if it's not set, set back-ends can't return -EEXIST to
userspace.

> Are you observing any inconsistency? I still don't see where the
> problem is.

This is the problem Phil reported:

> > > > > > On Fri, 21 Feb 2020 22:17:04 +0100
> > > > > > Phil Sutter <phil@nwl.cc> wrote:
> > > > > >       
> > > > > > > I found another problem, but it's maybe on user space side (and not a
> > > > > > > crash this time ;):
> > > > > > > 
> > > > > > > | # nft add table t
> > > > > > > | # nft add set t s '{ type inet_service . inet_service ; flags interval ; }
> > > > > > > | # nft add element t s '{ 20-30 . 40, 25-35 . 40 }'
> > > > > > > | # nft list ruleset
> > > > > > > | table ip t {
> > > > > > > | 	set s {
> > > > > > > | 		type inet_service . inet_service
> > > > > > > | 		flags interval
> > > > > > > | 		elements = { 20-30 . 40 }
> > > > > > > | 	}
> > > > > > > | }
> > > > > > > 
> > > > > > > As you see, the second element disappears. It happens only if ranges
> > > > > > > overlap and non-range parts are identical.

Or also simply with:

# nft add element t s '{ 20-30 . 40 }'
# nft add element t s '{ 25-35 . 40 }'

the second element is silently ignored. I'm returning -EEXIST from
nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
is not set.

Are you suggesting that this is consistent and therefore not a problem?

Or are you proposing that I should handle this in userspace as it's done
for non-concatenated ranges?

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 14:34               ` Stefano Brivio
@ 2020-02-25 18:48                 ` Phil Sutter
  2020-02-25 19:33                   ` Stefano Brivio
  2020-02-25 20:21                 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 31+ messages in thread
From: Phil Sutter @ 2020-02-25 18:48 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Pablo Neira Ayuso, netfilter-devel, Florian Westphal

Hi,

Sorry for jumping back into the discussion this late.

On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
[...]
> Or also simply with:
> 
> # nft add element t s '{ 20-30 . 40 }'
> # nft add element t s '{ 25-35 . 40 }'
> 
> the second element is silently ignored. I'm returning -EEXIST from
> nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> is not set.
> 
> Are you suggesting that this is consistent and therefore not a problem?
> 
> Or are you proposing that I should handle this in userspace as it's done
> for non-concatenated ranges?

The problem is that user tried to add a new element which is not yet
contained and the 'add element' command is the same as if it was
identical to an existing one. We must not ignore this situation as the
user needs to know: In the above case e.g., element '35 . 40' won't
match after the zero-return from 'add element' command.

At first I assumed we could merge e.g.:

| { 20-30 . 40-50, 25-35 . 45-55 }

into:

| { 20-35 . 40-55 }

But now I realize this is wrong. We would match e.g. '{ 20 . 55 }', a
combination the user never specified.

Given that merging multiple concatenated ranges is a non-trivial task, I
guess the only sane thing to do (for now at least) is to perform overlap
detection in user space and reject the command if an overlap is
detected. Stefano, do you see any problems with that?

Thanks, Phil

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 18:48                 ` Phil Sutter
@ 2020-02-25 19:33                   ` Stefano Brivio
  0 siblings, 0 replies; 31+ messages in thread
From: Stefano Brivio @ 2020-02-25 19:33 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Pablo Neira Ayuso, netfilter-devel, Florian Westphal

On Tue, 25 Feb 2020 19:48:57 +0100
Phil Sutter <phil@nwl.cc> wrote:

> Hi,
> 
> Sorry for jumping back into the discussion this late.
> 
> On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> [...]
> > Or also simply with:
> > 
> > # nft add element t s '{ 20-30 . 40 }'
> > # nft add element t s '{ 25-35 . 40 }'
> > 
> > the second element is silently ignored. I'm returning -EEXIST from
> > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > is not set.
> > 
> > Are you suggesting that this is consistent and therefore not a problem?
> > 
> > Or are you proposing that I should handle this in userspace as it's done
> > for non-concatenated ranges?  
> 
> The problem is that user tried to add a new element which is not yet
> contained and the 'add element' command is the same as if it was
> identical to an existing one. We must not ignore this situation as the
> user needs to know: In the above case e.g., element '35 . 40' won't
> match after the zero-return from 'add element' command.
> 
> At first I assumed we could merge e.g.:
> 
> | { 20-30 . 40-50, 25-35 . 45-55 }
> 
> into:
> 
> | { 20-35 . 40-55 }
> 
> But now I realize this is wrong. We would match e.g. '{ 20 . 55 }', a
> combination the user never specified.
> 
> Given that merging multiple concatenated ranges is a non-trivial task, I
> guess the only sane thing to do (for now at least) is to perform overlap
> detection in user space and reject the command if an overlap is
> detected. Stefano, do you see any problems with that?

Functionally, it's doable. The downsides I see are:

1. This logic is already implemented by pipapo, so I would duplicate
   it.

   Other than code duplication itself, the worst part is the risk of
   (accidental) mismatch between the two implementations, and the fact
   that if we ever want to change this logic, we'll have to change it in
   two places (taking care of not breaking API, etc.)

2. It's going to be a bit more complicated than interval_overlap(),
   expect perhaps 50 LoCs, plus conditionals to select
   interval_overlap() or something_else_overlap()

3. [very, very debatable] I consider accepting already existing
   entries, without returning -EEXIST, a bug, no matter if NLM_F_EXCL
   was not passed: NLM_F_EXCL should simply mean what RFC 3549 says,
   that is, "Don't replace the config object if it already exists.". By
   leaving that "error clearing" in the API, we maintain this. On the
   other hand, even in the unlikely case we agree it's a bug, "fixing"
   it comes with UAPI breakage risks, too.

So, yes, I would like to avoid that, but if:

a. I can't return anything else than -EEXIST from nft_pipapo_insert()

b. I can't remove that "if (err == -EEXIST) err = 0;" part

then I don't see any other solution than implementing it in userspace.
I hope somebody had better ideas, but if not, I would go ahead and
implement it in userspace.

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 14:34               ` Stefano Brivio
  2020-02-25 18:48                 ` Phil Sutter
@ 2020-02-25 20:21                 ` Pablo Neira Ayuso
  2020-02-25 20:38                   ` Stefano Brivio
  1 sibling, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-25 20:21 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

Hi Stefano,

On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
[...]
> This is the problem Phil reported:
[...]
> Or also simply with:
> 
> # nft add element t s '{ 20-30 . 40 }'
> # nft add element t s '{ 25-35 . 40 }'
> 
> the second element is silently ignored. I'm returning -EEXIST from
> nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> is not set.
> 
> Are you suggesting that this is consistent and therefore not a problem?

                        NLM_F_EXCL      !NLM_F_EXCL
        exact match       EEXIST             0 [*]
        partial match     EEXIST           EEXIST

The [*] case would allow for element timeout/expiration updates from
the control plane for exact matches. Note that element updates are not
supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
we should allow for updates on partial matches

I think what it is missing is a error to report "partial match" from
pipapo. Then, the core translates this "partial match" error to EEXIST
whether NLM_F_EXCL is set or not.

Would this work for you?

> Or are you proposing that I should handle this in userspace as it's done
> for non-concatenated ranges?

I don't think we should handle this from userspace. If we do so, we'll
need to get an element cache for incremental updates, that will be slow.

Thanks for explaining.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 20:21                 ` Pablo Neira Ayuso
@ 2020-02-25 20:38                   ` Stefano Brivio
  2020-02-25 20:58                     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-25 20:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Tue, 25 Feb 2020 21:21:43 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> Hi Stefano,
> 
> On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> [...]
> > This is the problem Phil reported:  
> [...]
> > Or also simply with:
> > 
> > # nft add element t s '{ 20-30 . 40 }'
> > # nft add element t s '{ 25-35 . 40 }'
> > 
> > the second element is silently ignored. I'm returning -EEXIST from
> > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > is not set.
> > 
> > Are you suggesting that this is consistent and therefore not a problem?  
> 
>                         NLM_F_EXCL      !NLM_F_EXCL
>         exact match       EEXIST             0 [*]
>         partial match     EEXIST           EEXIST
> 
> The [*] case would allow for element timeout/expiration updates from
> the control plane for exact matches.

A-ha. I didn't even consider that.

> Note that element updates are not
> supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
> we should allow for updates on partial matches
> 
> I think what it is missing is a error to report "partial match" from
> pipapo. Then, the core translates this "partial match" error to EEXIST
> whether NLM_F_EXCL is set or not.

Yes, given what you explained, I also think it's the case.

> Would this work for you?

It would. I need to write a few more lines in nft_pipapo_insert(),
because right now I don't have a special case for "entirely
overlapping". Something on the lines of:

	dup = pipapo_get(net, set, start, genmask);
	if (PTR_ERR(dup) == -ENOENT) {

-->		compare start and end key for this entry with
		start and end key from 'ext'

Let me know if you want me to post a patch with a placeholder for
whatever you have in mind, or if I can help implementing this, etc.

Thanks!

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 20:38                   ` Stefano Brivio
@ 2020-02-25 20:58                     ` Pablo Neira Ayuso
  2020-02-26 10:58                       ` Pablo Neira Ayuso
  2020-02-26 10:59                       ` Stefano Brivio
  0 siblings, 2 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-25 20:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Tue, Feb 25, 2020 at 09:38:15PM +0100, Stefano Brivio wrote:
> On Tue, 25 Feb 2020 21:21:43 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> > Hi Stefano,
> > 
> > On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> > [...]
> > > This is the problem Phil reported:  
> > [...]
> > > Or also simply with:
> > > 
> > > # nft add element t s '{ 20-30 . 40 }'
> > > # nft add element t s '{ 25-35 . 40 }'
> > > 
> > > the second element is silently ignored. I'm returning -EEXIST from
> > > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > > is not set.
> > > 
> > > Are you suggesting that this is consistent and therefore not a problem?  
> > 
> >                         NLM_F_EXCL      !NLM_F_EXCL
> >         exact match       EEXIST             0 [*]
> >         partial match     EEXIST           EEXIST
> > 
> > The [*] case would allow for element timeout/expiration updates from
> > the control plane for exact matches.
> 
> A-ha. I didn't even consider that.
> 
> > Note that element updates are not
> > supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
> > we should allow for updates on partial matches
> > 
> > I think what it is missing is a error to report "partial match" from
> > pipapo. Then, the core translates this "partial match" error to EEXIST
> > whether NLM_F_EXCL is set or not.
> 
> Yes, given what you explained, I also think it's the case.
> 
> > Would this work for you?
> 
> It would. I need to write a few more lines in nft_pipapo_insert(),
> because right now I don't have a special case for "entirely
> overlapping". Something on the lines of:
> 
> 	dup = pipapo_get(net, set, start, genmask);
> 	if (PTR_ERR(dup) == -ENOENT) {
> 
> -->		compare start and end key for this entry with
> 		start and end key from 'ext'
> 
> Let me know if you want me to post a patch with a placeholder for
> whatever you have in mind, or if I can help implementing this, etc.

Please, go ahead with the placeholder, it might be faster. I'll jump
on it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 20:58                     ` Pablo Neira Ayuso
@ 2020-02-26 10:58                       ` Pablo Neira Ayuso
  2020-02-26 11:01                         ` Pablo Neira Ayuso
  2020-02-26 11:02                         ` Stefano Brivio
  2020-02-26 10:59                       ` Stefano Brivio
  1 sibling, 2 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 10:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

[-- Attachment #1: Type: text/plain, Size: 2581 bytes --]

On Tue, Feb 25, 2020 at 09:58:47PM +0100, Pablo Neira Ayuso wrote:
> On Tue, Feb 25, 2020 at 09:38:15PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2020 21:21:43 +0100
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > 
> > > Hi Stefano,
> > > 
> > > On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> > > [...]
> > > > This is the problem Phil reported:  
> > > [...]
> > > > Or also simply with:
> > > > 
> > > > # nft add element t s '{ 20-30 . 40 }'
> > > > # nft add element t s '{ 25-35 . 40 }'
> > > > 
> > > > the second element is silently ignored. I'm returning -EEXIST from
> > > > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > > > is not set.
> > > > 
> > > > Are you suggesting that this is consistent and therefore not a problem?  
> > > 
> > >                         NLM_F_EXCL      !NLM_F_EXCL
> > >         exact match       EEXIST             0 [*]
> > >         partial match     EEXIST           EEXIST
> > > 
> > > The [*] case would allow for element timeout/expiration updates from
> > > the control plane for exact matches.
> > 
> > A-ha. I didn't even consider that.
> > 
> > > Note that element updates are not
> > > supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
> > > we should allow for updates on partial matches
> > > 
> > > I think what it is missing is a error to report "partial match" from
> > > pipapo. Then, the core translates this "partial match" error to EEXIST
> > > whether NLM_F_EXCL is set or not.
> > 
> > Yes, given what you explained, I also think it's the case.
> > 
> > > Would this work for you?
> > 
> > It would. I need to write a few more lines in nft_pipapo_insert(),
> > because right now I don't have a special case for "entirely
> > overlapping". Something on the lines of:
> > 
> > 	dup = pipapo_get(net, set, start, genmask);
> > 	if (PTR_ERR(dup) == -ENOENT) {
> > 
> > -->		compare start and end key for this entry with
> > 		start and end key from 'ext'
> > 
> > Let me know if you want me to post a patch with a placeholder for
> > whatever you have in mind, or if I can help implementing this, etc.
> 
> Please, go ahead with the placeholder, it might be faster. I'll jump
> on it.

Sorry, I just realized I can be probably quicker on the core side.
Here is my proposal.

I'm attaching a patch for the core. This is handling -ENOTEMPTY which
is (ab)used to report the partial element matching.

if NLM_F_EXCL is set off, then -EEXIST becomes 0.
                          then -ENOTEMPTY becomes -EEXIST.

Would this work for you?

Thanks.

[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 1164 bytes --]

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d1318bdf49ca..0bbe36e731b8 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5059,7 +5059,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
 	if (err) {
-		if (err == -EEXIST) {
+		/* ENOTEMPTY reports partial element matching. */
+		if (err == -EEXIST || err == -ENOTEMPTY) {
 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
@@ -5075,8 +5076,12 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
 			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
 				err = -EBUSY;
-			else if (!(nlmsg_flags & NLM_F_EXCL))
-				err = 0;
+			else if (!(nlmsg_flags & NLM_F_EXCL)) {
+				if (err == -EEXIST)
+					err = 0;
+				else if (err == -ENOTEMPTY)
+					err = -EEXIST;
+			}
 		}
 		goto err_element_clash;
 	}

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-25 20:58                     ` Pablo Neira Ayuso
  2020-02-26 10:58                       ` Pablo Neira Ayuso
@ 2020-02-26 10:59                       ` Stefano Brivio
  2020-02-26 11:10                         ` Pablo Neira Ayuso
  1 sibling, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-26 10:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

[-- Attachment #1: Type: text/plain, Size: 3748 bytes --]

On Tue, 25 Feb 2020 21:58:47 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Tue, Feb 25, 2020 at 09:38:15PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2020 21:21:43 +0100
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >   
> > > Hi Stefano,
> > > 
> > > On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> > > [...]  
> > > > This is the problem Phil reported:    
> > > [...]  
> > > > Or also simply with:
> > > > 
> > > > # nft add element t s '{ 20-30 . 40 }'
> > > > # nft add element t s '{ 25-35 . 40 }'
> > > > 
> > > > the second element is silently ignored. I'm returning -EEXIST from
> > > > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > > > is not set.
> > > > 
> > > > Are you suggesting that this is consistent and therefore not a problem?    
> > > 
> > >                         NLM_F_EXCL      !NLM_F_EXCL
> > >         exact match       EEXIST             0 [*]
> > >         partial match     EEXIST           EEXIST
> > > 
> > > The [*] case would allow for element timeout/expiration updates from
> > > the control plane for exact matches.  
> > 
> > A-ha. I didn't even consider that.
> >   
> > > Note that element updates are not
> > > supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
> > > we should allow for updates on partial matches
> > > 
> > > I think what it is missing is a error to report "partial match" from
> > > pipapo. Then, the core translates this "partial match" error to EEXIST
> > > whether NLM_F_EXCL is set or not.  
> > 
> > Yes, given what you explained, I also think it's the case.
> >   
> > > Would this work for you?  
> > 
> > It would. I need to write a few more lines in nft_pipapo_insert(),
> > because right now I don't have a special case for "entirely
> > overlapping". Something on the lines of:
> > 
> > 	dup = pipapo_get(net, set, start, genmask);
> > 	if (PTR_ERR(dup) == -ENOENT) {
> >   
> > -->		compare start and end key for this entry with  
> > 		start and end key from 'ext'
> > 
> > Let me know if you want me to post a patch with a placeholder for
> > whatever you have in mind, or if I can help implementing this, etc.  
> 
> Please, go ahead with the placeholder, it might be faster. I'll jump
> on it.

Attached, reasonably tested, the placeholder is 'return -ENOTTY':

# nft add table t
# nft add set t s '{ type ipv4_addr . ipv4_addr ; flags interval ; }'
# nft add element t s '{ 1.1.1.1-2.2.2.2 . 3.3.3.3 }'
# nft add element t s '{ 1.1.1.1-2.2.2.3 . 3.3.3.3 }'
Error: Could not process rule: Inappropriate ioctl for device
add element t s { 1.1.1.1-2.2.2.3 . 3.3.3.3 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

One detail, unrelated to this patch, that I should probably document in
man pages and Wiki (I forgot, it occurred to me while testing): it is
allowed to insert an entry if a proper subset of it, with no
overlapping bounds, is already inserted. The reverse sequence is not
allowed. This can be used without ambiguity due to strict guarantees
about ordering. That is:

# nft add element t s '{ 1.0.0.20-1.0.0.21 . 3.3.3.3 }'
# nft add element t s '{ 1.0.0.10-1.0.0.100 . 3.3.3.3 }'
# nft list table ip t
table ip t {
	set s {
		type ipv4_addr . ipv4_addr
		flags interval
		elements = { 1.0.0.20/31 . 3.3.3.3,
			     1.0.0.10-1.0.0.100 . 3.3.3.3 }
	}
}

But:

# nft add element t s '{ 1.0.0.10-1.0.0.100 . 3.3.3.3 }'
# nft add element t s '{ 1.0.0.20-1.0.0.21 . 3.3.3.3 }'
Error: Could not process rule: Inappropriate ioctl for device
add element t s { 1.0.0.20-1.0.0.21 . 3.3.3.3 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is because least generic entries are only allowed to be added
after more generic ones, and match in that order.

-- 
Stefano

[-- Attachment #2: pipapo_overlap_placeholder.patch --]
[-- Type: text/x-patch, Size: 1573 bytes --]

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 4fc0c924ed5d..fc5e347bfeba 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1098,21 +1098,41 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	struct nft_pipapo_field *f;
 	int i, bsize_max, err = 0;
 
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
+		end = (const u8 *)nft_set_ext_key_end(ext)->data;
+	else
+		end = start;
+
 	dup = pipapo_get(net, set, start, genmask);
-	if (PTR_ERR(dup) == -ENOENT) {
-		if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END)) {
-			end = (const u8 *)nft_set_ext_key_end(ext)->data;
-			dup = pipapo_get(net, set, end, nft_genmask_next(net));
-		} else {
-			end = start;
+	if (!IS_ERR(dup)) {
+		/* Check if we already have the same exact entry */
+		const struct nft_data *dup_key, *dup_end;
+
+		dup_key = nft_set_ext_key(&dup->ext);
+		if (nft_set_ext_exists(&dup->ext, NFT_SET_EXT_KEY_END))
+			dup_end = nft_set_ext_key_end(&dup->ext);
+		else
+			dup_end = dup_key;
+
+		if (!memcmp(start, dup_key->data, sizeof(*dup_key->data)) &&
+		    !memcmp(end, dup_end->data, sizeof(*dup_end->data))) {
+			*ext2 = &dup->ext;
+			return -EEXIST;
 		}
+
+		return -ENOTTY;
+	}
+
+	if (PTR_ERR(dup) == -ENOENT) {
+		/* Look for partially overlapping entries */
+		dup = pipapo_get(net, set, end, nft_genmask_next(net));
 	}
 
 	if (PTR_ERR(dup) != -ENOENT) {
 		if (IS_ERR(dup))
 			return PTR_ERR(dup);
 		*ext2 = &dup->ext;
-		return -EEXIST;
+		return -ENOTTY;
 	}
 
 	/* Validate */

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 10:58                       ` Pablo Neira Ayuso
@ 2020-02-26 11:01                         ` Pablo Neira Ayuso
  2020-02-26 11:02                         ` Stefano Brivio
  1 sibling, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 11:01 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, Feb 26, 2020 at 11:58:04AM +0100, Pablo Neira Ayuso wrote:
> On Tue, Feb 25, 2020 at 09:58:47PM +0100, Pablo Neira Ayuso wrote:
> > On Tue, Feb 25, 2020 at 09:38:15PM +0100, Stefano Brivio wrote:
> > > On Tue, 25 Feb 2020 21:21:43 +0100
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > 
> > > > Hi Stefano,
> > > > 
> > > > On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> > > > [...]
> > > > > This is the problem Phil reported:  
> > > > [...]
> > > > > Or also simply with:
> > > > > 
> > > > > # nft add element t s '{ 20-30 . 40 }'
> > > > > # nft add element t s '{ 25-35 . 40 }'
> > > > > 
> > > > > the second element is silently ignored. I'm returning -EEXIST from
> > > > > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > > > > is not set.
> > > > > 
> > > > > Are you suggesting that this is consistent and therefore not a problem?  
> > > > 
> > > >                         NLM_F_EXCL      !NLM_F_EXCL
> > > >         exact match       EEXIST             0 [*]
> > > >         partial match     EEXIST           EEXIST
> > > > 
> > > > The [*] case would allow for element timeout/expiration updates from
> > > > the control plane for exact matches.
> > > 
> > > A-ha. I didn't even consider that.
> > > 
> > > > Note that element updates are not
> > > > supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
> > > > we should allow for updates on partial matches
> > > > 
> > > > I think what it is missing is a error to report "partial match" from
> > > > pipapo. Then, the core translates this "partial match" error to EEXIST
> > > > whether NLM_F_EXCL is set or not.
> > > 
> > > Yes, given what you explained, I also think it's the case.
> > > 
> > > > Would this work for you?
> > > 
> > > It would. I need to write a few more lines in nft_pipapo_insert(),
> > > because right now I don't have a special case for "entirely
> > > overlapping". Something on the lines of:
> > > 
> > > 	dup = pipapo_get(net, set, start, genmask);
> > > 	if (PTR_ERR(dup) == -ENOENT) {
> > > 
> > > -->		compare start and end key for this entry with
> > > 		start and end key from 'ext'
> > > 
> > > Let me know if you want me to post a patch with a placeholder for
> > > whatever you have in mind, or if I can help implementing this, etc.
> > 
> > Please, go ahead with the placeholder, it might be faster. I'll jump
> > on it.
> 
> Sorry, I just realized I can be probably quicker on the core side.
> Here is my proposal.
> 
> I'm attaching a patch for the core. This is handling -ENOTEMPTY which
> is (ab)used to report the partial element matching.
> 
> if NLM_F_EXCL is set off, then -EEXIST becomes 0.
>                           then -ENOTEMPTY becomes -EEXIST.
> 
> Would this work for you?
> 
> Thanks.

> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index d1318bdf49ca..0bbe36e731b8 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -5059,7 +5059,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>  	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
>  	err = set->ops->insert(ctx->net, set, &elem, &ext2);
>  	if (err) {
> -		if (err == -EEXIST) {
> +		/* ENOTEMPTY reports partial element matching. */
> +		if (err == -EEXIST || err == -ENOTEMPTY) {
>  			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
>  			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
>  			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> @@ -5075,8 +5076,12 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>  			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
>  			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
>  				err = -EBUSY;
> -			else if (!(nlmsg_flags & NLM_F_EXCL))
> -				err = 0;
> +			else if (!(nlmsg_flags & NLM_F_EXCL)) {
> +				if (err == -EEXIST)
> +					err = 0;
> +				else if (err == -ENOTEMPTY)
> +					err = -EEXIST;

BTW, this could be simplified to:

                                if (err == -ENOTEMPTY)
                                        err = -EEXIST;
                                else
                                        err = 0;

instead of:

				if (err == -EEXIST)
					err = 0;
				else if (err == -ENOTEMPTY)
					err = -EEXIST;

> +			}
>  		}
>  		goto err_element_clash;
>  	}


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 10:58                       ` Pablo Neira Ayuso
  2020-02-26 11:01                         ` Pablo Neira Ayuso
@ 2020-02-26 11:02                         ` Stefano Brivio
  2020-02-26 11:29                           ` Pablo Neira Ayuso
  1 sibling, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-26 11:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, 26 Feb 2020 11:58:04 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Tue, Feb 25, 2020 at 09:58:47PM +0100, Pablo Neira Ayuso wrote:
> > On Tue, Feb 25, 2020 at 09:38:15PM +0100, Stefano Brivio wrote:  
> > > On Tue, 25 Feb 2020 21:21:43 +0100
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > >   
> > > > Hi Stefano,
> > > > 
> > > > On Tue, Feb 25, 2020 at 03:34:35PM +0100, Stefano Brivio wrote:
> > > > [...]  
> > > > > This is the problem Phil reported:    
> > > > [...]  
> > > > > Or also simply with:
> > > > > 
> > > > > # nft add element t s '{ 20-30 . 40 }'
> > > > > # nft add element t s '{ 25-35 . 40 }'
> > > > > 
> > > > > the second element is silently ignored. I'm returning -EEXIST from
> > > > > nft_pipapo_insert(), but nft_add_set_elem() clears it because NLM_F_EXCL
> > > > > is not set.
> > > > > 
> > > > > Are you suggesting that this is consistent and therefore not a problem?    
> > > > 
> > > >                         NLM_F_EXCL      !NLM_F_EXCL
> > > >         exact match       EEXIST             0 [*]
> > > >         partial match     EEXIST           EEXIST
> > > > 
> > > > The [*] case would allow for element timeout/expiration updates from
> > > > the control plane for exact matches.  
> > > 
> > > A-ha. I didn't even consider that.
> > >   
> > > > Note that element updates are not
> > > > supported yet, so this check for !NLM_F_EXCL is a stub. I don't think
> > > > we should allow for updates on partial matches
> > > > 
> > > > I think what it is missing is a error to report "partial match" from
> > > > pipapo. Then, the core translates this "partial match" error to EEXIST
> > > > whether NLM_F_EXCL is set or not.  
> > > 
> > > Yes, given what you explained, I also think it's the case.
> > >   
> > > > Would this work for you?  
> > > 
> > > It would. I need to write a few more lines in nft_pipapo_insert(),
> > > because right now I don't have a special case for "entirely
> > > overlapping". Something on the lines of:
> > > 
> > > 	dup = pipapo_get(net, set, start, genmask);
> > > 	if (PTR_ERR(dup) == -ENOENT) {
> > >   
> > > -->		compare start and end key for this entry with  
> > > 		start and end key from 'ext'
> > > 
> > > Let me know if you want me to post a patch with a placeholder for
> > > whatever you have in mind, or if I can help implementing this, etc.  
> > 
> > Please, go ahead with the placeholder, it might be faster. I'll jump
> > on it.  
> 
> Sorry, I just realized I can be probably quicker on the core side.
> Here is my proposal.
> 
> I'm attaching a patch for the core. This is handling -ENOTEMPTY which
> is (ab)used to report the partial element matching.
> 
> if NLM_F_EXCL is set off, then -EEXIST becomes 0.
>                           then -ENOTEMPTY becomes -EEXIST.
> 
> Would this work for you?

Oops, I sent you my patch 80 seconds later it seems. Yes, we just need
to s/TTY/TEMPTY/ :)

Let me know how to proceed, if you want me to post that or you want to
post that (as a series?).

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 10:59                       ` Stefano Brivio
@ 2020-02-26 11:10                         ` Pablo Neira Ayuso
  2020-02-26 11:19                           ` Stefano Brivio
  0 siblings, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 11:10 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, Feb 26, 2020 at 11:59:24AM +0100, Stefano Brivio wrote:
[...]
> One detail, unrelated to this patch, that I should probably document in
> man pages and Wiki (I forgot, it occurred to me while testing): it is
> allowed to insert an entry if a proper subset of it, with no
> overlapping bounds, is already inserted. The reverse sequence is not
> allowed. This can be used without ambiguity due to strict guarantees
> about ordering. That is:
> 
> # nft add element t s '{ 1.0.0.20-1.0.0.21 . 3.3.3.3 }'
> # nft add element t s '{ 1.0.0.10-1.0.0.100 . 3.3.3.3 }'

OK, so first element "shadows" the second one. And the first element
will matching in case that address is 1.0.0.20 and 10.0.0.21. Right?

Your patch looks good to me, BTW.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:10                         ` Pablo Neira Ayuso
@ 2020-02-26 11:19                           ` Stefano Brivio
  2020-02-26 11:34                             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-26 11:19 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, 26 Feb 2020 12:10:56 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Wed, Feb 26, 2020 at 11:59:24AM +0100, Stefano Brivio wrote:
> [...]
> > One detail, unrelated to this patch, that I should probably document in
> > man pages and Wiki (I forgot, it occurred to me while testing): it is
> > allowed to insert an entry if a proper subset of it, with no
> > overlapping bounds, is already inserted. The reverse sequence is not
> > allowed. This can be used without ambiguity due to strict guarantees
> > about ordering. That is:
> > 
> > # nft add element t s '{ 1.0.0.20-1.0.0.21 . 3.3.3.3 }'
> > # nft add element t s '{ 1.0.0.10-1.0.0.100 . 3.3.3.3 }'  
> 
> OK, so first element "shadows" the second one. And the first element
> will matching in case that address is 1.0.0.20 and 10.0.0.21. Right?

Correct.

> Your patch looks good to me, BTW.

Thanks for checking! Let me know how to proceed.

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:02                         ` Stefano Brivio
@ 2020-02-26 11:29                           ` Pablo Neira Ayuso
  2020-02-26 11:36                             ` Stefano Brivio
  0 siblings, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 11:29 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

[-- Attachment #1: Type: text/plain, Size: 768 bytes --]

On Wed, Feb 26, 2020 at 12:02:53PM +0100, Stefano Brivio wrote:
> On Wed, 26 Feb 2020 11:58:04 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
> > I'm attaching a patch for the core. This is handling -ENOTEMPTY which
> > is (ab)used to report the partial element matching.
> > 
> > if NLM_F_EXCL is set off, then -EEXIST becomes 0.
> >                           then -ENOTEMPTY becomes -EEXIST.
> > 
> > Would this work for you?
> 
> Oops, I sent you my patch 80 seconds later it seems. Yes, we just need
> to s/TTY/TEMPTY/ :)

All good, we're in sync.

> Let me know how to proceed, if you want me to post that or you want to
> post that (as a series?).

I'm revisiting the patch I sent, it would be like this, to not expose
the -ENOTEMPTY to userspace.

[-- Attachment #2: 0001-netfilter-nf_tables-report-ENOTEMPTY-on-element-inte.patch --]
[-- Type: text/x-diff, Size: 1862 bytes --]

From a17f22eac1dfd599ff97bb262fc97d64333b06fe Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 26 Feb 2020 12:11:53 +0100
Subject: [PATCH] netfilter: nf_tables: report ENOTEMPTY on element
 intersection

The set backend uses ENOTEMPTY to report an intersection between two
range elements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d1318bdf49ca..48ad273a273e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5059,7 +5059,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
 	if (err) {
-		if (err == -EEXIST) {
+		if (err == -EEXIST || err == -ENOTEMPTY) {
 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
@@ -5073,10 +5073,17 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 				    nft_set_ext_data(ext2), set->dlen) != 0) ||
 			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
 			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
-			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
+			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2))) {
 				err = -EBUSY;
-			else if (!(nlmsg_flags & NLM_F_EXCL))
-				err = 0;
+			} else {
+				/* ENOTEMPTY reports an intersection between
+				 * this element and an existing one.
+				 */
+				if (err == -ENOTEMPTY)
+					err = -EEXIST;
+				else if (!(nlmsg_flags & NLM_F_EXCL))
+					err = 0;
+			}
 		}
 		goto err_element_clash;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:19                           ` Stefano Brivio
@ 2020-02-26 11:34                             ` Pablo Neira Ayuso
  2020-02-26 11:39                               ` Stefano Brivio
  0 siblings, 1 reply; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 11:34 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, Feb 26, 2020 at 12:19:24PM +0100, Stefano Brivio wrote:
> On Wed, 26 Feb 2020 12:10:56 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> > On Wed, Feb 26, 2020 at 11:59:24AM +0100, Stefano Brivio wrote:
> > [...]
> > > One detail, unrelated to this patch, that I should probably document in
> > > man pages and Wiki (I forgot, it occurred to me while testing): it is
> > > allowed to insert an entry if a proper subset of it, with no
> > > overlapping bounds, is already inserted. The reverse sequence is not
> > > allowed. This can be used without ambiguity due to strict guarantees
> > > about ordering. That is:
> > > 
> > > # nft add element t s '{ 1.0.0.20-1.0.0.21 . 3.3.3.3 }'
> > > # nft add element t s '{ 1.0.0.10-1.0.0.100 . 3.3.3.3 }'  
> > 
> > OK, so first element "shadows" the second one. And the first element
> > will matching in case that address is 1.0.0.20 and 10.0.0.21. Right?
> 
> Correct.

So this is happening because the result bitmap contains the pipapo
rules that represent the first element and the second. But when
iterating over the result bitmap bits, the pipapo rule that represents
the first element is taken as the matching one, right?

I mean, to catch elements that represents subsets/supersets of another
element (like in this example above), pipapo would need to make a
lookup for already matching rules for this new element?

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:29                           ` Pablo Neira Ayuso
@ 2020-02-26 11:36                             ` Stefano Brivio
  2020-02-26 11:53                               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-26 11:36 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, 26 Feb 2020 12:29:35 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> From a17f22eac1dfd599ff97bb262fc97d64333b06fe Mon Sep 17 00:00:00 2001
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Wed, 26 Feb 2020 12:11:53 +0100
> Subject: [PATCH] netfilter: nf_tables: report ENOTEMPTY on element
>  intersection
> 
> The set backend uses ENOTEMPTY to report an intersection between two
> range elements.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  net/netfilter/nf_tables_api.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index d1318bdf49ca..48ad273a273e 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -5059,7 +5059,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>  	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
>  	err = set->ops->insert(ctx->net, set, &elem, &ext2);
>  	if (err) {
> -		if (err == -EEXIST) {
> +		if (err == -EEXIST || err == -ENOTEMPTY) {
>  			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
>  			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
>  			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> @@ -5073,10 +5073,17 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>  				    nft_set_ext_data(ext2), set->dlen) != 0) ||
>  			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
>  			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> -			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> +			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2))) {
>  				err = -EBUSY;
> -			else if (!(nlmsg_flags & NLM_F_EXCL))
> -				err = 0;
> +			} else {
> +				/* ENOTEMPTY reports an intersection between
> +				 * this element and an existing one.
> +				 */
> +				if (err == -ENOTEMPTY)
> +					err = -EEXIST;
> +				else if (!(nlmsg_flags & NLM_F_EXCL))
> +					err = 0;
> +			}
>  		}
>  		goto err_element_clash;
>  	}

I haven't tested it, but isn't:

@@ -5077,6 +5077,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
                                err = -EBUSY;
                        else if (!(nlmsg_flags & NLM_F_EXCL))
                                err = 0;
+               } else if (err == -ENOTEMPTY) {
+                       /* ENOTEMPTY reports overlapping between this element
+                        * and an existing one.
+                        */
+                       err = -EEXIST;
                }
                goto err_element_clash;
        }

simpler and equivalent?

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:34                             ` Pablo Neira Ayuso
@ 2020-02-26 11:39                               ` Stefano Brivio
  2020-02-26 11:54                                 ` Stefano Brivio
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-26 11:39 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, 26 Feb 2020 12:34:43 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Wed, Feb 26, 2020 at 12:19:24PM +0100, Stefano Brivio wrote:
> > On Wed, 26 Feb 2020 12:10:56 +0100
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >   
> > > On Wed, Feb 26, 2020 at 11:59:24AM +0100, Stefano Brivio wrote:
> > > [...]  
> > > > One detail, unrelated to this patch, that I should probably document in
> > > > man pages and Wiki (I forgot, it occurred to me while testing): it is
> > > > allowed to insert an entry if a proper subset of it, with no
> > > > overlapping bounds, is already inserted. The reverse sequence is not
> > > > allowed. This can be used without ambiguity due to strict guarantees
> > > > about ordering. That is:
> > > > 
> > > > # nft add element t s '{ 1.0.0.20-1.0.0.21 . 3.3.3.3 }'
> > > > # nft add element t s '{ 1.0.0.10-1.0.0.100 . 3.3.3.3 }'    
> > > 
> > > OK, so first element "shadows" the second one. And the first element
> > > will matching in case that address is 1.0.0.20 and 10.0.0.21. Right?  
> > 
> > Correct.  
> 
> So this is happening because the result bitmap contains the pipapo
> rules that represent the first element and the second. But when
> iterating over the result bitmap bits, the pipapo rule that represents
> the first element is taken as the matching one, right?

Right.

> I mean, to catch elements that represents subsets/supersets of another
> element (like in this example above), pipapo would need to make a
> lookup for already matching rules for this new element?

Right, and that's what those two pipapo_get() calls in
nft_pipapo_insert() do.

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:36                             ` Stefano Brivio
@ 2020-02-26 11:53                               ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 11:53 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, Feb 26, 2020 at 12:36:26PM +0100, Stefano Brivio wrote:
> On Wed, 26 Feb 2020 12:29:35 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> > From a17f22eac1dfd599ff97bb262fc97d64333b06fe Mon Sep 17 00:00:00 2001
> > From: Pablo Neira Ayuso <pablo@netfilter.org>
> > Date: Wed, 26 Feb 2020 12:11:53 +0100
> > Subject: [PATCH] netfilter: nf_tables: report ENOTEMPTY on element
> >  intersection
> > 
> > The set backend uses ENOTEMPTY to report an intersection between two
> > range elements.
> > 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  net/netfilter/nf_tables_api.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> > index d1318bdf49ca..48ad273a273e 100644
> > --- a/net/netfilter/nf_tables_api.c
> > +++ b/net/netfilter/nf_tables_api.c
> > @@ -5059,7 +5059,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
> >  	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
> >  	err = set->ops->insert(ctx->net, set, &elem, &ext2);
> >  	if (err) {
> > -		if (err == -EEXIST) {
> > +		if (err == -EEXIST || err == -ENOTEMPTY) {
> >  			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
> >  			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
> >  			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
> > @@ -5073,10 +5073,17 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
> >  				    nft_set_ext_data(ext2), set->dlen) != 0) ||
> >  			    (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) &&
> >  			     nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF) &&
> > -			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2)))
> > +			     *nft_set_ext_obj(ext) != *nft_set_ext_obj(ext2))) {
> >  				err = -EBUSY;
> > -			else if (!(nlmsg_flags & NLM_F_EXCL))
> > -				err = 0;
> > +			} else {
> > +				/* ENOTEMPTY reports an intersection between
> > +				 * this element and an existing one.
> > +				 */
> > +				if (err == -ENOTEMPTY)
> > +					err = -EEXIST;
> > +				else if (!(nlmsg_flags & NLM_F_EXCL))
> > +					err = 0;
> > +			}
> >  		}
> >  		goto err_element_clash;
> >  	}
> 
> I haven't tested it, but isn't:
> 
> @@ -5077,6 +5077,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>                                 err = -EBUSY;
>                         else if (!(nlmsg_flags & NLM_F_EXCL))
>                                 err = 0;
> +               } else if (err == -ENOTEMPTY) {
> +                       /* ENOTEMPTY reports overlapping between this element
> +                        * and an existing one.
> +                        */
> +                       err = -EEXIST;
>                 }
>                 goto err_element_clash;
>         }
> 
> simpler and equivalent?

Indeed, there is no chance to do the special EBUSY handling in the
-ENOTEMPTY case.

You patch is much better.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:39                               ` Stefano Brivio
@ 2020-02-26 11:54                                 ` Stefano Brivio
  2020-02-26 12:10                                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 31+ messages in thread
From: Stefano Brivio @ 2020-02-26 11:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, 26 Feb 2020 12:39:26 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 26 Feb 2020 12:34:43 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> > I mean, to catch elements that represents subsets/supersets of another
> > element (like in this example above), pipapo would need to make a
> > lookup for already matching rules for this new element?  
> 
> Right, and that's what those two pipapo_get() calls in
> nft_pipapo_insert() do.

Specifically, on re-reading your question: those find sets including
the subset that we would be about to insert, and forbid the insertion.

But, given an already existing proper subset with none of the bounds
overlapping ("more specific entry", by any measure), they won't return
it, so insertion can proceed.

-- 
Stefano


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-26 11:54                                 ` Stefano Brivio
@ 2020-02-26 12:10                                   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 12:10 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Phil Sutter, netfilter-devel, Florian Westphal

On Wed, Feb 26, 2020 at 12:54:07PM +0100, Stefano Brivio wrote:
> On Wed, 26 Feb 2020 12:39:26 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Wed, 26 Feb 2020 12:34:43 +0100
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > 
> > > I mean, to catch elements that represents subsets/supersets of another
> > > element (like in this example above), pipapo would need to make a
> > > lookup for already matching rules for this new element?  
> > 
> > Right, and that's what those two pipapo_get() calls in
> > nft_pipapo_insert() do.
> 
> Specifically, on re-reading your question: those find sets including
> the subset that we would be about to insert, and forbid the insertion.
> 
> But, given an already existing proper subset with none of the bounds
> overlapping ("more specific entry", by any measure), they won't return
> it, so insertion can proceed.

Thanks for explaining.

I see, the bounds are not found by pipapo_get(), they are not included
in the existing (subset) element range. We would need to tests for all
the existing (inner) elements in the range to catch for subsets.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table
  2020-02-21  2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
                   ` (2 preceding siblings ...)
  2020-02-21 21:17 ` [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Phil Sutter
@ 2020-02-26 13:33 ` Pablo Neira Ayuso
  3 siblings, 0 replies; 31+ messages in thread
From: Pablo Neira Ayuso @ 2020-02-26 13:33 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netfilter-devel, Phil Sutter

On Fri, Feb 21, 2020 at 03:04:20AM +0100, Stefano Brivio wrote:
> Patch 1/2 fixes the issue recently reported by Phil on a sequence of
> add/flush/add operations, and patch 2/2 introduces a test case
> covering that.

Applied, thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-02-26 13:34 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-21  2:04 [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Stefano Brivio
2020-02-21  2:04 ` [PATCH nf 1/2] nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() Stefano Brivio
2020-02-21  2:04 ` [PATCH nf 2/2] selftests: nft_concat_range: Add test for reported add/flush/add issue Stefano Brivio
2020-02-21 21:17 ` [PATCH nf 0/2] nft_set_pipapo: Fix crash due to dangling entries in mapping table Phil Sutter
2020-02-21 22:22   ` Stefano Brivio
2020-02-22  1:19     ` Phil Sutter
2020-02-23 21:22       ` Stefano Brivio
2020-02-25 12:39         ` Pablo Neira Ayuso
2020-02-25 12:45           ` Stefano Brivio
2020-02-25 13:13           ` Stefano Brivio
2020-02-25 13:42             ` Pablo Neira Ayuso
2020-02-25 14:34               ` Stefano Brivio
2020-02-25 18:48                 ` Phil Sutter
2020-02-25 19:33                   ` Stefano Brivio
2020-02-25 20:21                 ` Pablo Neira Ayuso
2020-02-25 20:38                   ` Stefano Brivio
2020-02-25 20:58                     ` Pablo Neira Ayuso
2020-02-26 10:58                       ` Pablo Neira Ayuso
2020-02-26 11:01                         ` Pablo Neira Ayuso
2020-02-26 11:02                         ` Stefano Brivio
2020-02-26 11:29                           ` Pablo Neira Ayuso
2020-02-26 11:36                             ` Stefano Brivio
2020-02-26 11:53                               ` Pablo Neira Ayuso
2020-02-26 10:59                       ` Stefano Brivio
2020-02-26 11:10                         ` Pablo Neira Ayuso
2020-02-26 11:19                           ` Stefano Brivio
2020-02-26 11:34                             ` Pablo Neira Ayuso
2020-02-26 11:39                               ` Stefano Brivio
2020-02-26 11:54                                 ` Stefano Brivio
2020-02-26 12:10                                   ` Pablo Neira Ayuso
2020-02-26 13:33 ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.