* Moving from ipset to nftables: Sets not ready for prime time yet?
@ 2020-07-02 22:30 Timo Sigurdsson
  2020-07-03  9:28 ` Stefano Brivio
  2020-07-30 19:27 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 15+ messages in thread
From: Timo Sigurdsson @ 2020-07-02 22:30 UTC (permalink / raw)
  To: netfilter-devel

Hi,

I'm currently migrating my various iptables/ipset setups to nftables. The nftables syntax is a pleasure and for the most part the transition of my rulesets has been smooth. Moving my ipsets to nftables sets, however, has proven to be a major pain point - to a degree where I started wondering whether nftables sets are actually ready to replace existing ipset workflows yet.

Before I go into the various issues I encountered with nftables sets, let me briefly explain what my ipset workflow looked like. On gateways that forward traffic, I use ipsets for blacklisting. I fetch blacklists from various sources regularly, convert them to files that can be loaded with `ipset restore', load them into a new ipset and then replace the old ipset with the new one with `ipset swap`. Since some of my blacklists may contain the same addresses or ranges, I use ipsets' -exist switch when loading multiple blacklists into one ipset. This approach has worked for me for quite some time.
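A minimal sketch of that conversion step (the file name blacklist.txt and the set names blhash/blhash_tmp are placeholders, not my actual names; the privileged ipset commands are shown commented out):

```shell
# Placeholder input: one address or CIDR per line, possibly with duplicates.
printf '10.0.0.0/24\n192.168.1.0/24\n10.0.0.0/24\n' > blacklist.txt

# Build an `ipset restore` file for a temporary set: create statement first,
# then one deduplicated add statement per entry.
{ printf 'create blhash_tmp hash:net family inet\n'
  sort -u blacklist.txt | sed 's/^/add blhash_tmp /'
} > blhash.load
cat blhash.load

# Privileged steps (not run here): load, then atomically swap old for new.
# ipset -exist restore < blhash.load
# ipset swap blhash_tmp blhash
# ipset destroy blhash_tmp
```

With the -exist switch, duplicate entries across several such restore files feeding the same temporary set are tolerated instead of causing errors.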

Now, let's get to the issues I encountered:

1) Auto-merge issues
Initially, I intended to use the auto-merge feature as a means of dealing with duplicate addresses in the various source lists I use. The first issue I encountered was that it's currently not possible to add an element to a set if it already exists in the set or is part of an interval in the set, despite the auto-merge flag being set. This has already been reported by someone else [1], and the only workaround seems to be to add all addresses at once (within one 'add element' statement).

Another issue I stumbled upon was that auto-merge may actually generate wrong/incomplete intervals if you have multiple 'add element' statements within an nftables script file. I consider this a serious issue if you can't be sure whether the addresses or intervals you add to a set actually end up in the set. I reported this here [2]. The workaround for it is - again - to add all addresses in a single statement.
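The single-statement workaround can be scripted like this (a sketch; the list files and the set name blackset are hypothetical): merge and deduplicate all source lists in userspace, then emit one 'add element' statement so nft sees every element in a single operation:

```shell
# Two placeholder source lists with one overlapping entry.
printf '203.0.113.0/24\n198.51.100.7\n' > list1.txt
printf '198.51.100.7\n192.0.2.0/25\n' > list2.txt

# Merge, deduplicate, and join with commas into a single statement.
elements=$(sort -u list1.txt list2.txt | paste -sd, -)
printf 'add element inet filter blackset { %s }\n' "$elements" > set.nft
cat set.nft

# nft -f set.nft   # privileged; loads everything in one operation
```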

The third auto-merge issue I encountered is another one that has been reported already by someone else [3]. It is that the auto-merge flag actually makes it impossible to update the set atomically. Oh, well, let's abandon auto-merge altogether for now...
 
2) Atomic reload of large sets unbearably slow
Moving on without the auto-merge feature, I started testing sets with actual lists I use. The initial setup (meaning populating the sets for the first time) went fine. But when I tried to update them atomically, i.e. use a script file that would have a 'flush set' statement in the beginning and then an 'add element' statement with all the addresses I wanted to add to it, the system seemed to lock up. As it turns out, updating existing large sets is excessively slow - to a point where it becomes unusable if you work with multiple large sets. I reported the details including an example and performance indicators here [4]. The only workaround for this (that keeps atomicity) I found so far is to reload the complete firewall configuration including the set definitions. But that has other unwanted side-effects such as resetting all counters and so on.
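The reload script described above looks roughly like this (set and file names are again placeholders). Everything inside one `nft -f` run is applied as a single transaction, and refilling an existing large set this way is the step that becomes unbearably slow:

```shell
# Placeholder merged blacklist.
printf '203.0.113.0/24\n192.0.2.0/25\n' > merged.txt

# One transaction: empty the set, then refill it in a single statement.
{ printf 'flush set inet filter blackset\n'
  printf 'add element inet filter blackset { %s }\n' "$(paste -sd, - < merged.txt)"
} > reload.nft
cat reload.nft

# nft -f reload.nft   # privileged; atomic, but slow for large existing sets
```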

3) Referencing sets within a set not possible
As a workaround for the auto-merge issues described above (and also for another use case), I was looking into the possibility to reference sets within a set so I could create a set for each source list I use and reference them in a single set so I could match them all at once without duplicating rules for multiple sets. To be clear, I'm not really sure whether this is supposed to work at all. I found some commits which suggested to me it might be possible [5][6]. Nevertheless, I couldn't get this to work.

Summing up:
Well, that's quite a number of issues to run into as an nftables newbie. I wouldn't have expected this at all. And frankly, I actually converted my rules first and thought adjusting my scripts around ipset to achieve the same with nftables sets would be straightforward and simple... Maybe my approach or understanding of nftables is wrong. But I don't think that the use case is that extraordinary that it should be that difficult.

In any case, if anyone has any tips or workarounds to speed up the atomic reload of large sets, I'd be happy to hear (or read) them. The same goes for referencing sets within sets. If this is possible, I'd appreciate any hints about the correct syntax to do so.
Are there better approaches to deal with large sets regularly updated from various sources?


Cheers,

Timo


[1] https://www.spinics.net/lists/netfilter/msg58937.html
[2] https://bugzilla.netfilter.org/show_bug.cgi?id=1438
[3] https://bugzilla.netfilter.org/show_bug.cgi?id=1404
[4] https://bugzilla.netfilter.org/show_bug.cgi?id=1439
[5] http://git.netfilter.org/nftables/commit/?h=v0.9.0&id=a6b75b837f5e851c80f8f2dc508b11f1693af1b3
[6] http://git.netfilter.org/nftables/commit/?h=v0.9.0&id=bada2f9c182dddf72a6d3b7b00c9eace7eb596c3

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-02 22:30 Moving from ipset to nftables: Sets not ready for prime time yet? Timo Sigurdsson
@ 2020-07-03  9:28 ` Stefano Brivio
  2020-07-03 10:24   ` Jozsef Kadlecsik
  2020-07-03 14:03   ` Timo Sigurdsson
  2020-07-30 19:27 ` Pablo Neira Ayuso
  1 sibling, 2 replies; 15+ messages in thread
From: Stefano Brivio @ 2020-07-03  9:28 UTC (permalink / raw)
  To: Timo Sigurdsson; +Cc: netfilter-devel, Phil Sutter, Jozsef Kadlecsik

Hi Timo,

On Fri,  3 Jul 2020 00:30:10 +0200 (CEST)
"Timo Sigurdsson" <public_timo.s@silentcreek.de> wrote:

> Another issue I stumbled upon was that auto-merge may actually
> generate wrong/incomplete intervals if you have multiple 'add
> element' statements within an nftables script file. I consider this a
> serious issue if you can't be sure whether the addresses or intervals
> you add to a set actually end up in the set. I reported this here
> [2]. The workaround for it is - again - to add all addresses in a
> single statement.

Practically speaking I think it's a bug, but I can't find a formal,
complete definition of automerge, so one can also say it "adds items up
to and including the first conflicting one", and there you go, it's
working as intended.

In general, when we discussed this "automerge" feature for
multi-dimensional sets in nftables (not your case, but I aimed at
consistency), I thought it was a mistake to introduce it altogether,
because it's hard to define it and whatever definition one comes up
with might not match what some users think. Consider this example:

# ipset create s hash:net,net
# ipset add s 10.0.1.1/30,192.168.1.1/24
# ipset add s 10.0.0.1/24,172.16.0.1
# ipset list s
[...]
Members:
10.0.1.0/30,192.168.1.0/24
10.0.0.0/24,172.16.0.1

good, ipset has no notion of automerge, so it won't try to do anything
bad here: the set of address pairs denoted by <10.0.1.1/30,
192.168.1.1/24> is disjoint from the set of address pairs denoted by
<10.0.0.1/24, 172.16.0.1>. Then:

# ipset add s 10.0.0.1/16,192.168.0.0/16
# ipset list s
[...]
Members:
10.0.1.0/30,192.168.1.0/24
10.0.0.0/16,192.168.0.0/16
10.0.0.0/24,172.16.0.1

and, as expected with ipset, we have entirely overlapping entries added
to the set. Is that a problem? Not really, ipset doesn't support maps,
so it doesn't matter which entry is actually matched.

# nft add table t
# nft add set t s '{ type ipv4_addr . ipv4_addr; flags interval ; }'
# nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
# nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
# nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
# nft list ruleset
table ip t {
	set s {
		type ipv4_addr . ipv4_addr
		flags interval
		elements = { 10.0.1.0/30 . 192.168.1.0/24,
			     10.0.0.0/24 . 172.16.0.1,
			     10.0.0.0/16 . 192.168.0.0/16 }
	}
}

also fine: the least generic entry is added first, so it matches first.
Let's try to reorder the insertions:

# nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
# nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
# nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
Error: Could not process rule: File exists
add element t s { 10.0.1.1/30 . 192.168.1.1/24 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

...because that entry would never match anything: it's inserted after a
more generic one that already covers it completely, and we'd like to
tell the user that it doesn't make sense.

Now, this is pretty much the only advantage of not allowing overlaps:
telling the user that some insertion doesn't make sense, and thus it
was probably not what the user wanted to do.

So... I wouldn't know how to deal with your use case, even in theory, in a
consistent way. Should we rather introduce a flag that allows any type
of overlapping (default with ipset), which is a way for the user to
tell us they don't actually care about entries not having any effect?

And, in that case, would you expect the entry to be listed in the
resulting set, in case of full overlap (where one set is a subset, not
necessarily proper, of the other one)?

> [...]
>
> Summing up:
> Well, that's quite a number of issues to run into as an nftables
> newbie. I wouldn't have expected this at all. And frankly, I actually
> converted my rules first and thought adjusting my scripts around
> ipset to achieve the same with nftables sets would be straightforward
> and simple... Maybe my approach or understanding of nftables is
> wrong. But I don't think that the use case is that extraordinary that
> it should be that difficult.

I don't think so either, still I kind of expect to see the issues you
report as these features seem to start being heavily used just recently.

And I (maybe optimistically) think that all we need to iron out the
most apparent issues on the subject is a few reports like yours, so
thanks for sharing it.

-- 
Stefano



* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-03  9:28 ` Stefano Brivio
@ 2020-07-03 10:24   ` Jozsef Kadlecsik
  2020-07-03 13:38     ` Stefano Brivio
  2020-07-03 14:03   ` Timo Sigurdsson
  1 sibling, 1 reply; 15+ messages in thread
From: Jozsef Kadlecsik @ 2020-07-03 10:24 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Timo Sigurdsson, netfilter-devel, Phil Sutter

Hi Stefano,

On Fri, 3 Jul 2020, Stefano Brivio wrote:

> On Fri,  3 Jul 2020 00:30:10 +0200 (CEST)
> "Timo Sigurdsson" <public_timo.s@silentcreek.de> wrote:
> 
> > Another issue I stumbled upon was that auto-merge may actually
> > generate wrong/incomplete intervals if you have multiple 'add
> > element' statements within an nftables script file. I consider this a
> > serious issue if you can't be sure whether the addresses or intervals
> > you add to a set actually end up in the set. I reported this here
> > [2]. The workaround for it is - again - to add all addresses in a
> > single statement.
> 
> Practically speaking I think it's a bug, but I can't find a formal,
> complete definition of automerge, so one can also say it "adds items up
> to and including the first conflicting one", and there you go, it's
> working as intended.
> 
> In general, when we discussed this "automerge" feature for
> multi-dimensional sets in nftables (not your case, but I aimed at
> consistency), I thought it was a mistake to introduce it altogether,
> because it's hard to define it and whatever definition one comes up
> with might not match what some users think. Consider this example:
> 
> # ipset create s hash:net,net
> # ipset add s 10.0.1.1/30,192.168.1.1/24
> # ipset add s 10.0.0.1/24,172.16.0.1
> # ipset list s
> [...]
> Members:
> 10.0.1.0/30,192.168.1.0/24
> 10.0.0.0/24,172.16.0.1
> 
> good, ipset has no notion of automerge, so it won't try to do anything
> bad here: the set of address pairs denoted by <10.0.1.1/30,
> 192.168.1.1/24> is disjoint from the set of address pairs denoted by
> <10.0.0.1/24, 172.16.0.1>. Then:
> 
> # ipset add s 10.0.0.1/16,192.168.0.0/16
> # ipset list s
> [...]
> Members:
> 10.0.1.0/30,192.168.1.0/24
> 10.0.0.0/16,192.168.0.0/16
> 10.0.0.0/24,172.16.0.1
> 
> and, as expected with ipset, we have entirely overlapping entries added
> to the set. Is that a problem? Not really, ipset doesn't support maps,
> so it doesn't matter which entry is actually matched.

Actually, the flags and extensions (nomatch, timeout, skbinfo, etc.) in ipset 
are a kind of mapping, so it does matter which entry is matched and which 
flags and extensions are applied to the matching packets.

Therefore, matching in the net kinds of sets follows a strict ordering: 
the most specific match wins, and in the case of multiple dimensions (like 
net,net above) it goes from left to right to find the best, most specific 
match.

> # nft add table t
> # nft add set t s '{ type ipv4_addr . ipv4_addr; flags interval ; }'
> # nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
> # nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
> # nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
> # nft list ruleset
> table ip t {
> 	set s {
> 		type ipv4_addr . ipv4_addr
> 		flags interval
> 		elements = { 10.0.1.0/30 . 192.168.1.0/24,
> 			     10.0.0.0/24 . 172.16.0.1,
> 			     10.0.0.0/16 . 192.168.0.0/16 }
> 	}
> }
> 
> also fine: the least generic entry is added first, so it matches first.
> Let's try to reorder the insertions:
> 
> # nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
> # nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
> # nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
> Error: Could not process rule: File exists
> add element t s { 10.0.1.1/30 . 192.168.1.1/24 }
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> ...because that entry would never match anything: it's inserted after a
> more generic one that already covers it completely, and we'd like to
> tell the user that it doesn't make sense.

I think sets should not store information about the order in which the 
entries were added. That should be totally irrelevant. The input of the sets 
may come from countless sources, and if the order of adding the entries 
matters, then a preordering is required, which is sometimes non-trivial.
 
> Now, this is pretty much the only advantage of not allowing overlaps:
> telling the user that some insertion doesn't make sense, and thus it
> was probably not what the user wanted to do.

This also makes it impossible to create exceptions in the sets in nftables - 
with the "nomatch" flag in ipset one can easily create exceptions in 
intentionally overlapping entries (in whatever deep nestings) in a single 
set. In practice it comes quite handy to say

ipset create access_to_servers hash:ip,port,net
ipset add access_to_servers your_ssh_server,22,x.y.z.0/24
ipset add access_to_servers your_ssh_server,22,x.y.z.32/27 nomatch
...

and exclude access to some parts of a given subnet.

However, the internals of the sets in nftables are totally different from 
ipset, so I'm pretty sure it's absolutely not trivial (and sometimes 
impossible) to provide exactly the same behaviour.

> So... I wouldn't know how to deal with your use case, even in theory, in a
> consistent way. Should we rather introduce a flag that allows any type
> of overlapping (default with ipset), which is a way for the user to
> tell us they don't actually care about entries not having any effect?
> 
> And, in that case, would you expect the entry to be listed in the
> resulting set, in case of full overlap (where one set is a subset, not
> necessarily proper, of the other one)?
> 
> > [...]
> >
> > Summing up:
> > Well, that's quite a number of issues to run into as an nftables
> > newbie. I wouldn't have expected this at all. And frankly, I actually
> > converted my rules first and thought adjusting my scripts around
> > ipset to achieve the same with nftables sets would be straightforward
> > and simple... Maybe my approach or understanding of nftables is
> > wrong. But I don't think that the use case is that extraordinary that
> > it should be that difficult.
> 
> I don't think so either, still I kind of expect to see the issues you
> report as these features seem to start being heavily used just recently.
> 
> And I (maybe optimistically) think that all we need to iron out the
> most apparent issues on the subject is a few reports like yours, so
> thanks for sharing it.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-03 10:24   ` Jozsef Kadlecsik
@ 2020-07-03 13:38     ` Stefano Brivio
  0 siblings, 0 replies; 15+ messages in thread
From: Stefano Brivio @ 2020-07-03 13:38 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Timo Sigurdsson, netfilter-devel, Phil Sutter

Hi József,

On Fri, 3 Jul 2020 12:24:03 +0200 (CEST)
Jozsef Kadlecsik <kadlec@netfilter.org> wrote:

> On Fri, 3 Jul 2020, Stefano Brivio wrote:
> 
> > On Fri,  3 Jul 2020 00:30:10 +0200 (CEST)
> > "Timo Sigurdsson" <public_timo.s@silentcreek.de> wrote:
> >   
> > > Another issue I stumbled upon was that auto-merge may actually
> > > generate wrong/incomplete intervals if you have multiple 'add
> > > element' statements within an nftables script file. I consider this a
> > > serious issue if you can't be sure whether the addresses or intervals
> > > you add to a set actually end up in the set. I reported this here
> > > [2]. The workaround for it is - again - to add all addresses in a
> > > single statement.  
> > 
> > Practically speaking I think it's a bug, but I can't find a formal,
> > complete definition of automerge, so one can also say it "adds items up
> > to and including the first conflicting one", and there you go, it's
> > working as intended.
> > 
> > In general, when we discussed this "automerge" feature for
> > multi-dimensional sets in nftables (not your case, but I aimed at
> > consistency), I thought it was a mistake to introduce it altogether,
> > because it's hard to define it and whatever definition one comes up
> > with might not match what some users think. Consider this example:
> > 
> > # ipset create s hash:net,net
> > # ipset add s 10.0.1.1/30,192.168.1.1/24
> > # ipset add s 10.0.0.1/24,172.16.0.1
> > # ipset list s
> > [...]
> > Members:
> > 10.0.1.0/30,192.168.1.0/24
> > 10.0.0.0/24,172.16.0.1
> > 
> > good, ipset has no notion of automerge, so it won't try to do anything
> > bad here: the set of address pairs denoted by <10.0.1.1/30,  
> > 192.168.1.1/24> is disjoint from the set of address pairs denoted by  
> > <10.0.0.1/24, 172.16.0.1>. Then:
> > 
> > # ipset add s 10.0.0.1/16,192.168.0.0/16
> > # ipset list s
> > [...]
> > Members:
> > 10.0.1.0/30,192.168.1.0/24
> > 10.0.0.0/16,192.168.0.0/16
> > 10.0.0.0/24,172.16.0.1
> > 
> > and, as expected with ipset, we have entirely overlapping entries added
> > to the set. Is that a problem? Not really, ipset doesn't support maps,
> > so it doesn't matter which entry is actually matched.  
> 
> Actually, the flags and extensions (nomatch, timeout, skbinfo, etc.) in ipset 
> are a kind of mapping, so it does matter which entry is matched and which 
> flags and extensions are applied to the matching packets.

Oh, I didn't consider that.

> Therefore, matching in the net kinds of sets follows a strict ordering: 
> the most specific match wins, and in the case of multiple dimensions (like 
> net,net above) it goes from left to right to find the best, most specific 
> match.

And I didn't know about this either. Well, this looks a bit arbitrary
to me, also because there's no such thing as hash:port,net, so forcing
the left-to-right precedence won't cover all the possible cases anyway.

In nftables, as sets now support an arbitrary number of dimensions, in
an arbitrary order, that would require an explicit evaluation ordering,
which is actually not too hard to implement. I just doubt the usage
would be practical.

> > # nft add table t
> > # nft add set t s '{ type ipv4_addr . ipv4_addr; flags interval ; }'
> > # nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
> > # nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
> > # nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
> > # nft list ruleset
> > table ip t {
> > 	set s {
> > 		type ipv4_addr . ipv4_addr
> > 		flags interval
> > 		elements = { 10.0.1.0/30 . 192.168.1.0/24,
> > 			     10.0.0.0/24 . 172.16.0.1,
> > 			     10.0.0.0/16 . 192.168.0.0/16 }
> > 	}
> > }
> > 
> > also fine: the least generic entry is added first, so it matches first.
> > Let's try to reorder the insertions:
> > 
> > # nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
> > # nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
> > # nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
> > Error: Could not process rule: File exists
> > add element t s { 10.0.1.1/30 . 192.168.1.1/24 }
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > ...because that entry would never match anything: it's inserted after a
> > more generic one that already covers it completely, and we'd like to
> > tell the user that it doesn't make sense.  
> 
> I think sets should not store information about the order in which the 
> entries were added. That should be totally irrelevant. The input of the sets 
> may come from countless sources, and if the order of adding the entries 
> matters, then a preordering is required, which is sometimes non-trivial.

As it comes for free, I think it's nice to leave this possibility open
for simple combinations. It doesn't introduce any ambiguity. It's not
a usage I would recommend anyway, but I don't see the harm.

> > Now, this is pretty much the only advantage of not allowing overlaps:
> > telling the user that some insertion doesn't make sense, and thus it
> > was probably not what the user wanted to do.  
> 
> This also makes it impossible to create exceptions in the sets in nftables - 
> with the "nomatch" flag in ipset one can easily create exceptions in 
> intentionally overlapping entries (in whatever deep nestings) in a single 
> set. In practice it comes quite handy to say
> 
> ipset create access_to_servers hash:ip,port,net
> ipset add access_to_servers your_ssh_server,22,x.y.z.0/24
> ipset add access_to_servers your_ssh_server,22,x.y.z.32/27 nomatch
> ...
> 
> and exclude access to some parts of a given subnet.
> 
> However, the internals of the sets in nftables are totally different from 
> ipset, so I'm pretty sure it's absolutely not trivial (and sometimes 
> impossible) to provide exactly the same behaviour.

It's actually kind of trivial for nft_set_pipapo, for nft_set_hash it
doesn't apply (it doesn't implement intervals), and I'm not sure about
nft_set_rbtree right now.

However, does this really provide any value compared to having a
separate set for exceptions matched earlier in a chain?
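As a rough sketch of that alternative (all set names, chain layout and addresses below are hypothetical, not taken from this thread): the exception set is simply consulted before the broader allow set, so more specific "nomatch"-style entries win by rule order. The ruleset is only written to a file here; loading it needs root:

```shell
# Write a ruleset where an exception set is matched before the allow set
# (hypothetical names/addresses). Loading via nft -f requires privileges.
cat > exceptions.nft <<'EOF'
table inet filter {
	set ssh_allowed {
		type ipv4_addr
		flags interval
		elements = { 10.20.30.0/24 }
	}
	set ssh_excluded {
		type ipv4_addr
		flags interval
		elements = { 10.20.30.32/27 }
	}
	chain input {
		type filter hook input priority 0; policy accept;
		tcp dport 22 ip saddr @ssh_excluded drop
		tcp dport 22 ip saddr @ssh_allowed accept
		tcp dport 22 drop
	}
}
EOF
# nft -f exceptions.nft   # privileged
```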

If it really does, I think it could and should be done in userspace by
splitting the intervals. The kernel back-ends shouldn't be overloaded
with complexity that doesn't *need* to live there, and no matter what,
this is going to have a performance impact on the lookup (it should be
doable to avoid an explicit branch for this, but we can't avoid
fetching more bits per element).

Ideally, I would even like to drop the need for timeout and validity
checks as part of the lookup, because they are quite heavy (fetching
the 'extension' pointer, branches, etc.). It involves some internal API
refactoring and is actually on my motionless to-do list, but too far
from the surface to have any practical value.

-- 
Stefano



* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-03  9:28 ` Stefano Brivio
  2020-07-03 10:24   ` Jozsef Kadlecsik
@ 2020-07-03 14:03   ` Timo Sigurdsson
  1 sibling, 0 replies; 15+ messages in thread
From: Timo Sigurdsson @ 2020-07-03 14:03 UTC (permalink / raw)
  To: sbrivio; +Cc: netfilter-devel, phil, kadlec

Hi Stefano,

Stefano Brivio schrieb am 03.07.2020 11:28 (GMT +02:00):

> On Fri,  3 Jul 2020 00:30:10 +0200 (CEST)
> "Timo Sigurdsson" <public_timo.s@silentcreek.de> wrote:
> 
>> Another issue I stumbled upon was that auto-merge may actually
>> generate wrong/incomplete intervals if you have multiple 'add
>> element' statements within an nftables script file. I consider this a
>> serious issue if you can't be sure whether the addresses or intervals
>> you add to a set actually end up in the set. I reported this here
>> [2]. The workaround for it is - again - to add all addresses in a
>> single statement.
>> ...
>> [2] https://bugzilla.netfilter.org/show_bug.cgi?id=1438
> 
> Practically speaking I think it's a bug, but I can't find a formal,
> complete definition of automerge, so one can also say it "adds items up
> to and including the first conflicting one", and there you go, it's
> working as intended.

Actually, I think it's a bug regardless of how exactly auto-merge is defined, simply because I don't think that this
  add element family table myset { A }
  add element family table myset { B }
should give a different result compared to this
  add element family table myset { A, B }

But that's basically what's happening in my example.

> In general, when we discussed this "automerge" feature for
> multi-dimensional sets in nftables (not your case, but I aimed at
> consistency), I thought it was a mistake to introduce it altogether,
> because it's hard to define it and whatever definition one comes up
> with might not match what some users think. 

I understand that depending on the use case, one may have different expectations and that merging entries may cause issues. One example I read before was about adding a single IP address to a set which already contains the full /24 interval. If the addition would be simply ignored, you either wouldn't be able to delete the entry again or you'd have to break up the interval when doing so. So, I understand that it's impossible to make everybody happy.

> Let's try to reorder the insertions:
> 
> # nft add element t s '{ 10.0.0.1/16 . 192.168.0.0/16 }'
> # nft add element t s '{ 10.0.0.1/24 . 172.16.0.1 }'
> # nft add element t s '{ 10.0.1.1/30 . 192.168.1.1/24 }'
> Error: Could not process rule: File exists
> add element t s { 10.0.1.1/30 . 192.168.1.1/24 }
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> ...because that entry would never match anything: it's inserted after a
> more generic one that already covers it completely, and we'd like to
> tell the user that it doesn't make sense.
> 
> Now, this is pretty much the only advantage of not allowing overlaps:
> telling the user that some insertion doesn't make sense, and thus it
> was probably not what the user wanted to do.

Two thoughts here:
1) If there is a real conflict, say in a verdict map with diverging entries, then sure, refusing to accept the conflicting entries is certainly useful. But if an overlap means nothing more than an additional entry in your set that would never be matched, then I think there should be a way to just allow this or to ignore the addition altogether.

2) Another problem here in practice is that nft doesn't emit a warning but fails entirely if you load a large set from a script with a duplicate entry (without auto-merge). You could avoid this by adding each element individually, but that's very slow for large sets, so not really an option either.
 
> So... I wouldn't know how to deal with your use case, even in theory, in a
> consistent way. Should we rather introduce a flag that allows any type
> of overlapping (default with ipset), which is a way for the user to
> tell us they don't actually care about entries not having any effect?

Giving advice here is difficult, since I'm not in any position to make judgements about other use cases that might be affected by this. But generally speaking, yes, I think an option that makes overlaps ignorable, like ipset's -exist switch, would be useful, as would good documentation of what to expect from each flag/option, so the user at least knows about the limitations. And if there is a way to determine this programmatically, there could be a distinction, along the lines of what I described above, between conflicting statements (leading to an error) and statements that simply overlap or are superfluous (leading to a simple warning, unless a flag/option is used to allow and ignore them).

> And, in that case, would you expect the entry to be listed in the
> resulting set, in case of full overlap (where one set is a subset, not
> necessarily proper, of the other one)?

I would have expected the entries to be merged, so if I add 192.168.0.0/24 and then 192.168.0.2, I'd only expect to get 192.168.0.0/24 returned when listing the set. But even if both entries were added and the second one would never match, that would be fine for my use case. The problem is, when working with multiple blacklists, I cannot keep track of which entries are already added to the set, especially if one list may contain complete networks and another list contains individual IP addresses.

Thanks and regards,

Timo


* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-02 22:30 Moving from ipset to nftables: Sets not ready for prime time yet? Timo Sigurdsson
  2020-07-03  9:28 ` Stefano Brivio
@ 2020-07-30 19:27 ` Pablo Neira Ayuso
  1 sibling, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-30 19:27 UTC (permalink / raw)
  To: Timo Sigurdsson; +Cc: netfilter-devel

On Fri, Jul 03, 2020 at 12:30:10AM +0200, Timo Sigurdsson wrote:
> Hi,
> 
> I'm currently migrating my various iptables/ipset setups to nftables. The nftables syntax is a pleasure and for the most part the transition of my rulesets has been smooth. Moving my ipsets to nftables sets, however, has proven to be a major pain point - to a degree where I started wondering whether nftables sets are actually ready to replace existing ipset workflows yet.
[...]
> 2) Atomic reload of large sets unbearably slow
> Moving on without the auto-merge feature, I started testing sets with actual lists I use. The initial setup (meaning populating the sets for the first time) went fine. But when I tried to update them atomically, i.e. use a script file that would have a 'flush set' statement in the beginning and then an 'add element' statement with all the addresses I wanted to add to it, the system seemed to lock up. As it turns out, updating existing large sets is excessively slow - to a point where it becomes unusable if you work with multiple large sets. I reported the details including an example and performance indicators here [4]. The only workaround for this (that keeps atomicity) I found so far is to reload the complete firewall configuration including the set definitions. But that has other unwanted side-effects such as resetting all counters and so on.
> 
> 3) Referencing sets within a set not possible
> As a workaround for the auto-merge issues described above (and also for another use case), I was looking into the possibility to reference sets within a set so I could create a set for each source list I use and reference them in a single set so I could match them all at once without duplicating rules for multiple sets. To be clear, I'm not really sure whether this is supposed to work at all. I found some commits which suggested to me it might be possible [5][6]. Nevertheless, I couldn't get this to work.

For the record, these two issues are now fixed in git.

Thank you for reporting.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-02 23:18 Timo Sigurdsson
  2020-07-03  7:04 ` G.W. Haywood
@ 2020-07-14 13:27 ` Timo Sigurdsson
  1 sibling, 0 replies; 15+ messages in thread
From: Timo Sigurdsson @ 2020-07-14 13:27 UTC (permalink / raw)
  To: netfilter

Hi again,

just a quick follow-up: I came across yet another issue trying to replace or reload native sets atomically. It leads me to conclude that the atomic handling of sets is pretty much broken or unusable at this point.

While I was previously under the impression that atomic reloads of sets were only problematic when using the auto-merge flag or with very large sets, as described in my first email, I have now found that a much more basic case does not work either: changing sets with intervals (without auto-merge).

Quick example - create a test set:
  `nft add set inet filter testset { type ipv4_addr; flags interval; }'

Now create a script file a.nft with the following content to populate the set:
  flush set inet filter testset
  add element inet filter testset { 192.168.0.0/16 }

Load the file with `nft -f a.nft' and it will work just fine, even repeatedly.

But now try this example b.nft:
  flush set inet filter testset
  add element inet filter testset { 192.168.0.0/24 }

Trying to run `nft -f b.nft' will result in the error:
  Interval overlaps with an existing one

The reason why I haven't encountered this issue earlier is that in most of my experiments I was trying to either reload the same set, which works fine, or reload a set with changes in terms of added or deleted elements, which also works fine. It only breaks when you try to change the extent of an existing interval despite the flush statement in the beginning of the script file. I found that the issue was already reported by someone else and I have now updated it with additional information:
https://bugzilla.netfilter.org/show_bug.cgi?id=1431

In any case, that pretty much defeats all of my attempts to work around the issues I laid out earlier. The only way around it is to reload the entire ruleset, with all the downsides that come with it.

I am now thinking about scripting my way around the atomic handling of sets with nft entirely: creating a new set, populating it, inserting a new rule to match the new set, then flushing the old set, populating it with the new contents, and finally deleting the new set and the inserted rule again. This would roughly mimic the behavior of `ipset swap', just more complicated and with some overhead...
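For what it's worth, that sequence could be sketched roughly as follows (purely illustrative; the table/chain/set names and example prefixes are made up, and the element lists stand in for the real blacklist contents):

  # step1.nft - bring up a scratch set and divert matching to it (one atomic run)
  add set inet filter blacklist_new { type ipv4_addr; flags interval; }
  add element inet filter blacklist_new { 192.0.2.0/24, 198.51.100.0/24 }
  insert rule inet filter forward ip saddr @blacklist_new drop

  # step 2 - with the scratch rule covering matching, the live set can be
  # flushed and refilled in separate (non-atomic) nft runs, sidestepping the
  # "Interval overlaps" error:
  #   nft flush set inet filter blacklist
  #   nft -f refill.nft        (containing only the `add element' statements)

  # step 3 - remove the scratch rule (by its handle, found via
  # `nft -a list chain inet filter forward') and the scratch set:
  #   nft delete rule inet filter forward handle <N>
  #   nft delete set inet filter blacklist_new

The window in step 2 is covered because both sets drop the same traffic while the scratch rule is in place; the price is exactly the extra bookkeeping that `ipset swap' used to do in one command.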

Regards,

Timo



Timo Sigurdsson schrieb am 03.07.2020 01:18 (GMT +02:00):

> P.S. Sorry, I sent this message to netfilter-devel first as I was already
> subscribed to that list and only realized later that the netfilter list would
> be a better place to post this to. Hence, one more time to this list...
> 
> 
> Hi,
> 
> I'm currently migrating my various iptables/ipset setups to nftables. The
> nftables syntax is a pleasure and for the most part the transition of my
> rulesets has been smooth. Moving my ipsets to nftables sets, however, has
> proven to be a major pain point - to a degree where I started wondering whether
> nftables sets are actually ready to replace existing ipset workflows yet.
> 
> Before I go into the various issues I encountered with nftables sets, let me
> briefly explain what my ipset workflow looked like. On gateways that forward
> traffic, I use ipsets for blacklisting. I fetch blacklists from various sources
> regularly, convert them to files that can be loaded with `ipset restore', load
> them into a new ipset and then replace the old ipset with the new one with
> `ipset swap`. Since some of my blacklists may contain the same addresses or
> ranges, I use ipsets' -exist switch when loading multiple blacklists into one
> ipset. This approach has worked for me for quite some time.
> 
> Now, let's get to the issues I encountered:
> 
> 1) Auto-merge issues
> Initially, I intended to use the auto-merge feature as a means of dealing with
> duplicate addresses in the various source lists I use. The first issue I
> encountered was that it's currently not possible to add an element to a set if
> it already exists in the set or is part of an interval in the set, despite the
> auto-merge flag being set. This has already been reported by someone else [1] and the
> only workaround seems to be to add all addresses at once (within one 'add
> element' statement).
> 
> Another issue I stumbled upon was that auto-merge may actually generate
> wrong/incomplete intervals if you have multiple 'add element' statements within
> an nftables script file. I consider this a serious issue if you can't be sure
> whether the addresses or intervals you add to a set actually end up in the set.
> I reported this here [2]. The workaround for it is - again - to add all
> addresses in a single statement.
> 
> The third auto-merge issue I encountered is another one that has been reported
> already by someone else [3]. It is that the auto-merge flag actually makes it
> impossible to update the set atomically. Oh, well, let's abandon auto-merge
> altogether for now...
>  
> 2) Atomic reload of large sets unbearably slow
> Moving on without the auto-merge feature, I started testing sets with actual
> lists I use. The initial setup (meaning populating the sets for the first time)
> went fine. But when I tried to update them atomically, i.e. use a script file
> that would have a 'flush set' statement in the beginning and then an 'add
> element' statement with all the addresses I wanted to add to it, the system
> seemed to lock up. As it turns out, updating existing large sets is excessively
> slow - to a point where it becomes unusable if you work with multiple large
> sets. I reported the details including an example and performance indicators
> here [4]. The only workaround for this (that keeps atomicity) I found so far is
> to reload the complete firewall configuration including the set definitions.
> But that has other unwanted side-effects such as resetting all counters and so
> on.
> 
> 3) Referencing sets within a set not possible
> As a workaround for the auto-merge issues described above (and also for another
> use case), I was looking into the possibility to reference sets within a set so
> I could create a set for each source list I use and reference them in a single
> set so I could match them all at once without duplicating rules for multiple
> sets. To be clear, I'm not really sure whether this is supposed to work at all. I
> found some commits which suggested to me it might be possible [5][6].
> Nevertheless, I couldn't get this to work.
> 
> Summing up:
> Well, that's quite a number of issues to run into as an nftables newbie. I
> wouldn't have expected this at all. And frankly, I actually converted my rules
> first and thought adjusting my scripts around ipset to achieve the same with
> nftables sets would be straightforward and simple... Maybe my approach or
> understanding of nftables is wrong. But I don't think the use case is so
> extraordinary that it should be this difficult.
> 
> In any case, if anyone has any tips or workarounds to speed up the atomic
> reload of large sets, I'd be happy to hear (or read) them. Same goes for
> referencing sets within sets. If this should be possible to do, I'd appreciate
> any hints to the correct syntax to do so.
> Are there better approaches to deal with large sets regularly updated from
> various sources?
> 
> 
> Cheers,
> 
> Timo
> 
> 
> [1] https://www.spinics.net/lists/netfilter/msg58937.html
> [2] https://bugzilla.netfilter.org/show_bug.cgi?id=1438
> [3] https://bugzilla.netfilter.org/show_bug.cgi?id=1404
> [4] https://bugzilla.netfilter.org/show_bug.cgi?id=1439
> [5]
> http://git.netfilter.org/nftables/commit/?h=v0.9.0&id=a6b75b837f5e851c80f8f2dc508b11f1693af1b3
> [6]
> http://git.netfilter.org/nftables/commit/?h=v0.9.0&id=bada2f9c182dddf72a6d3b7b00c9eace7eb596c3
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-08 10:36         ` Pablo Neira Ayuso
  2020-07-08 10:48           ` Reindl Harald
@ 2020-07-09  4:40           ` Trent W. Buck
  1 sibling, 0 replies; 15+ messages in thread
From: Trent W. Buck @ 2020-07-09  4:40 UTC (permalink / raw)
  To: netfilter

Pablo Neira Ayuso <pablo@netfilter.org> writes:

>> "iptables-translate" comments out much more than just ipset-related
>> stuff; in my case, xt_recent and connlimit rules are also just comments
>
> If you could post what kind of rule examples are commented out, it
> would help us keep this on the radar.
>
> It is not too hard to add new translations, there is a _xlate()
> function under iptables/extensions/libxt_*.c that provides the
> translation. The important thing is to validate that the translation
> is semantically equivalent, or if not possible, provide a close
> translation.

Here's the one that bit me when I started with nft:

    ## An automated SSH brute-force blacklist.  Requires xtables.  Unlike
    ## fail2ban or DenyHosts, there are NO userspace requirements -- not
    ## even sshd is needed!  echo +1.2.3.4 >/proc/net/xt_recent/whitelist
    ## to whitelist 1.2.3.4 for an hour.  Protects both this host AND all
    ## hosts "behind" this one.
    ##
    # New connections from IPs blacklisted within the last ten minutes are
    # chaotically rejected, AND reset the countdown back to ten minutes.
    # This is in PRELUDE such that blacklisted attackers are refused ALL
    # services, not just rate-limited ones.
    -A PRELUDE -m recent --name blacklist --update --seconds 600 --rttl -j BLACKLIST
    # This NON-TERMINAL chain counts connections passing through it.  When
    # a connection rate exceeds 3/min/srcip/dstip/dstport, the source IP
    # is blacklisted.  Acting on the blacklist is done elsewhere, as is
    # accepting or rejecting this connection.
    -A PRELUDE -i ppp+ -p tcp --dport ssh -m hashlimit --hashlimit-name maybe-blacklist --hashlimit-mode srcip,dstip,dstport --hashlimit-above 1/min --hashlimit-burst 3 -m recent --name blacklist --set -j LOG --log-prefix "Blacklisted SRC: "
    -A BLACKLIST -m recent --name whitelist --rcheck --seconds 3600 -j RETURN -m comment --comment "whitelist overrides blacklist"
    -A BLACKLIST -j CHAOS --tarpit

Here's the hand-written translation (not exactly the same):

    ## An automated SSH (et al) brute-force blacklist.
    ##
    ## The nominal goal is to nerf brute-force password guessing.
    ## Since I disable password auth, the REAL goal is to reduce the
    ## amount of spam in my SSH auth log.
    ##
    ## (Running SSH on a non-standard port would also work, but
    ## I want to benefit from ISPs giving preferential QOS to 22/tcp).
    ##
    ## 1. if you brute-force port X more than Y times/minute,
    ##    you're blacklisted for Z minutes.
    ##
    ## 2. if you are blacklisted and make ANY connection,
    ##    you're blacklisted for Z minutes (i.e. countdown resets).
    ##
    ## 3. if you are blacklisted, all your new flows are dropped.
    ##    (We used to TARPIT, to tie up attacker resources.
    ##    That used xtables-addons and isn't supported in nftables 0.9.1.)
    ##
    ## Compared to sshguard or fail2ban or DenyHosts:
    ##
    ##  BONUS: installed on a gateway, protects the entire network.
    ##
    ##  BONUS: works even when syslogd is down, or /var/log is full, or
    ##         the syslog "access denied" log format changes.
    ##
    ##  BONUS: works even when sshd (or whatever) is down.
    ##         That is, if the host is off, the gateway will still trigger.
    ##
    ##  BONUS: works even when sshd (or whatever) is unused.
    ##         If you never even run FTP or RDP, trigger on them!
    ##
    ##  MALUS: cannot ignore legitimate traffic.
    ##
    ##         For SSH, you can mitigate this by forcing your users to
    ##         use ControlMaster.
    ##
    ##         For HTTPS and IMAPS, you're screwed --- those ALWAYS
    ##         make 30+ connections at once (in IMAP's case, because
    ##         IDLE extension sucks).
    ##
    ##         You can also mitigate this by having a "backdoor" open
    ##         while blacklisted, which adds you to a temporary
    ##         whitelist if you port knock in the right sequence.
    ##
    ##         The port knock sequence is a pre-shared key to your end
    ##         users, with all the problems that a PSK involves!
    ##
    ##  MALUS: easy for an attacker to spoof SYNs to block a legitimate user?
    ##         (See port knock mitigation, above)
    ##
    ##  MALUS: because we run this AFTER "ct state established accept",
    ##         connections that are "in flight" when the ban hits
    ##         are allowed to complete.
    ##
    ##         This happens in the wild where the attacker makes 100
    ##         SSH connections in 1 second.
    ##
    ##         The alternative is to run this (relatively expensive)
    ##         check on EVERY packet, instead of once per flow.
    ##
    ## You can see the current state of the list with:
    ##
    ##     nft list set inet my_filter my_IPS_IPv4_blacklist
    ##     nft list set inet my_filter my_IPS_IPv6_blacklist
    ##
    ## I recommend:
    ##
    ##   * this IPS for low-rate (SSH w/ ControlMaster) and unused (FTP, RDP) services,
    ##     on gateways, for flows originating from the internet / upstream.
    ##
    ##     For a list of ports to (maybe) IPS guard, consider the first N lines of:
    ##
    ##         sort -rnk3 /usr/share/nmap/nmap-services
    ##
    ##   * a basic firewall, and sshguard, on every host that runs a relevant service.
    ##     (This includes SSH, so basically everything.)
    ##     This also covers legitimately bursty traffic on imaps.
    ##     Does this cover submission 587/tcp (postfix)?
    ##
    ##   * EXCEPT, sshguard doesn't do apache or nginx, so fail2ban on the www hosts?
    ##     UPDATE: sshguard supports apache/nginx if you tell it to read
    ##     the relevant NCSA-format logfile.
    ##
    ##   * postscreen covers smtp (25/tcp).

    ## FIXME: per https://wiki.dovecot.org/Authentication/Penalty, we
    ##        should meter/block IPv6 sources by /48 instead of by single address (as we do for IPv4).
    ##        Each corresponds to the typical allocation of a single ISP subscriber.

    chain my_IPS {
        ct state != new  return  comment "Operate per-flow, not per-packet (my_prologue guarantees this anyway)"
        iiftype != ppp   return  comment "IPS only protects against attacks from the internet"

        # Track the rate of new connections (my_IPS_IPvX_meter).
        # If someone (ip saddr) connects to a service (ip daddr . tcp dport) too often,
        # then blacklist them (my_IPS_IPvX_blacklist).
        tcp dport @my_IPS_TCP_ports  \
            add @my_IPS_IPv4_meter { ip saddr . ip daddr . tcp dport  limit rate over 1/minute  burst 3 packets }  \
            add @my_IPS_IPv4_blacklist { ip saddr }  \
            log level audit log prefix "Blacklist SRC: "
        tcp dport @my_IPS_TCP_ports  \
            add @my_IPS_IPv6_meter { ip6 saddr . ip6 daddr . tcp dport  limit rate over 1/minute  burst 3 packets }  \
            add @my_IPS_IPv6_blacklist { ip6 saddr }  \
            log level audit log prefix "Blacklist SRC: "

        # If someone is NOT whitelisted, and IS blacklisted, then drop their connection, AND reset their countdown (hence "update" not "add").
        # In other words, once blacklisted for brute-forcing SSH, you REMAIN blacklisted until you STFU for a while (on ALL ports).
        ip  saddr != @my_IPS_IPv4_whitelist  ip  saddr @my_IPS_IPv4_blacklist  update @my_IPS_IPv4_blacklist { ip  saddr }  drop
        ip6 saddr != @my_IPS_IPv6_whitelist  ip6 saddr @my_IPS_IPv6_blacklist  update @my_IPS_IPv6_blacklist { ip6 saddr }  drop
    }
    set my_IPS_IPv4_meter     { type ipv4_addr . ipv4_addr . inet_service; timeout 10m; flags dynamic; }
    set my_IPS_IPv6_meter     { type ipv6_addr . ipv6_addr . inet_service; timeout 10m; flags dynamic; }
    set my_IPS_IPv4_blacklist { type ipv4_addr; timeout 10m; }
    set my_IPS_IPv6_blacklist { type ipv6_addr; timeout 10m; }
    set my_IPS_IPv4_whitelist { type ipv4_addr; timeout 10h; }
    set my_IPS_IPv6_whitelist { type ipv6_addr; timeout 10h; }
    set my_IPS_TCP_ports      { type inet_service; elements={
            ssh,
            telnet,             # we don't use it
            ftp, ftps,          # we don't use it
            3389, 5900,         # we don't use it (VNC & RDP)
            pop3, pop3s, imap,  # we don't use it
            microsoft-ds,       # we don't use it (SMB)
            mysql, postgresql, ms-sql-s,  # we don't use it (from the internet, without a VPN)
            pptp,                         # we don't use it
            login,                        # we don't use it
        }; }
    # CONSIDERED AND REJECTED FOR my_IPS_TCP_ports
    # ============================================
    #
    #  * http, https:
    #
    #    HTTP/0.9 and HTTP/1.0 use one TCP connection per request.
    #
    #    HTTP/1.1 has workarounds that still suck due to head-of-line blocking.
    #    https://en.wikipedia.org/wiki/HTTP_persistent_connection
    #    https://en.wikipedia.org/wiki/HTTP_pipelining
    #
    #    HTTP/2 solves this fully, but is /de facto/ never used on port 80.
    #
    #    The end result is that as at August 2019,
    #    GUI browsers still routinely burst many HTTP connections to a single DST:DPT.
    #    This IPS only measures burstiness, so it can't work for HTTP/S.
    #
    #  * imaps:
    #
    #    If the server (and client) speak IMAP IDLE but not IMAP NOTIFY,
    #    the client will make ONE CONNECTION PER MAILBOX FOLDER.
    #    This looks very bursty, so the IPS can't do its thing.
    #
    #    See also:
    #    https://tools.ietf.org/html/rfc5465
    #    https://wiki2.dovecot.org/Plugins/PushNotification  (??? -- different RFC)
    #    https://bugzilla.mozilla.org/show_bug.cgi?id=479133  (tbird)
    #    https://blog.jcea.es/posts/20141011-thunderbird_notify.html
    #    https://en.wikipedia.org/wiki/JMAP  (just ditch IMAP entirely)
    #
    # * smtp, submission:
    #
    #   For smtp (25/tcp), can't do shit because we have to talk to
    #   whatever the fuck crackhead MTAs are out there.
    #
    #   For submission, we could limit connection rate IFF we knew
    #   ALL STAFF were running an MSA that batched up the messages.
    #   We know that at least msmtp does not, so this is a no-go.
    #
    #   (Consider a manager sending 4+ one-liner "yes" or "do it!"
    #   emails in a single minute.  We might be able to mitigate this
    #   by matching on submission with a more forgiving burst limit,
    #   e.g. 1/min burst 10?  Otherwise, we have to rate-limit in the
    #   postfix->dovecot SASL backend, or the dovecot->ad LDAP
    #   backend.  UGH.)
    #
    # * msrpc:
    #
    #   FIXME: wtf even.  I don't want to read enough about this to
    #   know if it's reasonable to IPS it.
    #
    # * openvpn:
    #
    #   Normally UDP, and we currently only IPS TCP.
    #   Normally cert-based (but can use PSKs).
    #   Might be worth considering if we do this later.
    #
    # * ident:
    #
    #   I think when you irssi -c irc.oftc.net,
    #   OFTC tries to ident back to you?
    #   I don't want to accidentally block OFTC/Freenode.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-08 10:36         ` Pablo Neira Ayuso
@ 2020-07-08 10:48           ` Reindl Harald
  2020-07-09  4:40           ` Trent W. Buck
  1 sibling, 0 replies; 15+ messages in thread
From: Reindl Harald @ 2020-07-08 10:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Trent W. Buck, netfilter



Am 08.07.20 um 12:36 schrieb Pablo Neira Ayuso:
> On Wed, Jul 08, 2020 at 12:16:18PM +0200, Reindl Harald wrote:
>>
>>
>> Am 08.07.20 um 09:51 schrieb Trent W. Buck:
>>> Reindl Harald <h.reindl@thelounge.net> writes:
>>>
>>>> Am 03.07.20 um 09:04 schrieb G.W. Haywood:
>>>>> On Fri, 3 Jul 2020, Timo Sigurdsson wrote:
>>>>>
>>>>>> ... I use ipsets for blacklisting.
>>>>>>     I fetch blacklists from various sources
>>>>>> ... This approach has worked for me for quite some time.
>>>>>> ... some of my blacklists may contain the same addresses or ranges,
>>>>>>     I use ipsets' -exist switch when loading
>>>>>> ... I don't think that the use case is that extraordinary ...
>>>>>
>>>>> +6
>>>>>
>>>>> FWIW I'll be following this thread very closely.
>>>>
>>>> it turned out at least with recent kernel and recent userland
>>>> "iptables-nft" can fully replace "iptables" and continue to use "ipset"
>>>> unchanged
>>>
>>> I tested this and you're right - it is working.  This surprised me!
>>>
>>> I saw these "commented out" rules in iptables-translate, where
>>> I (wrongly) assumed that meant the rule was completely inactive.
>>
>> "iptables-translate" comments out much more than just ipset-related
>> stuff; in my case, xt_recent and connlimit rules are also just comments
> 
> If you could post what kind of rule examples are commented out, it
> would help us keep this on the radar.
> 
> It is not too hard to add new translations, there is a _xlate()
> function under iptables/extensions/libxt_*.c that provides the
> translation. The important thing is to validate that the translation
> is semantically equivalent, or if not possible, provide a close
> translation.

[root@firewall:~]$ iptables-restore-translate --file=/etc/sysconfig/iptables | grep "^#"
# Translated by iptables-restore-translate v1.8.3 on Wed Jul  8 12:47:31 2020
# -t nat -A PREROUTING -d 172.17.0.0/24 -i wan -p icmp -m set --match-set EXCLUDES_IPV4 src -m icmp --icmp-type 8 -j DNAT --to-destination 172.16.0.1
# -t nat -A PREROUTING -d 172.17.0.0/24 -j NETMAP --to 172.16.0.0/24
# -t nat -A POSTROUTING -s 172.16.0.0/24 -o wan -j NETMAP --to 172.17.0.0/24
# -t nat -A POSTROUTING -o lan -m iprange --src-range 172.16.0.2-172.16.0.254 -m iprange --dst-range 172.16.0.2-172.16.0.254 -j NETMAP --to 172.17.0.0/24
# -t mangle -A PREROUTING -i wan -m set  --match-set EXCLUDES_IPV4 src -j INBOUND
# -t mangle -A LD_SCAN -j SET --add-set BLOCKED_DYNAMIC_PORTSCAN_IPV4 src --exist
# -t mangle -A IN_DNS -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_ALL
# -t mangle -A IN_DNS -m recent --update --seconds 2 --reap --hitcount 60 --name dns --mask 255.255.255.255 --rsource -j LD_R_ALL
# -t mangle -A IN_DNS -m recent --set --name dns --mask 255.255.255.255 --rsource
# -t mangle -A IN_FTP -m recent --update --seconds 2 --reap --hitcount 20 --name ftp --mask 255.255.255.255 --rsource -j LD_R_ALL
# -t mangle -A IN_FTP -m recent --set --name ftp --mask 255.255.255.255 --rsource
# -t mangle -A IN_SSH -m recent --update --seconds 60 --reap --hitcount 15 --name ssh --mask 255.255.255.255 --rsource -j LD_R_ALL
# -t mangle -A IN_SSH -m recent --set --name ssh --mask 255.255.255.255 --rsource
# -t mangle -A IN_TCP -p tcp -m tcpmss --mss 1:500 -j DROP
# -t mangle -A IN_TCP -m set --match-set BLOCKED_DYNAMIC_MAIL_IPV4 src -m set --match-set PORTS_MAIL dst -j DROP
# -t mangle -A IPST_ALL -j SET --add-set BLOCKED_DYNAMIC_IPV4 src --exist
# -t mangle -A INBOUND -m set --match-set PORTSCAN_PORTS dst -m set --match-set HONEYPOT_IPS_IPV4 dst -j LD_SCAN
# -t mangle -A INBOUND -m recent --rcheck --seconds 2 --hitcount 200 --name all --mask 255.255.255.255 --rsource -j IPST_ALL
# -t mangle -A INBOUND -m recent --update --seconds 2 --hitcount 150 --name all --mask 255.255.255.255 --rsource -j LD_R_ALL
# -t mangle -A INBOUND -m recent --set --name all --mask 255.255.255.255 --rsource
# -t mangle -A INBOUND -m connlimit --connlimit-above 250 --connlimit-mask 24 --connlimit-saddr -j LD_C_ALL
# -t mangle -A INBOUND -m connlimit --connlimit-above 120 --connlimit-mask 32 --connlimit-saddr -j LD_C_ALL
# -t mangle -A INBOUND -m connlimit --connlimit-above 500 --connlimit-mask 16 --connlimit-saddr -j LD_C_ALL
# -t mangle -A INBOUND -m set --match-set DNS_PORTS dst -j IN_DNS
# -t raw -A IN_TCP -p tcp -m tcp --dport 21 -j CT --helper ftp
# -t raw -A INBOUND -m set --match-set BLOCKED_MERGED_IPV4 src -j DROP
# -t raw -A INBOUND -m set --match-set BLOCKED_DYNAMIC_PORTSCAN_IPV4 src -j DROP
# -t filter -A INPUT -p tcp -m tcp --dport 10022 -m set --match-set ADMIN_CLIENTS_IPV4 src -j ACCEPT
# -t filter -A INPUT -p tcp -m tcp --dport 5201 -m set --match-set IPERF_IPV4 src -j ACCEPT
# -t filter -A INPUT -p icmp -m icmp --icmp-type 3/4 -m limit --limit 50/sec -j ACCEPT
# -t filter -A FORWARD -p icmp -m icmp --icmp-type 3/4 -m limit --limit 50/sec -j ACCEPT
# -t filter -A LD_SCAN -m set --match-set EXCLUDES_IPV4 src -j DROP
# -t filter -A LD_SCAN -j SET --add-set BLOCKED_DYNAMIC_PORTSCAN_IPV4 src --exist
# -t filter -A RESTRICT -s 172.16.0.253/32  -p icmp -m time --timestart 00:00:00 --timestop 05:30:00 --datestop 2038-01-19T03:14:07 -j DROP
# -t filter -A RESTRICT -p icmp -m icmp --icmp-type 3/4 -m limit --limit 50/sec -j ACCEPT
# -t filter -A RESTRICT -m set  --match-set PORTS_RESTRICTED dst -j LD_RST
# -t filter -A HONEYPOT -m connlimit --connlimit-upto 5 --connlimit-mask 24 --connlimit-saddr -m limit --limit 5/sec -m set --match-set HONEYPOT_PORTS dst -j ACCEPT
# -t filter -A OUTBOUND -m set --match-set RESTRICTED_IPV4 src -j RESTRICT
# -t filter -A OUTBOUND -m set --match-set OUTBOUND_BLOCKED_PORTS dst -j LD_OUT
# -t filter -A OUTBOUND -m set --match-set OUTBOUND_BLOCKED_SRC_IPV4 src -j LD_OUT
# -t filter -A INTERNAL -m set --match-set RESTRICTED_IPV4 src -j RESTRICT
# -t filter -A INTERNAL -d 172.16.0.253/32 -m set --match-set ADMIN_CLIENTS_IPV4 src -j ACCEPT
# -t filter -A IPST_MAIL -j SET --add-set BLOCKED_DYNAMIC_MAIL_IPV4 src --exist
# -t filter -A LD_R_MAIL -m recent --rcheck --seconds 600 --reap --hitcount 150 --name mail_ipset --mask 255.255.255.255 --rsource -j IPST_MAIL
# -t filter -A LD_R_MAIL -m recent --set --name mail_ipset --mask 255.255.255.255 --rsource
# -t filter -A LD_R_MX -m recent --rcheck --seconds 60 --hitcount 50 --name mail_ipset --mask 255.255.255.255 --rsource -j IPST_MAIL
# -t filter -A LD_R_MX -m recent --set --name mail_ipset --mask 255.255.255.255 --rsource
# -t filter -A VPN -i lan -o vpn -m set --match-set LAN_VPN_FORWARDING_IPV4 src -j ACCEPT
# -t filter -A VPN_IN -m set  --match-set INFRASTRUCTURE_IPV4 dst -j ACCEPT
# -t filter -A VPN_IN -m set --match-set ADMIN_CLIENTS_IPV4 src -j ACCEPT
# -t filter -A RL_MAIL -m connlimit --connlimit-above 75 --connlimit-mask 32 --connlimit-saddr -j LD_C_MAIL
# -t filter -A RL_MAIL -p tcp -m multiport --dports 25,465,587 -m recent --update --seconds 300 --hitcount 80 --name mail_mta --mask 255.255.255.255 --rsource -j LD_R_MAIL
# -t filter -A RL_MAIL -p tcp -m multiport --dports 25,465,587 -m recent --update --seconds 1800 --reap --hitcount 100 --name mail_mta --mask 255.255.255.255 --rsource -j LD_R_MAIL
# -t filter -A RL_MAIL -p tcp -m multiport --dports 25,465,587 -m recent --set --name mail_mta --mask 255.255.255.255 --rsource
# -t filter -A RL_MAIL -p tcp -m multiport --dports 110,143,993,995 -m recent --update --seconds 300 --hitcount 150 --name mail_mua --mask 255.255.255.255 --rsource -j LD_R_MAIL
# -t filter -A RL_MAIL -p tcp -m multiport --dports 110,143,993,995 -m recent --update --seconds 1200 --reap --hitcount 250 --name mail_mua --mask 255.255.255.255 --rsource -j LD_R_MAIL
# -t filter -A RL_MAIL -p tcp -m multiport --dports 110,143,993,995 -m recent --set --name mail_mua --mask 255.255.255.255 --rsource
# -t filter -A RL_MX -m connlimit --connlimit-above 10 --connlimit-mask 32 --connlimit-saddr -j LD_C_MX
# -t filter -A RL_MX -m recent --update --seconds 2 --hitcount 5 --name mail_mx --mask 255.255.255.255 --rsource -j LD_R_MX
# -t filter -A RL_MX -m recent --update --seconds 1800 --reap --hitcount 80 --name mail_mx --mask 255.255.255.255 --rsource -j LD_R_MX
# -t filter -A RL_MX -m recent --set --name mail_mx --mask 255.255.255.255 --rsource
# -t filter -A HST_05 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_05_RL
# -t filter -A HST_05_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_05_RL -m recent --update --seconds 2 --reap --hitcount 100 --name proxy --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_05_RL -m recent --set --name proxy --mask 255.255.255.255 --rsource
# -t filter -A HST_11 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_11_RL
# -t filter -A HST_11_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_11_RL -m recent --update --seconds 2 --reap --hitcount 100 --name arrakis --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_11_RL -m recent --set --name arrakis --mask 255.255.255.255 --rsource
# -t filter -A HST_04 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_04_RL
# -t filter -A HST_04_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_04_RL -m recent --update --seconds 2 --reap --hitcount 100 --name proxy --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_04_RL -m recent --set --name proxy --mask 255.255.255.255 --rsource
# -t filter -A HST_06 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_06_RL
# -t filter -A HST_06_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_06_RL -m recent --update --seconds 2 --reap --hitcount 100 --name arrakis --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_06_RL -m recent --set --name arrakis --mask 255.255.255.255 --rsource
# -t filter -A HST_15 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_15_RL
# -t filter -A HST_15 -p tcp -m tcp --dport 588 -m set --match-set EXCLUDES_IPV4 src -j ACCEPT
# -t filter -A HST_15_RL -m set --match-set PORTS_MAIL dst -j RL_MAIL
# -t filter -A HST_17 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_17_RL
# -t filter -A HST_17_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_17_RL -m recent --update --seconds 2 --reap --hitcount 50 --name caladan --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_17_RL -m recent --set --name caladan --mask 255.255.255.255 --rsource
# -t filter -A HST_17_RL -m set --match-set PORTS_MAIL dst -j RL_MAIL
# -t filter -A HST_19 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_19_RL
# -t filter -A HST_19 -p tcp -m tcp --dport 443 -m set --match-set BAYES_SYNC_IPV4 src -j ACCEPT
# -t filter -A HST_21 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_21_RL
# -t filter -A HST_20 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_20_RL
# -t filter -A HST_30 -p tcp -m multiport --dports 5222,5269 -m set --match-set JABBER_IPV4 src -j ACCEPT
# -t filter -A HST_30 -p tcp -m tcp --dport 873 -m set --match-set RBL_SYNC_IPV4 src -j ACCEPT
# -t filter -A HST_30 -p tcp -m multiport --dports 5201,12865 -m set --match-set IPERF_IPV4 src -j ACCEPT
# -t filter -A HST_10 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_10_RL
# -t filter -A HST_10_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_10_RL -m recent --update --seconds 2 --reap --hitcount 50 --name arrakis --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_10_RL -m recent --set --name arrakis --mask 255.255.255.255 --rsource
# -t filter -A HST_08 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_08_RL
# -t filter -A HST_08_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_08_RL -m recent --update --seconds 2 --reap --hitcount 50 --name arrakis --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_08_RL -m recent --set --name arrakis --mask 255.255.255.255 --rsource
# -t filter -A HST_09 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_09_RL
# -t filter -A HST_09_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_09_RL -m recent --update --seconds 2 --reap --hitcount 50 --name arrakis --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_09_RL -m recent --set --name arrakis --mask 255.255.255.255 --rsource
# -t filter -A HST_35 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_35_RL
# -t filter -A HST_35_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_35_RL -m recent --update --seconds 2 --reap --hitcount 50 --name thebe --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_35_RL -m recent --set --name thebe --mask 255.255.255.255 --rsource
# -t filter -A HST_34 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_34_RL
# -t filter -A HST_34_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_34_RL -m recent --update --seconds 2 --reap --hitcount 50 --name thebe --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_34_RL -m recent --set --name thebe --mask 255.255.255.255 --rsource
# -t filter -A HST_38 -i wan -m set  --match-set EXCLUDES_IPV4 src -j HST_38_RL
# -t filter -A HST_38_RL -m connlimit --connlimit-above 50 --connlimit-mask 32 --connlimit-saddr -j LD_C_HST
# -t filter -A HST_38_RL -m recent --update --seconds 2 --reap --hitcount 50 --name thebe --mask 255.255.255.255 --rsource -j LD_R_HST
# -t filter -A HST_38_RL -m recent --set --name thebe --mask 255.255.255.255 --rsource
# -t filter -A HST_03 -p tcp -m tcp --dport 22 -m set --match-set SFTP_22_IPV4 src -j ACCEPT
# Completed on Wed Jul  8 10:47:31 2020
[root@firewall:~]$

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-08 10:16       ` Reindl Harald
@ 2020-07-08 10:36         ` Pablo Neira Ayuso
  2020-07-08 10:48           ` Reindl Harald
  2020-07-09  4:40           ` Trent W. Buck
  0 siblings, 2 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2020-07-08 10:36 UTC (permalink / raw)
  To: Reindl Harald; +Cc: Trent W. Buck, netfilter

On Wed, Jul 08, 2020 at 12:16:18PM +0200, Reindl Harald wrote:
> 
> 
> Am 08.07.20 um 09:51 schrieb Trent W. Buck:
> > Reindl Harald <h.reindl@thelounge.net> writes:
> > 
> >> Am 03.07.20 um 09:04 schrieb G.W. Haywood:
> >>> On Fri, 3 Jul 2020, Timo Sigurdsson wrote:
> >>>
> >>>> ... I use ipsets for blacklisting.
> >>>>     I fetch blacklists from various sources
> >>>> ... This approach has worked for me for quite some time.
> >>>> ... some of my blacklists may contain the same addresses or ranges,
> >>>>     I use ipsets' -exist switch when loading
> >>>> ... I don't think that the use case is that extraordinary ...
> >>>
> >>> +6
> >>>
> >>> FWIW I'll be following this thread very closely.
> >>
> >> it turned out at least with recent kernel and recent userland
> >> "iptables-nft" can fully replace "iptables" and continue to use "ipset"
> >> unchanged
> > 
> > I tested this and you're right - it is working.  This surprised me!
> > 
> > I saw these "commented out" rules in iptables-translate, where
> > I (wrongly) assumed that meant the rule was completely inactive.
> 
> "iptables-translate" comments out much more than just ipset related
> stuff, in my case xt_recent and connlimit rules are also just comments

If you could post what kind of rule examples are commented out, it
would help us keep this in the radar.

It is not too hard to add new translations, there is a _xlate()
function under iptables/extensions/libxt_*.c that provides the
translation. The important thing is to validate that the translation
is semantically equivalent, or if not possible, provide a close
translation.

Thanks.


* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-08  7:51     ` Trent W. Buck
@ 2020-07-08 10:16       ` Reindl Harald
  2020-07-08 10:36         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Reindl Harald @ 2020-07-08 10:16 UTC (permalink / raw)
  To: Trent W. Buck, netfilter



Am 08.07.20 um 09:51 schrieb Trent W. Buck:
> Reindl Harald <h.reindl@thelounge.net> writes:
> 
>> Am 03.07.20 um 09:04 schrieb G.W. Haywood:
>>> On Fri, 3 Jul 2020, Timo Sigurdsson wrote:
>>>
>>>> ... I use ipsets for blacklisting.
>>>>     I fetch blacklists from various sources
>>>> ... This approach has worked for me for quite some time.
>>>> ... some of my blacklists may contain the same addresses or ranges,
>>>>     I use ipsets' -exist switch when loading
>>>> ... I don't think that the use case is that extraordinary ...
>>>
>>> +6
>>>
>>> FWIW I'll be following this thread very closely.
>>
>> it turned out at least with recent kernel and recent userland
>> "iptables-nft" can fully replace "iptables" and continue to use "ipset"
>> unchanged
> 
> I tested this and you're right - it is working.  This surprised me!
> 
> I saw these "commented out" rules in iptables-translate, where
> I (wrongly) assumed that meant the rule was completely inactive.

"iptables-translate" comments out much more than just ipset related
stuff, in my case xt_recent and connlimit rules are also just comments

given that the backend is "nftables", just with what you had before on
the frontend side with all the shiny scripts and experience,
iptables-nft is for me clearly the way to go

but as said i would love "ipset-nft" to make the transition complete and
also keep on that part the scripts which are designed to work outside
the ruleset itself by intention

----------------------

BTW: in my (complex) ruleset when i show the rules with "nft" all that
stuff is still commented out as it would be with the translate stuff

that's pretty surely a userland limitation, given that the kernel knows
what to do and nothing is missing according to "iptables-nft -L"
----------------------

one reason i would prefer to stay with iptables-nft forever is the
really nice output format of "iptables-nft -t raw --list --numeric
--exact --verbose", from which i generate stats like these with grep and
bash magic

it's simply parseable

23M   23M   0  100   100   0   ALL
13M   13M   0  58.3  58.3  0   DENY
12M   12M   0  50.2  50.2  0   DENY SCAN
10M   10M   0  44.5  44.5  0   DENY IPSET
9.6M  9.6M  0  41.7  41.7  0   ACCEPT
5.1M  5.1M  0  22.2  22.2  0   ACCEPT IN
4.5M  4.5M  0  19.4  19.4  0   ACCEPT OUT
1.1M  1.1M  0  4.9   4.9   0   HONEYPOT
857K  857K  0  3.7   3.7   0   INVALID
761K  761K  0  3.3   3.3   0   RL + CL
748K  748K  0  3.3   3.3   0   RATELIMIT
13K   13K   0  0.1   0.1   0   CONNLIMIT
12K   12K   0  0.1   0.1   0   DENY TIME
1.7K  1.7K  0  0     0     0   OUT RESTRICT
889   889   0  0     0     0   OUT DENY
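The grep-and-bash post-processing described above could look roughly like this. This is only a sketch, not the actual scripts: the counter sample below is synthetic, and a real run would feed in the output of `iptables-nft -t raw --list --numeric --exact --verbose` instead.

```shell
# Synthetic sample in the column layout of `iptables -nvx -L` output
# (pkts bytes target ...); real output would also contain "Chain" headers.
cat > /tmp/sample_counters.txt <<'EOF'
    pkts      bytes target     prot opt in     out     source               destination
 2300000  190000000 DENY       all  --  *      *       0.0.0.0/0            0.0.0.0/0
  960000   80000000 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0
   74800    6100000 RATELIMIT  all  --  *      *       0.0.0.0/0            0.0.0.0/0
EOF

# Sum packets per target and print each target's share of the total,
# sorted by packet count descending.
awk 'NR > 1 { pkts[$3] += $1; total += $1 }
     END { for (t in pkts) printf "%-10s %10d %6.1f%%\n", t, pkts[t], 100 * pkts[t] / total }' \
    /tmp/sample_counters.txt | sort -k2 -rn
```

Each output line carries the target name, its packet count, and its percentage of all packets, which is essentially the shape of the table above.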


* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-03 10:39   ` Reindl Harald
@ 2020-07-08  7:51     ` Trent W. Buck
  2020-07-08 10:16       ` Reindl Harald
  0 siblings, 1 reply; 15+ messages in thread
From: Trent W. Buck @ 2020-07-08  7:51 UTC (permalink / raw)
  To: netfilter

Reindl Harald <h.reindl@thelounge.net> writes:

> Am 03.07.20 um 09:04 schrieb G.W. Haywood:
>> On Fri, 3 Jul 2020, Timo Sigurdsson wrote:
>> 
>>> ... I use ipsets for blacklisting.
>>>     I fetch blacklists from various sources
>>> ... This approach has worked for me for quite some time.
>>> ... some of my blacklists may contain the same addresses or ranges,
>>>     I use ipsets' -exist switch when loading
>>> ... I don't think that the use case is that extraordinary ...
>> 
>> +6
>> 
>> FWIW I'll be following this thread very closely.
>
> it turned out at least with recent kernel and recent userland
> "iptables-nft" can fully replace "iptables" and continue to use "ipset"
> unchanged

I tested this and you're right - it is working.  This surprised me!

I saw these "commented out" rules in iptables-translate, where
I (wrongly) assumed that meant the rule was completely inactive.

    bash5$ sudo ip netns add delete-me
    bash5$ sudo ip netns exec delete-me bash
    bash5# nft list ruleset
    bash5# ipset create xs hash:ip
    bash5# iptables-nft -N x
    bash5# iptables-nft -A x -m set --match-set xs dst
    bash5# iptables-nft-save
    # Generated by xtables-save v1.8.3 on Wed Jul  8 17:30:56 2020
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :x - [0:0]
    -A x -m set --match-set xs dst 
    COMMIT
    # Completed on Wed Jul  8 17:30:56 2020
    bash5# nft list chain filter x
    table ip filter {
            chain x {
                    # match-set xs dst counter packets 0 bytes 0
            }
    }

Testing shows it IS matching, so the only limitation is that you must
create the ruleset using iptables-nft-restore (old syntax) instead of
nft (new syntax).

    bash5# iptables-nft -A OUTPUT -m set --match-set xs dst -j REJECT
    bash5# ip link set dev lo up
    bash5# ip a add 127.0.0.1/8 brd + dev lo
    bash5# iptables-nft-save -c
    ⋮
    [0:0] -A OUTPUT -m set --match-set xs dst -j REJECT --reject-with icmp-port-unreachable
    ⋮
    bash5# ping -c1 127.0.0.1
    ping: sendmsg: Operation not permitted
    bash5# iptables-nft-save -c
    ⋮
    [2:196] -A OUTPUT -m set --match-set xs dst -j REJECT --reject-with icmp-port-unreachable
    ⋮
    bash5# nft list chain filter OUTPUT
    ⋮
                    # match-set xs dst counter packets 2 bytes 196 reject
    ⋮



* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-03  7:04 ` G.W. Haywood
@ 2020-07-03 10:39   ` Reindl Harald
  2020-07-08  7:51     ` Trent W. Buck
  0 siblings, 1 reply; 15+ messages in thread
From: Reindl Harald @ 2020-07-03 10:39 UTC (permalink / raw)
  To: G.W. Haywood, netfilter



Am 03.07.20 um 09:04 schrieb G.W. Haywood:
> On Fri, 3 Jul 2020, Timo Sigurdsson wrote:
> 
>> ... I use ipsets for blacklisting.
>>     I fetch blacklists from various sources
>> ... This approach has worked for me for quite some time.
>> ... some of my blacklists may contain the same addresses or ranges,
>>     I use ipsets' -exist switch when loading
>> ... I don't think that the use case is that extraordinary ...
> 
> +6
> 
> FWIW I'll be following this thread very closely.

it turned out at least with recent kernel and recent userland
"iptables-nft" can fully replace "iptables" and continue to use "ipset"
unchanged

-------------------------

in my case the switch was even done by just replacing the command in my
non-distribution network-up.service and rebooting; the save-files are
compatible

ExecStart=/usr/sbin/iptables-legacy-restore /etc/sysconfig/iptables
ExecStart=/usr/sbin/iptables-nft-restore /etc/sysconfig/iptables

-------------------------

before dealing with "alternatives" i prepared my scripts so that for the
migration time only the above change is needed

# Check if 'iptables-nft' is loaded
if lsmod | grep -q 'nft_compat'; then IPTABLES_NFT=1; else IPTABLES_NFT=0; fi

# Shortcuts
IPSET_SAVE_FILE="/etc/sysconfig/ipset"
IP6TABLES_SAVE_FILE="/etc/sysconfig/ip6tables"
IPTABLES_SAVE_FILE="/etc/sysconfig/iptables"
IPSET="$(which 'ipset' 2> '/dev/null')"
IPSET_SAVE="$IPSET -file $IPSET_SAVE_FILE save"

# Compat-Layer
if [ "$IPTABLES_NFT" == 1 ]; then
 IPTABLES="$(which 'iptables-nft' 2> '/dev/null' || which 'iptables' 2> '/dev/null')"
 IPTABLES_SAVE="$(which 'iptables-nft-save' 2> '/dev/null' || which 'iptables-save' 2> '/dev/null')"
 IP6TABLES="$(which 'ip6tables-nft' 2> '/dev/null' || which 'ip6tables' 2> '/dev/null')"
 IP6TABLES_SAVE="$(which 'ip6tables-nft-save' 2> '/dev/null' || which 'ip6tables-save' 2> '/dev/null')"
else
 IPTABLES="$(which 'iptables-legacy' 2> '/dev/null' || which 'iptables' 2> '/dev/null')"
 IPTABLES_SAVE="$(which 'iptables-legacy-save' 2> '/dev/null' || which 'iptables-save' 2> '/dev/null')"
 IP6TABLES="$(which 'ip6tables-legacy' 2> '/dev/null' || which 'ip6tables' 2> '/dev/null')"
 IP6TABLES_SAVE="$(which 'ip6tables-legacy-save' 2> '/dev/null' || which 'ip6tables-save' 2> '/dev/null')"
fi

-------------------------

what i would *really* love is having the same for ipset, with
"ipset-legacy" and "ipset-nft", to keep semantics and scripts unchanged
but use nftables behind the scenes

besides the optimizations below, which as far as i understand only apply
to native "nf_set", that could be the road to finally getting rid of all
the ipset/iptables code and switching everything to "nftables" without
breaking any userland software or homegrown scripts

https://lore.kernel.org/netfilter-devel/20200315141353.u6hv7podfwxeopgi@salvia/T/

-------------------------

finally: in over 20 years in IT, the switch from "iptables-legacy" to
"iptables-nft" is the first time a "drop-in replacement" really deserves
that label, and it would be great if "ipset" could go the same direction


* Re: Moving from ipset to nftables: Sets not ready for prime time yet?
  2020-07-02 23:18 Timo Sigurdsson
@ 2020-07-03  7:04 ` G.W. Haywood
  2020-07-03 10:39   ` Reindl Harald
  2020-07-14 13:27 ` Timo Sigurdsson
  1 sibling, 1 reply; 15+ messages in thread
From: G.W. Haywood @ 2020-07-03  7:04 UTC (permalink / raw)
  To: netfilter

Hi there,

On Fri, 3 Jul 2020, Timo Sigurdsson wrote:

> ... I use ipsets for blacklisting.
>     I fetch blacklists from various sources
> ... This approach has worked for me for quite some time.
> ... some of my blacklists may contain the same addresses or ranges,
>     I use ipsets' -exist switch when loading
> ... I don't think that the use case is that extraordinary ...

+6

FWIW I'll be following this thread very closely.

-- 

73,
Ged.


* Moving from ipset to nftables: Sets not ready for prime time yet?
@ 2020-07-02 23:18 Timo Sigurdsson
  2020-07-03  7:04 ` G.W. Haywood
  2020-07-14 13:27 ` Timo Sigurdsson
  0 siblings, 2 replies; 15+ messages in thread
From: Timo Sigurdsson @ 2020-07-02 23:18 UTC (permalink / raw)
  To: netfilter

P.S. Sorry, I sent this message to netfilter-devel first as I was already subscribed to that list and only realized later that the netfilter list would be a better place to post this to. Hence, one more time to this list...


Hi,

I'm currently migrating my various iptables/ipset setups to nftables. The nftables syntax is a pleasure and for the most part the transition of my rulesets has been smooth. Moving my ipsets to nftables sets, however, has proven to be a major pain point - to a degree where I started wondering whether nftables sets are actually ready to replace existing ipset workflows yet.

Before I go into the various issues I encountered with nftables sets, let me briefly explain what my ipset workflow looked like. On gateways that forward traffic, I use ipsets for blacklisting. I fetch blacklists from various sources regularly, convert them to files that can be loaded with `ipset restore`, load them into a new ipset and then replace the old ipset with the new one with `ipset swap`. Since some of my blacklists may contain the same addresses or ranges, I use ipsets' -exist switch when loading multiple blacklists into one ipset. This approach has worked for me for quite some time.
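The fetch-convert-swap workflow described above can be sketched roughly as follows. All names here (the set `blacklist4`, the file `/tmp/feed.txt`) are invented for the example, and the feed data is synthetic:

```shell
SET=blacklist4
TMPSET=${SET}_new

# A downloaded source list (synthetic sample data, duplicates included):
cat > /tmp/feed.txt <<'EOF'
192.0.2.0/24
198.51.100.7
192.0.2.0/24
EOF

# Build a file loadable with `ipset restore`:
{
  echo "create $TMPSET hash:net"
  sed "s|^|add $TMPSET |" /tmp/feed.txt
} > /tmp/ipset.restore
cat /tmp/ipset.restore

# Loading and swapping needs root and a live netfilter, so only shown here:
#   ipset -exist restore < /tmp/ipset.restore   # -exist tolerates duplicates
#   ipset swap "$TMPSET" "$SET"                 # atomic replacement
#   ipset destroy "$TMPSET"
```

The swap replaces the active set in one step, so rules referencing `$SET` never see a half-populated set.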

Now, let's get to the issues I encountered:

1) Auto-merge issues
Initially, I intended to use the auto-merge feature as a means of dealing with duplicate addresses in the various source lists I use. The first issue I encountered was that it's currently not possible to add an element to a set if it already exists in the set or is part of an interval in the set, despite the auto-merge flag being set. This has been reported already by someone else [1] and the only workaround seems to be to add all addresses at once (within one 'add element' statement).

Another issue I stumbled upon was that auto-merge may actually generate wrong/incomplete intervals if you have multiple 'add element' statements within an nftables script file. I consider this a serious issue if you can't be sure whether the addresses or intervals you add to a set actually end up in the set. I reported this here [2]. The workaround for it is - again - to add all addresses in a single statement.
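That single-statement workaround can be scripted: merge all source lists, drop duplicates, and emit one `add element` line for `nft -f`. The table and set names (`inet filter blackhole4`) and list files below are invented for the example:

```shell
# Two synthetic source lists with an overlapping entry:
cat > /tmp/list1.txt <<'EOF'
192.0.2.0/24
198.51.100.7
EOF
cat > /tmp/list2.txt <<'EOF'
198.51.100.7
203.0.113.0/25
EOF

# Deduplicate across all lists and produce ONE 'add element' statement.
sort -u /tmp/list1.txt /tmp/list2.txt \
  | paste -s -d, - \
  | sed 's/.*/add element inet filter blackhole4 { & }/' > /tmp/elements.nft
cat /tmp/elements.nft

# loaded (as root) with: nft -f /tmp/elements.nft
```

Because every element arrives in the same statement, the auto-merge and duplicate-element problems above never get a chance to trigger.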

The third auto-merge issue I encountered is another one that has been reported already by someone else [3]. It is that the auto-merge flag actually makes it impossible to update the set atomically. Oh, well, let's abandon auto-merge altogether for now...
 
2) Atomic reload of large sets unbearably slow
Moving on without the auto-merge feature, I started testing sets with actual lists I use. The initial setup (meaning populating the sets for the first time) went fine. But when I tried to update them atomically, i.e. use a script file that would have a 'flush set' statement in the beginning and then an 'add element' statement with all the addresses I wanted to add to it, the system seemed to lock up. As it turns out, updating existing large sets is excessively slow - to a point where it becomes unusable if you work with multiple large sets. I reported the details including an example and performance indicators here [4]. The only workaround for this (that keeps atomicity) I found so far is to reload the complete firewall configuration including the set definitions. But that has other unwanted side-effects such as resetting all counters and so on.
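The flush-then-refill reload described above, written as a single `nft -f` script so that both steps apply in one atomic transaction (table and set names are illustrative):

```
#!/usr/sbin/nft -f
flush set inet filter blackhole4
add element inet filter blackhole4 { 192.0.2.0/24, 198.51.100.7 }
```

Because the whole file is committed as one transaction, packets never see an empty set between the flush and the refill; it is exactly this reload that becomes prohibitively slow with large sets.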

3) Referencing sets within a set not possible
As a workaround for the auto-merge issues described above (and also for another use case), I was looking into the possibility to reference sets within a set, so I could create a set for each source list I use and reference them all in a single set, matching them all at once without duplicating rules for multiple sets. To be clear, I'm not really sure whether this is supposed to work at all. I found some commits which suggested to me it might be possible [5][6]. Nevertheless, I couldn't get this to work.

Summing up:
Well, that's quite a number of issues to run into as an nftables newbie. I wouldn't have expected this at all. And frankly, I actually converted my rules first and thought adjusting my scripts around ipset to achieve the same with nftables sets would be straightforward and simple... Maybe my approach or understanding of nftables is wrong. But I don't think the use case is so extraordinary that it should be this difficult.

In any case, if anyone has any tips or workarounds to speed up the atomic reload of large sets, I'd be happy to hear (or read) them. Same goes for referencing sets within sets. If this should be possible to do, I'd appreciate any hints to the correct syntax to do so.
Are there better approaches to deal with large sets regularly updated from various sources?


Cheers,

Timo


[1] https://www.spinics.net/lists/netfilter/msg58937.html
[2] https://bugzilla.netfilter.org/show_bug.cgi?id=1438
[3] https://bugzilla.netfilter.org/show_bug.cgi?id=1404
[4] https://bugzilla.netfilter.org/show_bug.cgi?id=1439
[5] http://git.netfilter.org/nftables/commit/?h=v0.9.0&id=a6b75b837f5e851c80f8f2dc508b11f1693af1b3
[6] http://git.netfilter.org/nftables/commit/?h=v0.9.0&id=bada2f9c182dddf72a6d3b7b00c9eace7eb596c3


Thread overview: 15+ messages
2020-07-02 22:30 Moving from ipset to nftables: Sets not ready for prime time yet? Timo Sigurdsson
2020-07-03  9:28 ` Stefano Brivio
2020-07-03 10:24   ` Jozsef Kadlecsik
2020-07-03 13:38     ` Stefano Brivio
2020-07-03 14:03   ` Timo Sigurdsson
2020-07-30 19:27 ` Pablo Neira Ayuso
2020-07-02 23:18 Timo Sigurdsson
2020-07-03  7:04 ` G.W. Haywood
2020-07-03 10:39   ` Reindl Harald
2020-07-08  7:51     ` Trent W. Buck
2020-07-08 10:16       ` Reindl Harald
2020-07-08 10:36         ` Pablo Neira Ayuso
2020-07-08 10:48           ` Reindl Harald
2020-07-09  4:40           ` Trent W. Buck
2020-07-14 13:27 ` Timo Sigurdsson
