* IPSec, masquerade and dnat with nftables
@ 2016-09-09  7:06 Thomas Bach
  2016-10-17 19:44 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Bach @ 2016-09-09  7:06 UTC (permalink / raw)
  To: netfilter

Hi,

I have two hosts with public IP addresses running Ubuntu 16.04 with
kernel version 4.4.0.

I want to interconnect two containers (systemd-nspawn) with veth
interfaces running on these hosts in a client-server setup.

On the first host, where the server runs in its container, I have
the following rules:
# nft list ruleset
table ip nat {
  chain prerouting {
    type nat hook prerouting priority 0; policy accept;
    tcp dport { 4506, 4505} dnat 10.0.0.2 
  }

  chain output {
    type nat hook output priority 0; policy accept;
    tcp dport { 4505, 4506} dnat 10.0.0.2
  }

  chain input {
    type nat hook input priority 0; policy accept;
  }

  chain postrouting {
    type nat hook postrouting priority 0; policy accept;
    ip saddr 10.0.0.0/8 oif enp4s0 masquerade 
  }
}

On the second host, where the client runs, I have the following:
# nft list ruleset
table ip nat {
  chain prerouting {
    type nat hook prerouting priority 0; policy accept;
  }

  chain output {
    type nat hook output priority 0; policy accept;
  }

  chain input {
    type nat hook input priority 0; policy accept;
  }

  chain postrouting {
    type nat hook postrouting priority 0; policy accept;
    ip saddr 10.0.0.0/8 oif enp0s31f6 masquerade 
  }
}

This works as expected and without any problems at all. Now IPSec
enters the picture. As soon as I set up a policy to encrypt everything
between the two hosts the following happens:
+ I can still connect from the second host to the server in the
  container without problems,
+ I can still /connect/ (i.e. establish a connection) from the
  container on the second host to the server on the first host, but
+ in tcpdump listening on the interface of the container (on the
  second host) I see lots of TCP Retransmissions and the TCP connection
  is effectively broken.

Can someone give me a hint what is going on here?

Regards

    Thomas Bach.
-- 
ilexius GmbH
Thomas Bach


* Re: IPSec, masquerade and dnat with nftables
  2016-09-09  7:06 IPSec, masquerade and dnat with nftables Thomas Bach
@ 2016-10-17 19:44 ` Pablo Neira Ayuso
  2016-10-17 19:52   ` Noel Kuntze
  2016-10-18  9:39   ` Thomas Bach
  0 siblings, 2 replies; 13+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-17 19:44 UTC (permalink / raw)
  To: Thomas Bach; +Cc: netfilter

On Fri, Sep 09, 2016 at 09:06:59AM +0200, Thomas Bach wrote:
> [..]
> Can someone give me a hint what is going on here?

Did you find the root cause for this problem?


* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 19:44 ` Pablo Neira Ayuso
@ 2016-10-17 19:52   ` Noel Kuntze
  2016-10-17 20:11     ` Pablo Neira Ayuso
  2016-10-18  9:39   ` Thomas Bach
  1 sibling, 1 reply; 13+ messages in thread
From: Noel Kuntze @ 2016-10-17 19:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Thomas Bach; +Cc: netfilter


On 17.10.2016 21:44, Pablo Neira Ayuso wrote:
> [..]
> Did you find the root cause for this problem?

Probably missing TCP MSS clamping. This is a common problem; it can
happen when PMTUD is broken.
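
For illustration, MSS clamping in nftables could look like the sketch
below. It is an assumption, not a confirmed fix for the setup above: the
"tcp option maxseg size set" statement and the "rt mtu" expression come
from later nftables and kernel releases and may not be available on a 4.4
kernel, and the table and chain names are hypothetical.

```nft
# Hypothetical table and chain names; not taken from the rulesets in this thread.
table ip mangle {
  chain forward {
    type filter hook forward priority -150; policy accept;
    # For forwarded packets with the SYN flag set, rewrite the TCP MSS
    # option so that it fits the MTU of the outgoing route.
    tcp flags syn tcp option maxseg size set rt mtu
  }
}
```

A fixed value can be given instead of "rt mtu" if the usable MTU through
the IPsec path is known.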

We also need the policy match module to support ipsec in nftables.
Is that on the TODO list?

-- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658




* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 19:52   ` Noel Kuntze
@ 2016-10-17 20:11     ` Pablo Neira Ayuso
  2016-10-17 20:17       ` Noel Kuntze
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-17 20:11 UTC (permalink / raw)
  To: Noel Kuntze; +Cc: Thomas Bach, netfilter, fw

On Mon, Oct 17, 2016 at 09:52:06PM +0200, Noel Kuntze wrote:
> [..]
> We also need the policy match module to support ipsec in nftables.
> Is that on the TODO list?

I know Florian Westphal made a simple extension; he has a patch in
his queue. Trimming off most of it and leaving just this small chunk:

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 6c1e024..76b70e1 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -190,6 +190,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
                *dest = prandom_u32_state(state);
                break;
        }
+       case NFT_META_SECPATH:
+               *(__u8 *)dest = secpath_exists(skb);
+               break;
        default:
                WARN_ON(1);
                goto err;

Would this be enough for your usecase?


* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 20:11     ` Pablo Neira Ayuso
@ 2016-10-17 20:17       ` Noel Kuntze
  2016-10-17 20:27         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Noel Kuntze @ 2016-10-17 20:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Thomas Bach, netfilter, fw


On 17.10.2016 22:11, Pablo Neira Ayuso wrote:
> [..]
> Would this be enough for your usecase?

No, the problem is that in nftables, we can't tell apart ipsec protected packets
from unprotected ones. But we need that, because generally, we want to treat them differently.
In iptables we can do that with -m policy [additional args], but there's nothing like that in nftables.
We need complete support for all the options of the policy match module in nftables.

I don't see what that three line patch actually does. Would you kindly elaborate?

-- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze


* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 20:17       ` Noel Kuntze
@ 2016-10-17 20:27         ` Pablo Neira Ayuso
  2016-10-17 21:07           ` Noel Kuntze
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2016-10-17 20:27 UTC (permalink / raw)
  To: Noel Kuntze; +Cc: Thomas Bach, netfilter, fw

On Mon, Oct 17, 2016 at 10:17:28PM +0200, Noel Kuntze wrote:
> [..]
> No, the problem is that in nftables, we can't tell apart ipsec
> protected packets from unprotected ones. But we need that, because
> generally, we want to treat them differently.  In iptables we can do
> that with -m policy [additional args], but there's nothing like that
> in nftables.  We need complete support for all the options of the
> policy match module in nftables.

Are you using *all* options there? I'd appreciate it if you could
elaborate a bit on the use cases where you use these different options.

> I don't see what that three line patch actually does. Would you
> kindly elaborate?

It allows matching on whether the packet is protected or unprotected,
in a true/false fashion.
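
As a sketch of what such a boolean match could look like in a ruleset:
later nftables releases expose this check as "meta secpath" (renamed to
"meta ipsec" in newer versions). Which keyword applies depends on the nft
version in use, and the table and chain below are hypothetical.

```nft
# Hypothetical table and chain names, for illustration only.
table ip filter {
  chain input {
    type filter hook input priority 0; policy accept;
    # Count packets that arrived through IPsec (a secpath is attached).
    meta secpath exists counter
    # Count packets that arrived without IPsec protection.
    meta secpath missing counter
  }
}
```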

Thanks.


* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 20:27         ` Pablo Neira Ayuso
@ 2016-10-17 21:07           ` Noel Kuntze
  2016-10-18  8:59             ` Florian Westphal
  0 siblings, 1 reply; 13+ messages in thread
From: Noel Kuntze @ 2016-10-17 21:07 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Thomas Bach, netfilter, fw


On 17.10.2016 22:27, Pablo Neira Ayuso wrote:
> [..]
> Are you using *all* options there? I'd appreciate it if you could
> elaborate a bit on the use cases where you use these different options.
>
> > I don't see what that three line patch actually does. Would you
> > kindly elaborate?
>
> It allows matching on whether the packet is protected or unprotected,
> in a true/false fashion.
>
> Thanks.

Well, I am active in the strongSwan community, so I believe I've seen all the
use cases there are and I've seen uses of every option, except "--next" and "--strict".
But I think there are probably use cases where they are used as well.

--spi, --reqid, --tunnel-src, --tunnel-dst, --mode and --proto are used to
identify different tunnels. Consider, for example, a scenario where an
IPsec-enabled router is part of an IPsec-protected LAN with host-to-host
transport mode tunnels (ah+esp bundles) between the hosts. The same router
provides IPsec VPN access to roadwarrior users through tunnel mode tunnels
that are marked with a particular, unique mark value, as well as
site-to-site tunnels using tunnel mode. A userspace component multiplexes
broadcast and multicast packets from the LAN to the roadwarriors, as well
as between different roadwarriors, by listening for those packets and
sending them out with the MARK value that was set on the IPsec SPs.

In this scenario, --tunnel-src, --tunnel-dst and --mode are used to
identify the host-to-host LAN transport mode tunnels. --mode tunnel, --spi
and --tunnel-dst are used to identify the roadwarrior tunnels. --reqid is
used to identify particular tunnels, which the userspace IKE daemon
configures with a special reqid so that the firewall configuration can
handle certain connections specially.

--spi is used to identify several transport mode tunnel endpoints behind a
NAT device. The different peers negotiated different SAs and SPs. --spi is
used to tell them apart and to mark the connections from those clients
with different connmark values, which enables conntrack, an accounting
system, and the firewall on the host to differentiate them.
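
For comparison, later nftables releases gained an "ipsec" expression that
covers part of what -m policy offers (reqid, SPI and tunnel addresses).
The sketch below only illustrates the general shape of such rules. The
SPI, reqid, address and mark values are made up, the table and chain names
are hypothetical, and the expression is not available in the nftables
versions discussed in this thread.

```nft
# Hypothetical table and chain names; all numeric values are made up.
table ip mangle {
  chain prerouting {
    type filter hook prerouting priority -150; policy accept;
    # Tell two peers behind the same NAT device apart by SPI and give
    # their connections different connmark values.
    ipsec in spi 0xc0ffee01 ct mark set 1
    ipsec in spi 0xc0ffee02 ct mark set 2
    # Match tunnels that the IKE daemon set up with a special reqid.
    ipsec in reqid 42 counter
    # Match on the tunnel source address of incoming IPsec packets.
    ipsec in ip saddr 192.0.2.10 counter
  }
}
```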

I hope this text was enlightening. :)

--
Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658





* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 21:07           ` Noel Kuntze
@ 2016-10-18  8:59             ` Florian Westphal
  2016-10-18 20:38               ` Noel Kuntze
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2016-10-18  8:59 UTC (permalink / raw)
  To: Noel Kuntze; +Cc: Pablo Neira Ayuso, Thomas Bach, netfilter, fw

Noel Kuntze <noel@familie-kuntze.de> wrote:
> On 17.10.2016 22:27, Pablo Neira Ayuso wrote:
[..]

> > Allowing to match if the packet is protected/unprotected in a
> > true/false fashion.
>
> Well, I am active in the strongSwan community, so I believe I've seen all the
> use cases there are and I've seen uses of every option, except "--next" and "--strict".
> But I think there are probably use cases where they are used as well.

Ok.  I still believe that 'meta secpath' makes sense as a simpler
alternative; I think most users are just interested in 'was this packet
ipsec protected' rather than doing the full policy option dance.

Wrt. -m policy in nftables, we have two different cases:

1. Check if a given daddr/saddr/spi etc is listed in *any* of the policies.
2. Check if a given policy contains the exact spi/daddr/saddr.

As a first RFC, what about the below syntax?

It adds one expression (to load a given policy element into a register)
and one statement (to search policies for a given number/address).

add rule filter input xfrm policy direction original 0 spi eq 1

would take input policies, grab first one (policy[0]), get its spi and
place it into a register (i.e., the 'eq 1' is not part of the xfrm
expression, only 'spi' is passed as key so we know what to look for).

Chaining these would allow the strict mode matching, but as you might
imagine it would be quite bloated to do exact matching :-/

Statement would look like this:
add rule filter input xfrm policy direction original spi 1

... it would search all input policies for spi 1.
(i.e., 1 is passed as immediate value to the xfrm expression).

Thoughts?
Does anyone see a -m policy case that we could not cover with this?

diff --git a/src/parser_bison.y b/src/parser_bison.y
--- a/src/parser_bison.y
+++ b/src/parser_bison.y
@@ -420,6 +420,10 @@ static void location_update(struct location *loc, struct location *rhs, int n)
 %token XML			"xml"
 %token JSON			"json"
 
+%token XFRM			"xfrm"
+%token MODE			"mode"
+%token REQID			"reqid"
+
 %type <string>			identifier type_identifier string comment_spec
 %destructor { xfree($$); }	identifier type_identifier string comment_spec
 
@@ -600,6 +604,12 @@ static void location_update(struct location *loc, struct location *rhs, int n)
 %destructor { xfree($$); }	monitor_event
 %type <val>			monitor_object	monitor_format
 
+%type <val>			policy_type
+%type <expr>			policy_expr
+%type <stmt>			policy_stmt
+%destructor { expr_free($$); }	policy_expr
+%destructor { stmt_free($$); }	policy_stmt
+
 %%
 
 input			:	/* empty */
@@ -1396,6 +1406,7 @@ stmt			:	verdict_stmt
 			|	dup_stmt
 			|	fwd_stmt
 			|	set_stmt
+			|	policy_stmt
 			;
 
 verdict_stmt		:	verdict_expr
@@ -1983,6 +1994,7 @@ primary_expr		:	symbol_expr			{ $$ = $1; }
 			|	ct_expr				{ $$ = $1; }
 			|	numgen_expr			{ $$ = $1; }
 			|	hash_expr			{ $$ = $1; }
+			|	policy_expr			{ $$ = $1; }
 			|	'('	basic_expr	')'	{ $$ = $2; }
 			;
 
@@ -2480,6 +2492,49 @@ numgen_expr		:	NUMGEN	numgen_type	MOD	NUM
 			}
 			;
 
+policy_expr		:	XFRM	POLICY	DIRECTION	STRING	NUM	policy_type
+			{
+				struct error_record *erec;
+				int8_t direction;
+
+				erec = ct_dir_parse(&@$, $4, &direction);
+				if (erec != NULL) {
+					erec_queue(erec, state->msgs);
+					YYERROR;
+				}
+#if 0
+				$5 = which policy header in pol[] array
+				$6: what elem of policy 'header'
+#endif
+				$$ = meta_expr_alloc(&@$, 1);
+			}
+			;
+
+policy_stmt		: XFRM	POLICY	DIRECTION	STRING		policy_type	integer_expr
+			{
+				struct error_record *erec;
+				int8_t direction;
+
+				erec = ct_dir_parse(&@$, $4, &direction);
+				if (erec != NULL) {
+					erec_queue(erec, state->msgs);
+					YYERROR;
+				}
+#if 0
+				$5: what elem of policy 'header' to check against
+#endif
+				$$ = meta_stmt_alloc(&@$, 2, $6);
+			}
+			;
+
+policy_type		:	SPI	{ $$ = 1; }
+			|	REQID   { $$ = 2; }
+			|	PROTOCOL { $$ = 3; }
+			|	MODE  { $$ = 4; }
+			|	SADDR { $$ = 5; }
+			|	DADDR { $$ = 6; }
+			;
+
 hash_expr		:	JHASH	expr	MOD	NUM	SEED	NUM
 			{
 				$$ = hash_expr_alloc(&@$, $4, $6);
diff --git a/src/scanner.l b/src/scanner.l
index 8b5a383bd095..c18003459a12 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -480,6 +480,11 @@ addrstring	({macaddr}|{ip4addr}|{ip6addr})
 "xml"			{ return XML; }
 "json"			{ return JSON; }
 
+
+"mode"			{ return MODE; }
+"reqid"			{ return REQID; }
+"xfrm"			{ return XFRM; }
+
 {addrstring}		{
 				yylval->string = xstrdup(yytext);
 				return STRING;


* Re: IPSec, masquerade and dnat with nftables
  2016-10-17 19:44 ` Pablo Neira Ayuso
  2016-10-17 19:52   ` Noel Kuntze
@ 2016-10-18  9:39   ` Thomas Bach
  2016-10-18 11:33     ` Noel Kuntze
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Bach @ 2016-10-18  9:39 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

Hi there,

Pablo Neira Ayuso <pablo@netfilter.org> writes:

> On Fri, Sep 09, 2016 at 09:06:59AM +0200, Thomas Bach wrote:
>> Hi,
>> 
>> I have two hosts with public ip addresses running Ubuntu 16.04 with
>> Kernel version 4.4.0.
>> 
>> I want to interconnect two containers (systemd-nspawn) with veth
>> interfaces running on these hosts in a server client setup.
>>
>> […]
>> 
>> This works as expected and without any problems at all. Now IPSec
>> enters the picture. As soon as I setup a policy to encrypt everyting
>> between the two hosts the following happens:
>> + I can still connect from the second host to the server in the
>>   container without problems,
>> + I can still /connect/ (i.e. establish a connection) from the
>>   container on the second host to the server on the first host, but
>> + in tcpdump listening on the interface of the container (on the
>>   second host) I see lots of TCP Retransmissions and the TCP connection
>>   is effectively broken.
>> 
>> Can someone give me a hint what is going on here?
>
> Did you find the root cause for this problem?

Actually not. I worked around the issue by switching from the
"ipsec-tools" package (i.e. static rules and keying done by hand) to
strongswan. Now the whole setup works as intended with the rules being
more or less the ones cited in my original post.

It would be nice to know what the differences are on the package level
between strongswan configured ipsec and the ones configured via
ipsec-tools.

Regards

        Thomas.
-- 
ilexius GmbH
Thomas Bach
Unter den Eichen 5
Haus i
65195 Wiesbaden
Fon: +49-(0)611 - 180 33 49
Fax: +49-(0)611 - 236 80 84 29
----------------------------------------
ilexius GmbH
vertreten durch die Geschäftsleitung:
Thomas Schlüter und Sebastian Koch
Registergericht: Wiesbaden
Handelsregister: HRB 21723
Steuernummer: 040 236 22640
Ust-IdNr.: DE240822836
----------------------------------------


* Re: IPSec, masquerade and dnat with nftables
  2016-10-18  9:39   ` Thomas Bach
@ 2016-10-18 11:33     ` Noel Kuntze
  0 siblings, 0 replies; 13+ messages in thread
From: Noel Kuntze @ 2016-10-18 11:33 UTC (permalink / raw)
  To: Thomas Bach, Pablo Neira Ayuso; +Cc: netfilter



On 18.10.2016 11:39, Thomas Bach wrote:
> Hi there,
> 
> Pablo Neira Ayuso <pablo@netfilter.org> writes:
> 
>> > On Fri, Sep 09, 2016 at 09:06:59AM +0200, Thomas Bach wrote:
>>> >> Hi,
>>> >> 
>>> >> I have two hosts with public ip addresses running Ubuntu 16.04 with
>>> >> Kernel version 4.4.0.
>>> >> 
>>> >> I want to interconnect two containers (systemd-nspawn) with veth
>>> >> interfaces running on these hosts in a server client setup.
>>> >>
>>> >> […]
>>> >> 
>>> >> This works as expected and without any problems at all. Now IPSec
>>> >> enters the picture. As soon as I setup a policy to encrypt everyting
>>> >> between the two hosts the following happens:
>>> >> + I can still connect from the second host to the server in the
>>> >>   container without problems,
>>> >> + I can still /connect/ (i.e. establish a connection) from the
>>> >>   container on the second host to the server on the first host, but
>>> >> + in tcpdump listening on the interface of the container (on the
>>> >>   second host) I see lots of TCP Retransmissions and the TCP connection
>>> >>   is effectively broken.
>>> >> 
>>> >> Can someone give me a hint what is going on here?
>> >
>> > Did you find the root cause for this problem?
> Actually not. I worked around the issue by switching from the
> "ipsec-tools" package (i.e. static rules and keying done by hand) to
> strongswan. Now the whole setup works as intended with the rules being
> more or less the ones cited in my original post.
> 
> It would be nice to know what the differences are on the package level
> between strongswan configured ipsec and the ones configured via
> ipsec-tools.

You'll have to figure that out by yourself; I don't know what racoon configures.
If racoon actually uses XFRM, the differences should only be in the configuration
of the SAs.

-- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658





* Re: IPSec, masquerade and dnat with nftables
  2016-10-18  8:59             ` Florian Westphal
@ 2016-10-18 20:38               ` Noel Kuntze
  2016-10-18 20:55                 ` Florian Westphal
  0 siblings, 1 reply; 13+ messages in thread
From: Noel Kuntze @ 2016-10-18 20:38 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, Thomas Bach, netfilter



On 18.10.2016 10:59, Florian Westphal wrote:
> Noel Kuntze <noel@familie-kuntze.de> wrote:
>> > On 17.10.2016 22:27, Pablo Neira Ayuso wrote:
> [..]
> 
>>> > > Allowing to match if the packet is protected/unprotected in a
>>> > > true/false fashion.
>> >
>> > Well, I am active in the strongSwan community, so I believe I've seen all the
>> > use cases there are and I've seen uses of every option, except "--next" and "--strict".
>> > But I think there are probably use cases where they are used as well.
> Ok.  I still believe that 'meta secpath' makes sense as a more simple
> alternative, I think most users are just interested in 'was this packet
> ipsec protected' rather than doing the full policy option dance.
> 
> Wrt. -m policy in nftables, we have two different cases:
> 
> 1. Check if a given daddr/saddr/spi etc is listed in *any* of the policies.
> 2. Check if a given policy contains the exact spi/daddr/saddr.
> 
> As first rfc, what about the below syntax?
> 
> It adds one expression (to load a given policy element into a register)
> and one statement (to search policies for a given number/address).
> 
> add rule filter input xfrm policy direction original 0 spi eq 1
> 
> would take input policies, grab first one (policy[0]), get its spi and
> place it into a register (i.e., the 'eq 1' is not part of the xfrm
> expression, only 'spi' is passed as key so we know what to look for).
> 
> Chaining these would allow the strict mode matching, but as you might
> imagine it would be quite bloated to do exact matching :-/
> 
> Statement would look like this:
> add rule filter input xfrm policy direction original spi 1
> 
> ... it would search all input policies for spi 1.
> (i.e., 1 is passed as immediate value to the xfrm expression).
> 
> Thoughts?
> Does anyone see a -m policy case that we could not cover with this?
> [SNIP]

*if* we can have all the options and data we can get with -m policy
(except --strict and --next, obviously) in nftables, then yes,
I think all use cases would be covered.

> 1. Check if a given daddr/saddr/spi etc is listed in *any* of the policies.
Any in the SPD or any that matched?
> 2. Check if a given policy contains the exact spi/daddr/saddr.
The exact SPI/daddr/saddr of what? Of a set policy match (whatever that might be)
in nftables?


-- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658





* Re: IPSec, masquerade and dnat with nftables
  2016-10-18 20:38               ` Noel Kuntze
@ 2016-10-18 20:55                 ` Florian Westphal
  2016-10-18 21:50                   ` Noel Kuntze
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2016-10-18 20:55 UTC (permalink / raw)
  To: Noel Kuntze; +Cc: Florian Westphal, Pablo Neira Ayuso, Thomas Bach, netfilter

Noel Kuntze <noel@familie-kuntze.de> wrote:
> On 18.10.2016 10:59, Florian Westphal wrote:
> > Noel Kuntze <noel@familie-kuntze.de> wrote:
> >> > On 17.10.2016 22:27, Pablo Neira Ayuso wrote:
> > [..]
> > 
> >>> > > Allowing to match if the packet is protected/unprotected in a
> >>> > > true/false fashion.
> >> >
> >> > Well, I am active in the strongSwan community, so I believe I've seen all the
> >> > use cases there are and I've seen uses of every option, except "--next" and "--strict".
> >> > But I think there are probably use cases where they are used as well.
> > Ok.  I still believe that 'meta secpath' makes sense as a more simple
> > alternative, I think most users are just interested in 'was this packet
> > ipsec protected' rather than doing the full policy option dance.
> > 
> > Wrt. -m policy in nftables, we have two different cases:
> > 
> > 1. Check if a given daddr/saddr/spi etc is listed in *any* of the policies.
> > 2. Check if a given policy contains the exact spi/daddr/saddr.
> > 
> > As first rfc, what about the below syntax?
> > 
> > It adds one expression (to load a given policy element into a register)
> > and one statement (to search policies for a given number/address).
> > 
> > add rule filter input xfrm policy direction original 0 spi eq 1
> > 
> > would take input policies, grab first one (policy[0]), get its spi and
> > place it into a register (i.e., the 'eq 1' is not part of the xfrm
> > expression, only 'spi' is passed as key so we know what to look for).
> > 
> > Chaining these would allow the strict mode matching, but as you might
> > imagine it would be quite bloated to do exact matching :-/
> > 
> > Statement would look like this:
> > add rule filter input xfrm policy direction original spi 1
> > 
> > ... it would search all input policies for spi 1.
> > (i.e., 1 is passed as immediate value to the xfrm expression).
> > 
> > Thoughts?
> > Does anyone see a -m policy case that we could not cover with this?
> > [SNIP]
> 
> *if* we can have all the options and data we can get with -m policy
> (except --strict and --next, obviously) in nftables, then yes,
> I think all use cases would be covered.
> 
> > 1. Check if a given daddr/saddr/spi etc is listed in *any* of the policies.
> Any in the SPD or any that matched?

Policies in the secpath or xfrm dst entry.

http://lxr.free-electrons.com/source/net/netfilter/xt_policy.c#L55

Searching SPD doesn't seem useful to me unless we want to do xfrm
encap/decap from nft itself.

> > 2. Check if a given policy contains the exact spi/daddr/saddr.
> The exact SPI/daddr/saddr of what? Of a set policy match (whatever that might be)
> in nftables?

Same as above, except that it would check sp->xvec[X], for a fixed (user
defined) value of X, rather than searching all of sp->xvec[].

Or, putting it differently, in 1) the user provides data (ip address,
spi, ...) and sp->xvec is the haystack we will search in.

I expect most users and use cases are covered by this, rather than 2).

For 2), user gives a policy index and tells us if they want saddr,
daddr, spi or reqid and we will then copy it to a register.

(Where another nft expression, e.g. cmp, can evaluate it)

So 2) is only needed when exact matching of the entire policies
(--strict mode) is requested.

If you think we can go without strict, then only 1) is needed.

The drawback is that 1) is very un-nftables like, but alas, I don't
think we can avoid it.
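
To make the difference between 1) and 2) concrete, here is a minimal
userspace sketch in C. It is not kernel code and not part of the patch:
struct mock_state is a hypothetical, simplified stand-in for one entry of
sp->xvec[], and mock_match_any()/mock_load() are names made up purely for
illustration. Case 1) searches the whole array for a user-supplied value;
case 2) copies one element of the entry at a fixed index into a "register"
and leaves the comparison to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one entry of sp->xvec[]. */
struct mock_state {
	uint32_t spi;
	uint32_t reqid;
};

/* Which element of an entry to look at (illustrative subset only). */
enum mock_key { MOCK_KEY_SPI, MOCK_KEY_REQID };

static uint32_t mock_get(const struct mock_state *s, enum mock_key key)
{
	return key == MOCK_KEY_SPI ? s->spi : s->reqid;
}

/* Case 1: the array is the haystack, the user-supplied value the needle. */
bool mock_match_any(const struct mock_state *xvec, size_t len,
		    enum mock_key key, uint32_t value)
{
	for (size_t i = 0; i < len; i++)
		if (mock_get(&xvec[i], key) == value)
			return true;
	return false;
}

/*
 * Case 2: load one element of the entry at a fixed, user-defined index
 * into *reg. Returns false if the index is out of range. Comparing the
 * loaded value is left to the caller (the role a cmp expression plays).
 */
bool mock_load(const struct mock_state *xvec, size_t len, size_t idx,
	       enum mock_key key, uint32_t *reg)
{
	if (idx >= len)
		return false;
	*reg = mock_get(&xvec[idx], key);
	return true;
}
```

In this sketch, strict-style matching would amount to calling mock_load()
once per index and per key and comparing every loaded value, which shows
why exact matching via 2) gets verbose.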


* Re: IPSec, masquerade and dnat with nftables
  2016-10-18 20:55                 ` Florian Westphal
@ 2016-10-18 21:50                   ` Noel Kuntze
  0 siblings, 0 replies; 13+ messages in thread
From: Noel Kuntze @ 2016-10-18 21:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, Thomas Bach, netfilter



On 18.10.2016 22:55, Florian Westphal wrote:
> Same as above, except that it would check sp->xvec[X], for a fixed (user
> defined) value of X, rather then searching all of sp->xvec[].
> 
> Or, putting it differently, in 1) user providides data (ip address,
> spi, ...) and sp->xvec is the haystack we will search in.
> 
> I expect most users and use cases are covered by this, rather than 2).
> 
> For 2), user gives a policy index and tells us if they want saddr,
> daddr, spi or reqid and we will then copy it to a register.
> 
> (Where another nft expression, e.g. cmp, can evaluate it)
> 
> So 2) is only needed when exact matching of the entire policies
> is requested (--strict) mode.
> 
> If you think we can go without strict, then only 1) is needed.
> 
> The drawback is that 1) is very un-nftables like, but alas, I don't
> think we can avoid it.


Well, I think being able to search all policies would be a nifty thing to have.
But sure, doing the first thing would be much better and more suitable as a
replacement for -m policy in nftables.



-- 

Mit freundlichen Grüßen/Kind Regards,
Noel Kuntze

GPG Key ID: 0x63EC6658
Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658





end of thread, other threads:[~2016-10-18 21:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2016-09-09  7:06 IPSec, masquerade and dnat with nftables Thomas Bach
2016-10-17 19:44 ` Pablo Neira Ayuso
2016-10-17 19:52   ` Noel Kuntze
2016-10-17 20:11     ` Pablo Neira Ayuso
2016-10-17 20:17       ` Noel Kuntze
2016-10-17 20:27         ` Pablo Neira Ayuso
2016-10-17 21:07           ` Noel Kuntze
2016-10-18  8:59             ` Florian Westphal
2016-10-18 20:38               ` Noel Kuntze
2016-10-18 20:55                 ` Florian Westphal
2016-10-18 21:50                   ` Noel Kuntze
2016-10-18  9:39   ` Thomas Bach
2016-10-18 11:33     ` Noel Kuntze
