* nftables: masquerade sets wrong source address
@ 2016-12-13 13:28 Tom Hacohen
  2016-12-13 14:32 ` /dev/rob0
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Tom Hacohen @ 2016-12-13 13:28 UTC (permalink / raw)
  To: netfilter

Hi,

I've recently migrated from iptables (no modules loaded anymore) to
nftables and came across a weird situation that looks like a bug to
me.

When using "masquerade" it always sets the ip address to that of one
of my interfaces, and not per interface as one would expect.

My config:

flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy accept;

        iifname lo log accept
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        masquerade
    }
}


With this, connections to localhost fail because the masquerade line
sets the source IP to that of the wlp1s0 interface, and not of the lo
interface.

Here is output from the log:
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.18 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=64500 DF PROTO=TCP SPT=36844 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0

You can see how the source ip is wrong. This is from running "curl"
trying to connect to a local http server on port 8000.
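
The check done by eye above can also be scripted. This is a minimal sketch, not part of the original report: the helper names log_field and check_loopback_src are made up for illustration, and it assumes the space-separated KEY=VALUE layout of the LOG output shown above, joined onto a single line.

```shell
# Print the value of one KEY=VALUE field from a LOG line read on stdin.
log_field() {
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Classify a LOG line: did a packet arriving on lo keep a loopback source?
check_loopback_src() {
    line=$1
    in_if=$(printf '%s\n' "$line" | log_field IN)
    src=$(printf '%s\n' "$line" | log_field SRC)
    case "$in_if:$src" in
        lo:127.*) echo "ok: $src" ;;          # source is still in 127.0.0.0/8
        lo:*)     echo "rewritten: $src" ;;   # source was changed by NAT
        *)        echo "not loopback" ;;
    esac
}

# Example:
#   check_loopback_src 'IN=lo OUT= SRC=192.168.86.18 DST=127.0.0.1'
#   prints "rewritten: 192.168.86.18"
```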

Removing the masquerade line, or changing it to: "oifname wlp1s0
masquerade" fixes it, but this is just a workaround that will fail in
more complex situations.
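
For reference, the workaround mentioned above can be written out as a complete nat table. This is only a sketch: the function name print_masq_ruleset is hypothetical, the interface name is a parameter (wlp1s0 in this report), and nothing is loaded until the output is handed to nft.

```shell
# Emit a nat table that masquerades only traffic leaving via one interface.
print_masq_ruleset() {
    iface=$1
    cat <<EOF
table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        oifname "$iface" masquerade
    }
}
EOF
}

# Example (needs root; assumes nft can read the ruleset from /dev/stdin):
#   print_masq_ruleset wlp1s0 | nft -f /dev/stdin
```

Because the rule matches on the outgoing interface, packets routed over lo never reach the masquerade statement.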

I would have loved to provide you with tracing information, but
unfortunately I never got that to work for me.
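
For anyone who wants to try tracing on a similar setup, the sketch below shows one possible way. It rests on assumptions that are not confirmed anywhere in this thread: that the installed nft supports "meta nftrace set 1" and "nft monitor trace", and that the running kernel is recent enough to deliver trace events. The table name trace_demo and the function name are invented for the example.

```shell
# Emit a small table that flags locally generated packets to one TCP port
# on 127.0.0.1 for tracing. The priority is chosen to run before the other
# chains on the output hook.
print_trace_ruleset() {
    port=$1
    cat <<EOF
table ip trace_demo {
    chain output {
        type filter hook output priority -350; policy accept;
        ip daddr 127.0.0.1 tcp dport $port meta nftrace set 1
    }
}
EOF
}

# Example (needs root):
#   print_trace_ruleset 8000 | nft -f /dev/stdin
#   nft monitor trace      # in a second terminal, then run curl
```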

Tried with kernels: 4.8.12 and 4.4.35 on arch linux. Nft version is 0.6.

Please let me know if there's any other info you'd like me to provide you with.

Thanks,
Tom.


* Re: nftables: masquerade sets wrong source address
  2016-12-13 13:28 nftables: masquerade sets wrong source address Tom Hacohen
@ 2016-12-13 14:32 ` /dev/rob0
  2016-12-13 14:53   ` Tom Hacohen
  2016-12-14 22:28 ` Pablo Neira Ayuso
  2016-12-17 14:18 ` Liping Zhang
  2 siblings, 1 reply; 15+ messages in thread
From: /dev/rob0 @ 2016-12-13 14:32 UTC (permalink / raw)
  To: netfilter

On Tue, Dec 13, 2016 at 01:28:41PM +0000, Tom Hacohen wrote:
> Removing the masquerade line, or changing it to: "oifname wlp1s0 
> masquerade" fixes it, but this is just a workaround that will fail 
> in more complex situations.

ISTM you'd always want to limit a MASQ/SNAT rule by outgoing 
interface.  I don't get why that was "ugly" (as you said in IRC) or 
likely to fail ... well, certainly if using that ruleset where the 
default gateway was on some other interface, but so what?  Adjust 
your rule to suit the situation.

In more complex situations, such as multiple Internet connections 
with policy routing, masquerade is not appropriate.  You'd have to 
use SNAT.
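
As an illustration of the SNAT alternative, here is a minimal sketch. The function name, the interface eth0 and the address 203.0.113.5 (a documentation address) are all hypothetical. The "snat to" spelling is the one current nft releases use; as far as I recall, older releases such as the 0.6 mentioned in this thread expect the address without "to", so adjust for your version.

```shell
# Emit a nat table that rewrites the source to a fixed address on one uplink.
print_snat_ruleset() {
    iface=$1
    addr=$2
    cat <<EOF
table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        oifname "$iface" snat to $addr
    }
}
EOF
}

# Example (needs root):
#   print_snat_ruleset eth0 203.0.113.5 | nft -f /dev/stdin
```

With several uplinks, one such rule per interface keeps the chosen source address explicit instead of leaving it to masquerade.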

If you're doing NAT among RFC 1918 networks, YDIW.  Fix the routing.
-- 
  http://rob0.nodns4.us/
  Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:


* Re: nftables: masquerade sets wrong source address
  2016-12-13 14:32 ` /dev/rob0
@ 2016-12-13 14:53   ` Tom Hacohen
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Hacohen @ 2016-12-13 14:53 UTC (permalink / raw)
  To: netfilter

On Tue, Dec 13, 2016 at 2:32 PM, /dev/rob0 <rob0@gmx.co.uk> wrote:
> On Tue, Dec 13, 2016 at 01:28:41PM +0000, Tom Hacohen wrote:
>> Removing the masquerade line, or changing it to: "oifname wlp1s0
>> masquerade" fixes it, but this is just a workaround that will fail
>> in more complex situations.
>
> ISTM you'd always want to limit a MASQ/SNAT rule by outgoing
> interface.  I don't get why that was "ugly" (as you said in IRC) or
> likely to fail ... well, certainly if using that ruleset where the
> default gateway was on some other interface, but so what?  Adjust
> your rule to suit the situation.
>
> In more complex situations, such as multiple Internet connections
> with policy routing, masquerade is not appropriate.  You'd have to
> use SNAT.
>
> If you're doing NAT among RFC 1918 networks, YDIW.  Fix the routing.

Perhaps "ugly" wasn't the best choice of words. Also, as you said, more
complex situations may require other, more complex configurations, so
this simple case wouldn't matter there anyway.

However, let's leave that aside for a moment and consider the test
case I provided in my original email. If masquerade is turned on for "lo"
it will set the wrong address. Is that not an issue? Or at least an
indication something else may be broken there?

At the very least, I found this behaviour surprising.

--
Tom


* Re: nftables: masquerade sets wrong source address
  2016-12-13 13:28 nftables: masquerade sets wrong source address Tom Hacohen
  2016-12-13 14:32 ` /dev/rob0
@ 2016-12-14 22:28 ` Pablo Neira Ayuso
  2016-12-15 11:34   ` Tom Hacohen
  2016-12-17 14:18 ` Liping Zhang
  2 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2016-12-14 22:28 UTC (permalink / raw)
  To: Tom Hacohen; +Cc: netfilter

Hi Tom,

I don't remember whether this has always been the behaviour. Could you
please check how older kernels behave?

nftables shares this masquerade code with iptables, so you can test
this with iptables in older kernels.

Thanks.


* Re: nftables: masquerade sets wrong source address
  2016-12-14 22:28 ` Pablo Neira Ayuso
@ 2016-12-15 11:34   ` Tom Hacohen
  2016-12-15 21:29     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Hacohen @ 2016-12-15 11:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

On Wed, Dec 14, 2016 at 10:28 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I don't remember if this behaviour has been always the case. Would you
> please check what has been the behaviour in old kernels?
>
> nftables shares this masquerade code with iptables, so you can test
> this with iptables in older kernels.


Hey,

Thanks for your reply.

I'm sorry, but I don't have access to older kernels. Furthermore, this
worked with iptables on the same kernel version using the same rules, as
far as I can tell. I therefore suspect (without knowing the code) that
nftables tries to masquerade all packets, while iptables perhaps has a
no-op when no NAT was applied or when the address is already set
correctly. That is the best explanation that comes to mind, given your
assertion that they share the masquerade code.

Thanks.

--
Tom.


* Re: nftables: masquerade sets wrong source address
  2016-12-15 11:34   ` Tom Hacohen
@ 2016-12-15 21:29     ` Pablo Neira Ayuso
  2016-12-15 22:47       ` Tom Hacohen
  0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2016-12-15 21:29 UTC (permalink / raw)
  To: Tom Hacohen; +Cc: netfilter

Hi Tom,

On Thu, Dec 15, 2016 at 11:34:57AM +0000, Tom Hacohen wrote:
> Hey,
> 
> Thanks for your reply.
> 
> I'm sorry, but I don't have access to older kernels. Furthermore,
> this worked with iptables on the same kernel version using the same
> rules as far as I can tell.

Hm, this is working with the same kernel version in iptables? What
kernel version are you using?

> I therefore suspect (without knowing the code) that maybe nftables is trying
> to masquerade all packets while iptables maybe has a noop when there was no
> NAT applied, or if the address is already set correctly? That is the best
> explanation that comes to mind given your assertion about them sharing the
> masquerade code.

Both nft and iptables share the same codebase for NAT/masquerade, so
if this works with iptables, it should work with nft in the same
way.

Please confirm this; it would be good to get to the core of the
problem. If the behaviour differs, or started to differ from some
kernel version on, then this is a bug.

Thanks.


* Re: nftables: masquerade sets wrong source address
  2016-12-15 21:29     ` Pablo Neira Ayuso
@ 2016-12-15 22:47       ` Tom Hacohen
  2016-12-16  0:04         ` Tom Hacohen
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Hacohen @ 2016-12-15 22:47 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

On Thu, Dec 15, 2016 at 9:29 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Tom,
>
> On Thu, Dec 15, 2016 at 11:34:57AM +0000, Tom Hacohen wrote:
>> Hey,
>>
>> Thanks for your reply.
>>
>> I'm sorry, but I don't have access to older kernels. Furthermore,
>> this worked with iptables on the same kernel version using the same
>> rules as far as I can tell.
>
> Hm, this is working with the same kernel version in iptables? What
> kernel version are you using?
>
>> I therefore suspect (without knowing the code) that maybe nftables is trying
>> to masquerade all packets while iptables maybe has a noop when there was no
>> NAT applied, or if the address is already set correctly? That is the best
>> explanation that comes to mind given your assertion about them sharing the
>> masquerade code.
>
> Both nft and iptables share the same codebase for NAT/masquerade, so
> if this works with iptables, it should work with nft too in the same
> way.
>
> Please, confirm this, it would be good if we get to the core of the
> problem. If the behaviour differs, or started to differ from some
> kernel version on, then this is a bug.
>

Hi,

I can't be sure what I tested on before, because I had a setup that
used to work, and then I switched to nftables. I don't remember if I
updated the kernel since migrating, but even if I had, it was 4.4.x
for sure and probably 4.4.34/35. However, I can say for certain that
it used to work for years until very recently, whatever the reason may
have been.

FWIW, I'm now testing on 4.8.13 and can confirm it's broken with both.

I just tested with iptables, and the same masquerade happens for that,
so I can confirm that the behaviour is the same with this
configuration.

SRC=192.168.86.18 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=46247 DF PROTO=TCP SPT=46026 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
SRC=127.0.0.1 DST=127.0.0.1 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=59070
DF PROTO=TCP SPT=8000 DPT=46026 WINDOW=0 RES=0x00 ACK RST URGP=0

Here's the iptables config I used to get these results, which (to me)
seems identical to the original nftables config:
*nat
:PREROUTING ACCEPT [4:252]
:INPUT ACCEPT [4:252]
:OUTPUT ACCEPT [1:76]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -j MASQUERADE
COMMIT

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j LOG
-A INPUT -i lo -j ACCEPT
COMMIT

Other than the above, there's nothing interesting in my previous
"production" iptables config that used to work. Unfortunately I can't
abuse the production machine any longer by changing the firewall rules
or jumping back and forth between iptables and nftables.

Is there anyone with an older lts kernel that can check this? If so,
don't forget to clear your modules, because the nftables nat modules
clash with the iptable ones.
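
A quick way to see which NAT-related modules are currently loaded is sketched below. The module name prefixes are my assumption of what is typical on 4.x kernels, not something confirmed in this thread, and the function name list_nat_modules is made up.

```shell
# Read lsmod-style output on stdin and print the NAT-related module names.
# The first line of lsmod output is a header, so it is skipped.
list_nat_modules() {
    awk 'NR > 1 && $1 ~ /^(iptable_nat|ipt_MASQUERADE|nft_chain_nat|nft_masq)/ { print $1 }'
}

# Example:
#   lsmod | list_nat_modules
```

If modules from both families show up, a reboot or unloading them before the test avoids mixing the two setups.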

Regression-hunting aside for a moment, it still looks like a bug to
me, even if that bug is shared with iptables. There is a simple
workaround, just don't masquerade for lo, but still, this looks like
something that should be fixed.
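
The workaround can be expressed in the same iptables-restore format as the config above. This is a sketch with a hypothetical function name; the only change to the nat table is the "! -o lo" match. The nft counterpart would be a rule along the lines of: oifname != "lo" masquerade

```shell
# Emit an iptables-restore nat table that masquerades everything except
# traffic leaving via the loopback interface.
print_iptables_nat() {
    cat <<'EOF'
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING ! -o lo -j MASQUERADE
COMMIT
EOF
}

# Example (needs root; replaces the current contents of the nat table):
#   print_iptables_nat | iptables-restore
```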

Please let me know if there's anything else I can do to help.

--
Tom


* Re: nftables: masquerade sets wrong source address
  2016-12-15 22:47       ` Tom Hacohen
@ 2016-12-16  0:04         ` Tom Hacohen
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Hacohen @ 2016-12-16  0:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

I'm very confused, just ran the same iptables rules on a freshly
booted different box running 4.4.38 and got:

IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.10 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=17313 DF PROTO=TCP SPT=34548 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1
DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP
SPT=8000 DPT=34548 WINDOW=43690 RES=0x00 ACK SYN URGP=0

On the other hand, again, after a fresh boot, this time with nftables,
I'm getting:

IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.10 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=21924 DF PROTO=TCP SPT=48948 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1
DST=192.168.86.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP
SPT=8000 DPT=48948 WINDOW=43690 RES=0x00 ACK SYN URGP=0
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.10 DST=127.0.0.1 LEN=40 TOS=0x00 PREC=0x00 TTL=64
ID=58958 DF PROTO=TCP SPT=48948 DPT=8000 WINDOW=0 RES=0x00 RST URGP=0

I have no idea what is going on. In all of the tests in this thread
I've been using "python3 -m http.server" as the test http server and
curl as the client.

Can you make sense of why it replies to 127.0.0.1 with iptables, and
to 192.168.86.10 (my LAN IP) with nftables? Also, why doesn't it
masquerade the packets after the first one in the iptables case?

I also ran two additional tests: I started my box and ran nftables, and
it failed as expected; then I ran iptables, and it worked; then I ran
nftables again, and it *worked*.
Maybe iptables initialises masquerading differently from nftables?

Is there anything else you'd like me to test?


* Re: nftables: masquerade sets wrong source address
  2016-12-13 13:28 nftables: masquerade sets wrong source address Tom Hacohen
  2016-12-13 14:32 ` /dev/rob0
  2016-12-14 22:28 ` Pablo Neira Ayuso
@ 2016-12-17 14:18 ` Liping Zhang
  2016-12-19  2:25   ` Liping Zhang
  2016-12-20 15:16   ` Tom Hacohen
  2 siblings, 2 replies; 15+ messages in thread
From: Liping Zhang @ 2016-12-17 14:18 UTC (permalink / raw)
  To: Tom Hacohen, Pablo Neira Ayuso
  Cc: netfilter, Netfilter Developer Mailing List

Hi Tom,

2016-12-13 21:28 GMT+08:00 Tom Hacohen <tom@stosb.com>:
> Hi,
>
> I've recently migrated from iptables (no modules loaded anymore) to
> nftables and came across a weird situation that looks like a bug to
> me.
>
> When using "masquerade" it always sets the ip address to that of one
> of my interfaces, and not per interface as one would expect.
>
> My config:
>
> flush ruleset
>
> table inet filter {
>     chain input {
>         type filter hook input priority 0; policy accept;
>
>         iifname lo log accept
>     }
>     chain output {
>         type filter hook output priority 0; policy accept;
>     }
> }
>
> table ip nat {
>     chain postrouting {
>         type nat hook postrouting priority 100;
>         masquerade
>     }
> }
>

According to the explanations in the nftables wiki:
https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)

You should add the following nft chain (I agree this is tricky and
unfriendly for the end user):
# nft add chain nat prerouting { type nat hook prerouting priority 0 \; }

Unfortunately, even with the above chain added, connecting to a local
server still fails.

Now also add the chain listed below, and everything will probably
work fine:
# nft add chain nat output { type nat hook output priority 0 \; }
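
Put together with the original postrouting chain, the nat table would then look roughly like the sketch below. The function name print_nat_chains is hypothetical, and the priorities are the ones used earlier in this thread. The two added chains are intentionally empty; only their registration on the hooks matters here.

```shell
# Emit the reporter's nat table plus the empty prerouting and output chains.
print_nat_chains() {
    cat <<'EOF'
table ip nat {
    chain prerouting {
        type nat hook prerouting priority 0; policy accept;
    }
    chain output {
        type nat hook output priority 0; policy accept;
    }
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        masquerade
    }
}
EOF
}

# Example (needs root):
#   print_nat_chains | nft -f /dev/stdin
```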

[ cc netfilter-dev group ]

For a loopback connection, the request packets traverse:
OUTPUT->POSTROUTING->PREROUTING->INPUT
and the source IP is modified in the nat POSTROUTING hook.

The reply packets also traverse:
OUTPUT->POSTROUTING->PREROUTING->INPUT
If a nat OUTPUT hook exists, the destination IP is modified there
and a re-route happens. Otherwise, the destination IP is modified
at the nat PREROUTING hook and the dst entry is dropped. In that
situation (i.e. no nat OUTPUT hook), we do a routing lookup and the
packets are dropped at ip_route_input_slow->martian_destination.

Furthermore, if ipt_rpfilter is configured, the reply packet may be
dropped there.

In iptables, the nat output chain always exists, so there is no
such problem.
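
The two cases for the reply packet can be summarised as a toy model. This only restates the description above in executable form; it does not inspect the kernel or any real packet, and the function name reply_path is invented.

```shell
# Print the hooks a loopback reply packet passes, and where the reverse
# NAT (destination rewrite) happens, depending on whether a nat OUTPUT
# hook is registered ("yes" or "no").
reply_path() {
    if [ "$1" = yes ]; then
        echo "OUTPUT: destination rewritten, packet re-routed"
        echo "POSTROUTING"
        echo "PREROUTING"
        echo "INPUT: delivered"
    else
        echo "OUTPUT"
        echo "POSTROUTING"
        echo "PREROUTING: destination rewritten, dst entry dropped"
        echo "route lookup: dropped as martian destination"
    fi
}

# Example:
#   reply_path no
```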

But I think that forcing the user to add a nat output chain
in nftables is not a good idea, so we probably need a patch like
the following (I only list the IPv4 part):

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index f8aad03..5bc9b22 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -344,8 +344,21 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,

        ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
        if (ret != NF_DROP && ret != NF_STOLEN &&
-           daddr != ip_hdr(skb)->daddr)
-               skb_dst_drop(skb);
+           daddr != ip_hdr(skb)->daddr) {
+               const struct rtable *rt = skb_rtable(skb);
+               int err;
+
+               if (rt) {
+                       if (rt->rt_flags & RTCF_LOCAL) {
+                               err = ip_route_me_harder(state->net, skb,
+                                                        RTN_UNSPEC);
+                               if (err < 0)
+                                       ret = NF_DROP_ERR(err);
+                       } else {
+                               skb_dst_drop(skb);
+                       }
+               }
+       }

        return ret;
 }


* Re: nftables: masquerade sets wrong source address
  2016-12-17 14:18 ` Liping Zhang
@ 2016-12-19  2:25   ` Liping Zhang
  2016-12-20 15:16   ` Tom Hacohen
  1 sibling, 0 replies; 15+ messages in thread
From: Liping Zhang @ 2016-12-19  2:25 UTC (permalink / raw)
  To: Tom Hacohen, Pablo Neira Ayuso
  Cc: netfilter, Netfilter Developer Mailing List

2016-12-17 22:18 GMT+08:00 Liping Zhang <zlpnobody@gmail.com>:
>
> For loopback connection, the request packets will traverse:
> OUTPUT->POSTROUTING->PREROUTING->INPUT
> and the source ip will be modified in nat POSTROUTING hook.
>
> Meanwhile the reply packets will also traverse:
> OUTPUT->POSTROUTING->PREROUTING->INPUT
> and if nat OUTPUT hook exist, the destination ip will be modified
> in it, and re-route will happen. Otherwise, the destination ip will
> be modified at nat PREROUTING hook, and the dst entry will
> be dropped. In such situation(i.e. nat OUTPUT doesn't exist),
> we will try to do routing lookup and packets will be dropped
> at ip_route_input_slow->martian_destination.
>
> Furthermore, if ipt_rpfilter is configured, the reply packet maybe
> dropped at there.
>
> In iptables, nat output chain always exists, so there's no
> such problem.
>
> But I think that enforcing the user to add a nat output chain
> in nftables is not a good idea, so probably we need a following
> patch(I only list the ipv4 part):
>
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> index f8aad03..5bc9b22 100644
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -344,8 +344,21 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>
>         ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
>         if (ret != NF_DROP && ret != NF_STOLEN &&
> -           daddr != ip_hdr(skb)->daddr)
> -               skb_dst_drop(skb);
> +           daddr != ip_hdr(skb)->daddr) {
> +               const struct rtable *rt = skb_rtable(skb);
> +               int err;
> +
> +               if (rt) {
> +                       if (rt->rt_flags & RTCF_LOCAL) {
> +                               err = ip_route_me_harder(state->net, skb,
> +                                                        RTN_UNSPEC);
> +                               if (err < 0)
> +                                       ret = NF_DROP_ERR(err);
> +                       } else {
> +                               skb_dst_drop(skb);
> +                       }
> +               }
> +       }
>
>         return ret;
>  }
>

Please ignore the above patch; it is incorrect to use ip_route_output_key
for incoming packets.

Maybe the one below is better, but I'm not sure whether it will break
some special use cases:

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index f8aad03..d358670 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -344,8 +344,13 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,

        ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
        if (ret != NF_DROP && ret != NF_STOLEN &&
-           daddr != ip_hdr(skb)->daddr)
+           daddr != ip_hdr(skb)->daddr) {
+               if (state->in->flags & IFF_LOOPBACK ||
+                   skb->pkt_type == PACKET_LOOPBACK)
+                       return ret;
+
                skb_dst_drop(skb);
+       }


* Re: nftables: masquerade sets wrong source address
  2016-12-17 14:18 ` Liping Zhang
  2016-12-19  2:25   ` Liping Zhang
@ 2016-12-20 15:16   ` Tom Hacohen
  2016-12-21  2:39     ` Liping Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: Tom Hacohen @ 2016-12-20 15:16 UTC (permalink / raw)
  To: Liping Zhang
  Cc: Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

On Sat, Dec 17, 2016 at 2:18 PM, Liping Zhang <zlpnobody@gmail.com> wrote:
> According to the explanations in nftables wiki:
> https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
>
> You should add the following nft rules(I agree this is tricky and
> unfriendly for the end user):
> # nft add chain nat prerouting { type nat hook prerouting priority 0 \; }
>
> But unfortunately,  even if you add the above rule, you will still fail to
> connect to a local server.
>

Correct, doesn't change anything.

> Now add another nft rules listed below, you can probably make everything
> work fine:
> # nft add chain nat output { type nat hook output priority 0 \; }
>

Haven't tried it. Why would it change things? Have you tried it?

> [ cc netfilter-dev group ]
>
> For loopback connection, the request packets will traverse:
> OUTPUT->POSTROUTING->PREROUTING->INPUT
> and the source ip will be modified in nat POSTROUTING hook.

The problem is that the IP incorrectly changes to the wrong one.

>
> Meanwhile the reply packets will also traverse:
> OUTPUT->POSTROUTING->PREROUTING->INPUT
> and if nat OUTPUT hook exist, the destination ip will be modified
> in it, and re-route will happen. Otherwise, the destination ip will
> be modified at nat PREROUTING hook, and the dst entry will
> be dropped. In such situation(i.e. nat OUTPUT doesn't exist),
> we will try to do routing lookup and packets will be dropped
> at ip_route_input_slow->martian_destination.

I think that throughout this thread we've been analysing the behaviour
in the broken scenario instead of just fixing it (which I see your
latest patch may actually do, and that's good). It doesn't matter where
the packet goes after it's been wrongly rewritten; the problem is that
it has been.

>
> Furthermore, if ipt_rpfilter is configured, the reply packet maybe
> dropped at there.
>

It's off.


> In iptables, nat output chain always exists, so there's no
> such problem.
>
> But I think that enforcing the user to add a nat output chain
> in nftables is not a good idea, so probably we need a following
> patch(I only list the ipv4 part):

Interesting. Maybe that's why it continued to work once iptables had
already been loaded on the box.
As said above though, I believe the problem is masquerade setting the
wrong IP, and not (only?) the fact that my setup happens to work with
iptables but doesn't with nftables.

Don't you agree?

Thanks,
Tom.

>
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> index f8aad03..5bc9b22 100644
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -344,8 +344,21 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>
>         ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
>         if (ret != NF_DROP && ret != NF_STOLEN &&
> -           daddr != ip_hdr(skb)->daddr)
> -               skb_dst_drop(skb);
> +           daddr != ip_hdr(skb)->daddr) {
> +               const struct rtable *rt = skb_rtable(skb);
> +               int err;
> +
> +               if (rt) {
> +                       if (rt->rt_flags & RTCF_LOCAL) {
> +                               err = ip_route_me_harder(state->net, skb,
> +                                                        RTN_UNSPEC);
> +                               if (err < 0)
> +                                       ret = NF_DROP_ERR(err);
> +                       } else {
> +                               skb_dst_drop(skb);
> +                       }
> +               }
> +       }
>
>         return ret;
>  }
>
>>
>> With this, connections to localhost fail because the masquerade line
>> sets the source IP to that of the wlp1s0 interface, and not of the lo
>> interface.
>>
>> Here is output from the log:
>> IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
>> SRC=192.168.86.18 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
>> ID=64500 DF PROTO=TCP SPT=36844 DPT=8000 WINDOW=43690 RES=0x00 SYN
>> URGP=0
>>
>> You can see how the source ip is wrong. This is from running "curl"
>> trying to connect to a local http server on port 8000.
>>
>> Removing the masquerade line, or changing it to: "oifname wlp1s0
>> masquerade" fixes it, but this is just a workaround that will fail in
>> more complex situations.
>>
>> I would have loved to provide you with tracing information, but
>> unfortunately I never got that to work for me.
>>
>> Tried with kernels: 4.8.12 and 4.4.35 on arch linux. Nft version is 0.6.
>>
>> Please let me know if there's any other info you'd like me to provide you with.
>>
>> Thanks,
>> Tom.


* Re: nftables: masquerade sets wrong source address
  2016-12-20 15:16   ` Tom Hacohen
@ 2016-12-21  2:39     ` Liping Zhang
  2016-12-22 10:26       ` Tom Hacohen
  0 siblings, 1 reply; 15+ messages in thread
From: Liping Zhang @ 2016-12-21  2:39 UTC (permalink / raw)
  To: Tom Hacohen
  Cc: Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

2016-12-20 23:16 GMT+08:00 Tom Hacohen <tom@stosb.com>:
> On Sat, Dec 17, 2016 at 2:18 PM, Liping Zhang <zlpnobody@gmail.com> wrote:
>> According to the explanations in nftables wiki:
>> https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
>>
>> You should add the following nft rules(I agree this is tricky and
>> unfriendly for the end user):
>> # nft add chain nat prerouting { type nat hook prerouting priority 0 \; }
>>
>> But unfortunately,  even if you add the above rule, you will still fail to
>> connect to a local server.
>>
>
> Correct, doesn't change anything.
>
>> Now add another nft rules listed below, you can probably make everything
>> work fine:
>> # nft add chain nat output { type nat hook output priority 0 \; }
>>
>
> Haven't tried it. Why would it change things? Have you tried it?

I tried it and it did take effect, but my test scenario may be
different from yours.
Could you try it?

[...]
>> In iptables, nat output chain always exists, so there's no
>> such problem.
>>
>> But I think that enforcing the user to add a nat output chain
>> in nftables is not a good idea, so probably we need a following
>> patch(I only list the ipv4 part):
>
> Interesting. Maybe that's why it continued to work after iptables has
> already been loaded on the box.
> As said above though, I believe the problem is the masquerade setting
> the wrong ip, and not (only?)
> the fact that my setup happens to work with iptables but doesn't with nftables.

As I analyzed, the main difference is that the nat OUTPUT hook always
exists in iptables, so the reply packet's destination IP address is
modified in the OUTPUT hook. In nftables, without a nat output chain, the
reply packet's destination IP address is modified in the PREROUTING hook
instead. We then try to do a routing lookup, and the packet is dropped
because the incoming packet's destination IP address is 127.0.0.1.

But I think that forcing the user to add the following nft rule is
not friendly:
# nft add chain nat output { type nat hook output priority 0 \; }

This makes nftables trickier to use. Do you agree?

So I sent the related patch to try to improve it.
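
For illustration, the workaround discussed above can be written out as a
complete nat table. This is only a sketch under the assumptions of this
thread: the postrouting chain is the one from the original report, and the
empty prerouting and output chains correspond to the two "nft add chain"
commands quoted above. The file name nat-loopback.nft and the RULESET and
APPLY variables are made up for the example. The script only writes and
prints the ruleset; loading it needs root, so that step is opt-in.

```shell
#!/bin/sh
# Sketch only: write an example ruleset to a file and print it.
# RULESET and APPLY are hypothetical knobs for this example.
set -eu

ruleset="${RULESET:-./nat-loopback.nft}"

cat > "$ruleset" <<'EOF'
table ip nat {
    # Empty chains: they hold no rules, but creating them registers the
    # nat prerouting and output hooks, which the iptables nat table
    # always registers.
    chain prerouting {
        type nat hook prerouting priority 0;
    }
    chain output {
        type nat hook output priority 0;
    }
    chain postrouting {
        type nat hook postrouting priority 100;
        masquerade
    }
}
EOF

# Show what would be loaded.
cat "$ruleset"

# Opt-in load (needs root and the nft binary): APPLY=1 sh ./script.sh
if [ "${APPLY:-0}" = "1" ]; then
    nft -f "$ruleset"
fi
```

Note that this only addresses the dropped replies; the unrestricted
masquerade rule still rewrites the source address of loopback traffic.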

>
> Don't you agree?
>
> Thanks,
> Tom.


* Re: nftables: masquerade sets wrong source address
  2016-12-21  2:39     ` Liping Zhang
@ 2016-12-22 10:26       ` Tom Hacohen
  2016-12-22 10:34         ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Hacohen @ 2016-12-22 10:26 UTC (permalink / raw)
  To: Liping Zhang
  Cc: Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

On Wed, Dec 21, 2016 at 2:39 AM, Liping Zhang <zlpnobody@gmail.com> wrote:
> 2016-12-20 23:16 GMT+08:00 Tom Hacohen <tom@stosb.com>:
>> On Sat, Dec 17, 2016 at 2:18 PM, Liping Zhang <zlpnobody@gmail.com> wrote:
>>> According to the explanations in nftables wiki:
>>> https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
>>>
>>> You should add the following nft rules(I agree this is tricky and
>>> unfriendly for the end user):
>>> # nft add chain nat prerouting { type nat hook prerouting priority 0 \; }
>>>
>>> But unfortunately,  even if you add the above rule, you will still fail to
>>> connect to a local server.
>>>
>>
>> Correct, doesn't change anything.
>>
>>> Now add another nft rules listed below, you can probably make everything
>>> work fine:
>>> # nft add chain nat output { type nat hook output priority 0 \; }
>>>
>>
>> Haven't tried it. Why would it change things? Have you tried it?
>
> I tried it and it did take effect. But my test scenario may be
> different with yours.
> So can you try it?

Tried it, and it works.

>
> [...]
>>> In iptables, nat output chain always exists, so there's no
>>> such problem.
>>>
>>> But I think that enforcing the user to add a nat output chain
>>> in nftables is not a good idea, so probably we need a following
>>> patch(I only list the ipv4 part):
>>
>> Interesting. Maybe that's why it continued to work after iptables has
>> already been loaded on the box.
>> As said above though, I believe the problem is the masquerade setting
>> the wrong ip, and not (only?)
>> the fact that my setup happens to work with iptables but doesn't with nftables.
>
> As I analyzed, the main difference is that nat OUTPUT hook always
> exist in iptables, so the reply packet's destination ip address will be modified
> in OUTPUT hook. While in nftables, without nft output chain, the reply packet's
> destination ip address will be modified in PREROUTING hook. Then we try to
> do routing lookup, and the packets will be dropped because the incoming packets'
> destination ip address is 127.0.0.1
>
> But I think that enforcing the user  to add the following nft rule is
> not friendly:
> # nft add chain nat output { type nat hook output priority 0 \; }
>
> This will become more tricky. Do you agree with this?
>
> So I send the related patch to try to improve it.

It's definitely not user friendly to have to add it, especially since
I expected a chain with no rules to be a no-op.
I don't know how nftables works well enough to comment on one design
choice or another, so I can't say whether this needs to be fixed, but
it definitely feels inconsistent and buggy.

I'm sorry for repeating myself, but I'd like to stress again that
while your workaround fixes an inconsistency between iptables and
nftables, the scenario itself is caused by the buggy behaviour of
masquerade with "lo", and that needs to be fixed too. The workaround
above, and any fix for that issue, will only stop the packets from
being dropped; the wrong rewrite will still be there.

Please let me know if there's anything else you'd like me to test.

--
Tom.


* Re: nftables: masquerade sets wrong source address
  2016-12-22 10:26       ` Tom Hacohen
@ 2016-12-22 10:34         ` Florian Westphal
       [not found]           ` <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2016-12-22 10:34 UTC (permalink / raw)
  To: Tom Hacohen
  Cc: Liping Zhang, Pablo Neira Ayuso, netfilter,
	Netfilter Developer Mailing List

Tom Hacohen <tom@stosb.com> wrote:
> I'm sorry for repeating myself, however I'd like to stress out again,
> that while your workaround fixes an inconsistency between iptables and
> nftables, the scenario itself is caused by the buggy behaviour of
> masquerade with "lo", and that needs to be fixed too. The workaround
> above, and any fixes to that issue will only fix the dropping of the
> packets, but the wrong rewrite will still be there.

The 'wrong rewrite' also occurs with iptables.

It doesn't cause connectivity issues because in iptables the nat table
always registers the output hook.

(I agree that nft masquerade should not cause these connectivity issues,
 but I think the proper ruleset fix is to use meta iif to restrict masq to
 the correct interface(s)).
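
A sketch of what such a restricted ruleset might look like, assuming the
uplink is wlp1s0 as in the original report. It matches on the output
interface with oifname, which is the form the original report says fixes
the problem, rather than meta iif; which match is appropriate depends on
the setup. The file name nat-masq-uplink.nft and the RULESET and APPLY
variables are made up for the example, and loading the ruleset (root
required) is opt-in.

```shell
#!/bin/sh
# Sketch only: write an example ruleset to a file and print it.
# RULESET and APPLY are hypothetical knobs for this example.
set -eu

ruleset="${RULESET:-./nat-masq-uplink.nft}"

cat > "$ruleset" <<'EOF'
table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        # Masquerade only traffic leaving via the uplink, so packets
        # sent over lo keep their original source address.
        oifname "wlp1s0" masquerade
    }
}
EOF

# Show what would be loaded.
cat "$ruleset"

# Opt-in load (needs root and the nft binary): APPLY=1 sh ./script.sh
if [ "${APPLY:-0}" = "1" ]; then
    nft -f "$ruleset"
fi
```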


* Re: nftables: masquerade sets wrong source address
       [not found]           ` <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>
@ 2016-12-22 22:40             ` Tom Hacohen
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Hacohen @ 2016-12-22 22:40 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Liping Zhang, netfilter,
	Netfilter Developer Mailing List

On Thu, Dec 22, 2016 at 4:56 PM, Tom Hacohen <tom@stosb.com> wrote:
>
>
> On 22 Dec 2016 12:35, "Florian Westphal" <fw@strlen.de> wrote:
>
> Tom Hacohen <tom@stosb.com> wrote:
>> I'm sorry for repeating myself, however I'd like to stress out again,
>> that while your workaround fixes an inconsistency between iptables and
>> nftables, the scenario itself is caused by the buggy behaviour of
>> masquerade with "lo", and that needs to be fixed too. The workaround
>> above, and any fixes to that issue will only fix the dropping of the
>> packets, but the wrong rewrite will still be there.
>
> The 'wrong rewrite' also occurs with iptables.
>
> It doesn't cause connectivity issues because in iptables the nat table
> always registers the output hook.
>
> (I agree that nft masquerade should not cause these connectivity issues,
>  but I think proper ruleset fix is to use meta iif to restrict masq to
>  the correct interface(s)).
>
>
> Yes, iptables also misbehaves here. I know you agree about not causing the
> connectivity issues, but don't you agree that the wrong rewrite shouldn't
> happen, for both iptables and nftables?
>
> I already use oif to restrict the masquerade. I'm not trying to solve this
> for myself, because I already have a working workaround; I'm trying to help
> report and resolve a bug.
>
> --
> Tom

Resending as plain text.

Yes, iptables also misbehaves here. I know you agree about not causing
the connectivity issues, but don't you agree that the wrong rewrite
shouldn't happen, for both iptables and nftables?

I already use oif to restrict the masquerade. I'm not trying to solve
this for myself, because I already have a working workaround; I'm trying
to help report and resolve a bug.

--
Tom


end of thread, other threads:[~2016-12-22 22:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-13 13:28 nftables: masquerade sets wrong source address Tom Hacohen
2016-12-13 14:32 ` /dev/rob0
2016-12-13 14:53   ` Tom Hacohen
2016-12-14 22:28 ` Pablo Neira Ayuso
2016-12-15 11:34   ` Tom Hacohen
2016-12-15 21:29     ` Pablo Neira Ayuso
2016-12-15 22:47       ` Tom Hacohen
2016-12-16  0:04         ` Tom Hacohen
2016-12-17 14:18 ` Liping Zhang
2016-12-19  2:25   ` Liping Zhang
2016-12-20 15:16   ` Tom Hacohen
2016-12-21  2:39     ` Liping Zhang
2016-12-22 10:26       ` Tom Hacohen
2016-12-22 10:34         ` Florian Westphal
     [not found]           ` <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>
2016-12-22 22:40             ` Tom Hacohen
