* GRE-NAT broken
@ 2018-01-24 19:54 Matthias Walther
  2018-01-25  0:34 ` Grant Taylor
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Matthias Walther @ 2018-01-24 19:54 UTC (permalink / raw)
  To: lartc

Hello,

I used to NAT GRE tunnels into a KVM machine. That used to work
perfectly, until it stopped working in early January.

I'm not really sure what caused this malfunction. I tried different
kernel versions: 4.4.113, 4.10.0-35, 4.10.0-37, 4.14. All on Ubuntu 16.04.3.

Normal destination-based NAT rules, e.g. SSH on TCP 22, work
perfectly. This GRE NAT rule is in place:

-A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62

And the needed kernel modules are loaded:

root# lsmod|grep gre
61:nf_conntrack_proto_gre    16384  0
62:nf_nat_proto_gre       16384  0
63:nf_nat                 24576  4
nf_nat_proto_gre,nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
64:nf_conntrack          106496  6
nf_conntrack_proto_gre,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4

Still, some packets are just not correctly NATted. The configuration
should be correct, as it used to work like this.

One or two tunnels usually work. For the others, the GRE packets are
just not NATted but dropped. First example, which shows the expected
behavior:

root# tcpdump -ni any host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]=0x644007BA or ip[40:4]=0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
04:06:41.322914 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, length 64
04:06:41.322922 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, length 64
04:06:41.322928 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, length 64
04:06:41.341906 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
04:06:41.341915 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
04:06:41.341918 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64

This^^ works as it should. The packet goes through the bridge port
interface, then the bridge through which all NATted VMs are connected,
then it is translated, and then it goes out through the eth0 interface
of the hypervisor. And the reply packets follow the reverse direction.
The NAT works, the address is translated. Not so in the second case:

root# tcpdump -ni any host 185.66.195.0 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]=0x644007B4 or ip[40:4]=0x644007B4 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
03:58:01.972551 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1, length 64
03:58:01.972554 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1, length 64
03:58:03.001013 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2, length 64
03:58:03.001021 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2, length 64

tcpdump catches the outgoing packet. But instead of being translated,
it's dropped.
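
To decode the filter used above: proto 47 matches GRE, and with a
20-byte outer IP header plus a 4-byte GREv0 base header, the inner IP
header starts at offset 24. A commented equivalent, assuming no IP
options in the outer header and no optional GRE fields:

# ip[33]   = inner IP protocol field (24 + 9); 0x01 = ICMP
# ip[36:4] = inner source address (24 + 12)
# ip[40:4] = inner destination address (24 + 16)
# 0x644007BA = 100.64.7.186 (0x644007B4 = 100.64.7.180 in the second capture)
tcpdump -ni any 'proto 47 and ip[33]=0x01 and (ip[36:4]=0x644007BA or ip[40:4]=0x644007BA)'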

Any ideas how I could analyse this? All tested kernels showed the exact
same behavior. It's as if only one GRE NAT connection were possible.

Regards,
Matthias



* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
@ 2018-01-25  0:34 ` Grant Taylor
  2018-01-25  5:34 ` walther.xyz
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Grant Taylor @ 2018-01-25  0:34 UTC (permalink / raw)
  To: lartc


On 01/24/2018 12:54 PM, Matthias Walther wrote:
> Hello,

Hi,

> I used to NAT GRE tunnels into a KVM machine. That used to work
> perfectly, until it stopped working in early January.

Okay.  :-/

Can I get a high level overview of your network topology?  You've 
mentioned bridges, eth0, and VMs.  -  I figure asking is better than 
speculating.

> I'm not really sure what caused this malfunction. I tried different
> kernel versions: 4.4.113, 4.10.0-35, 4.10.0-37, 4.14. All on Ubuntu
> 16.04.3.

Do you know specifically when things stopped working as desired?  Have 
you tried the kernel that you were running before that?  Are you aware 
of anything that changed on the system about that time?  I.e. updates? 
Kernel versions?

> Normal destination-based NAT rules, e.g. SSH on TCP 22, work
> perfectly. This GRE NAT rule is in place:
> 
> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
> 
> And the needed kernel modules are loaded:
> 
> root# lsmod|grep gre
> 61:nf_conntrack_proto_gre    16384  0
> 62:nf_nat_proto_gre       16384  0
> 63:nf_nat                 24576  4 
> nf_nat_proto_gre,nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
> 64:nf_conntrack          106496  6 
> nf_conntrack_proto_gre,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4
> 
> Still, some packets are just not correctly NATted. The configuration
> should be correct, as it used to work like this.

Please provide a high level packet flow as you think that it should be. 
I.e. GRE encaped comes in eth0 … does something … gets DNATed to $IP … 
goes out somewhere.

> One or two tunnels usually work. For the others, the GRE packets are just
> not NATted but dropped. First example, which shows the expected behavior:

Are you saying that one or two tunnels at a time work?  As if it may be 
a load / state cache related problem?  Or do some specific tunnels 
seem to work?

Do the tunnels that seem to work do so all the time?

> root# tcpdump -ni any host 185.66.195.1 and \( host 176.9.38.150 or host 
> 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007BA 
> or ip[40:4]==0x644007BA \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 
> 262144 bytes
> 04:06:41.322914 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, 
> length 64
> 04:06:41.322922 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, 
> length 64
> 04:06:41.322928 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, 
> length 64
> 04:06:41.341906 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
> 04:06:41.341915 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
> 04:06:41.341918 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64

Would you please re-capture, both working and non-working, but specific 
to one interface?  I.e. -i eth0 and -i $outGoingInterface as separate 
captures?  (Or if there is a way to get tcpdump to show the interface in 
the textual output.)

> This^^ works as it should. The packet goes through the bridge port
> interface, then the bridge through which all NATted VMs are connected,
> then it is translated, and then it goes out through the eth0 interface
> of the hypervisor. And the reply packets follow the reverse direction.
> The NAT works, the address is translated. Not so in the second case:

What type of bridge are you using?  Standard Linux bridging, ala brctl 
and or ip?  Or are you using Open vSwitch, or something else?

Can we see a config dump of the bridge?

I wonder if a sysctl (/proc) setting got changed and now IPTables is 
trying to filter bridged traffic.  I think it's 
/proc/sys/net/bridge/bridge-nf-call-iptables.  (At least that's what I'm 
seeing with a quick Google search.)
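
A quick way to check would be something like this (the sysctl only 
exists while the br_netfilter code is loaded, so the file being absent 
is also an answer):

# 1 = bridged frames are run through iptables, 0 = they are not
sysctl net.bridge.bridge-nf-call-iptables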

Can we see the output of iptables-save?

> root# tcpdump -ni any host 185.66.195.0 and \( host 176.9.38.150 or host
> 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007B4 
> or ip[40:4]==0x644007B4 \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 
> 262144 bytes
> 03:58:01.972551 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1, 
> length 64
> 03:58:01.972554 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1, 
> length 64
> 03:58:03.001013 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2, 
> length 64
> 03:58:03.001021 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2, 
> length 64
> 
> tcpdump catches the outgoing packet. But instead of being translated,
> it's dropped.

We can't tell from the above output if it's traffic coming into the 
outside interface (eth0?) or traffic leaving the inside interface 
(connected to the bridge?).

What hypervisor are you using?  KVM, VirtualBox, something else?  How do 
the VMs connect to the bridge?

Also, if you're bridging, why are you DNATing packets?  -  Or is your 
bridge internal only and you're DNATing between the outside (eth0) and 
the internal (only) bridge where the VMs are connected?

It sort of looks like you may have a one to one mapping of outside IPs 
to inside IPs.  -  Which makes me ask the question why you're DNATing in 
the first place.  Or rather why you aren't bridging the VMs to the 
outside and running the globally routed IP directly in the VMs.

> Any ideas how I could analyse this? All tested kernels showed the exact
> same behavior. It's as if only one GRE NAT connection were possible.

I need more details to be able to start poking further.



-- 
Grant. . . .
unix || die




* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
  2018-01-25  0:34 ` Grant Taylor
@ 2018-01-25  5:34 ` walther.xyz
  2018-01-25  7:47 ` walther.xyz
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: walther.xyz @ 2018-01-25  5:34 UTC (permalink / raw)
  To: lartc

I looked into this a little further: from what I've found in the source
code, GRE NAT has never been properly implemented.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD

switch (greh->flags & GRE_VERSION) {
case GRE_VERSION_0:
	/* We do not currently NAT any GREv0 packets.
	 * Try to behave like "nf_nat_proto_unknown" */
	break;
case GRE_VERSION_1:
	pr_debug("call_id -> 0x%04x\n", ntohs(tuple->dst.u.gre.key));
	pgreh->call_id = tuple->dst.u.gre.key;
	break;
default:
	pr_debug("can't nat unknown GRE version\n");
	return false;
}

How did this work before? Or am I looking at the wrong place?

Regards,
Matthias



On 24.01.2018 at 20:54, Matthias Walther wrote:
> Hello,
>
> I used to NAT GRE tunnels into a KVM machine. That used to work
> perfectly, until it stopped working in early January.
>
> I'm not really sure what caused this malfunction. I tried different
> kernel versions: 4.4.113, 4.10.0-35, 4.10.0-37, 4.14. All on Ubuntu 16.04.3.
>
> Normal destination-based NAT rules, e.g. SSH on TCP 22, work
> perfectly. This GRE NAT rule is in place:
>
> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
>
> And the needed kernel modules are loaded:
>
> root# lsmod|grep gre
> 61:nf_conntrack_proto_gre    16384  0
> 62:nf_nat_proto_gre       16384  0
> 63:nf_nat                 24576  4
> nf_nat_proto_gre,nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
> 64:nf_conntrack          106496  6
> nf_conntrack_proto_gre,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4
>
> Still, some packets are just not correctly NATted. The configuration
> should be correct, as it used to work like this.
>
> One or two tunnels usually work. For the others, the GRE packets are
> just not NATted but dropped. First example, which shows the expected
> behavior:
>
> root# tcpdump -ni any host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]=0x644007BA or ip[40:4]=0x644007BA \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
> 04:06:41.322914 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, length 64
> 04:06:41.322922 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, length 64
> 04:06:41.322928 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1, length 64
> 04:06:41.341906 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
> 04:06:41.341915 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
> 04:06:41.341918 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1, length 64
>
> This^^ works as it should. The packet goes through the bridge port
> interface, then the bridge through which all NATted VMs are connected,
> then it is translated, and then it goes out through the eth0 interface
> of the hypervisor. And the reply packets follow the reverse direction.
> The NAT works, the address is translated. Not so in the second case:
>
> root# tcpdump -ni any host 185.66.195.0 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]=0x644007B4 or ip[40:4]=0x644007B4 \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
> 03:58:01.972551 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1, length 64
> 03:58:01.972554 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1, length 64
> 03:58:03.001013 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2, length 64
> 03:58:03.001021 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2, length 64
>
> tcpdump catches the outgoing packet. But instead of being translated,
> it's dropped.
>
> Any ideas how I could analyse this? All tested kernels showed the exact
> same behavior. It's as if only one GRE NAT connection were possible.
>
> Regards,
> Matthias
>



* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
  2018-01-25  0:34 ` Grant Taylor
  2018-01-25  5:34 ` walther.xyz
@ 2018-01-25  7:47 ` walther.xyz
  2018-01-25  7:57 ` Florian Westphal
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: walther.xyz @ 2018-01-25  7:47 UTC (permalink / raw)
  To: lartc

Hello Grant,

thanks for your reply. I'll respond to your questions inline.

On 25.01.2018 at 01:34, Grant Taylor wrote:
> On 01/24/2018 12:54 PM, Matthias Walther wrote:
>> Hello,
>
> Hi,
>
>> I used to NAT GRE tunnels into a KVM machine. That used to work
>> perfectly, until it stopped working in early January.
>
> Okay.  :-/
>
> Can I get a high level overview of your network topology?  You've
> mentioned bridges, eth0, and VMs.  -  I figure asking is better than
> speculating.
We're running gateways for an open WiFi project here in Germany; it's
called Freifunk (freifunk.net) and it's non-commercial. We connect those
gateways with our AS exit routers via GRE tunnels, GRE over IPv4.

To save money and resources, we virtualize the hardware with KVM.
Usually we have an extra IPv4 address for each virtual machine. In two
experimental cases I tried to save the IPv4 address and NAT the GRE
tunnels behind the hypervisor's public IP address, giving the virtual
machine only a private IP address (192.168....). Standard destination
NAT with the iptables rule as mentioned.

The bridges are created with brctl, and the topology in this particular
case looks as follows:

root@unimatrixzero ~ # brctl show
bridge name    bridge id        STP enabled    interfaces
br0        8000.fe540028664d    no             vnet2
                                               vnet3
                                               vnet5
                                               vnet6
virbr1     8000.5254007bec03    yes            virbr1-nic
                                               vnet4

The hoster is Hetzner, a German budget hosting company. They do not
block GRE tunnels; GRE to public IP addresses works just fine. As this
hypervisor contains virtual machines with both public IP addresses and
private (192.168...) IP addresses, we have two bridges. Depending on the
configuration, the virtual machines with public IP addresses are in br0
and the ones with private addresses in virbr1.


>> I'm not really sure what caused this malfunction. I tried different
>> kernel versions: 4.4.113, 4.10.0-35, 4.10.0-37, 4.14. All on Ubuntu
>> 16.04.3.
>
> Do you know specifically when things stopped working as desired?  Have
> you tried the kernel that you were running before that?  Are you aware
> of anything that changed on the system about that time?  I.e. updates?
> Kernel versions?
Unfortunately not. We're running unattended upgrades on the machines.
It's a free-time project and we don't have the manpower to update all
our hosts manually. I'm not even sure whether the kernel was updated or
not. I tried the oldest kernel still available on the machine and a
much older 4.4 kernel. Ubuntu automatically uninstalls unneeded older
kernels. Maybe a security patch that got applied to 4.4 as well as 4.10,
4.13 and 4.14 broke this use case. Maybe I should try an older 4.4
kernel, not revision 113.

But I can say for sure that we had two experimental machines running
this configuration with NATted GRE tunnels, and both stopped working
around the same time, after this had worked stably for several months.
>
>> Normal destination-based NAT rules, e.g. SSH on TCP 22, work
>> perfectly. This GRE NAT rule is in place:
>>
>> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
>>
>> And the needed kernel modules are loaded:
>>
>> root# lsmod|grep gre
>> 61:nf_conntrack_proto_gre    16384  0
>> 62:nf_nat_proto_gre       16384  0
>> 63:nf_nat                 24576  4
>> nf_nat_proto_gre,nf_nat_ipv4,xt_nat,nf_nat_masquerade_ipv4
>> 64:nf_conntrack          106496  6
>> nf_conntrack_proto_gre,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4
>>
>> Still, some packets are just not correctly NATted. The configuration
>> should be correct, as it used to work like this.
>
> Please provide a high level packet flow as you think that it should
> be. I.e. GRE encaped comes in eth0 … does something … gets DNATed to
> $IP … goes out somewhere.
I was pinging from inside the VM into the GRE tunnel. So the packet
flow is as follows:

The ICMP packet goes into the virtual GRE interface within the virtual
machine. There it is encapsulated, with the private IP address as
source, and sent out through eth0 of the virtual machine.

The packet is now in the network stack of the hypervisor: coming in
through vnet4, going through the virbr1 bridge. Then it should be
NATted, i.e. the private source address of the GRE packet should be
replaced by the public IP address of the hypervisor. Then the NATted
packet is sent out to the other end of the GRE tunnel somewhere on the
internet. The last step, the NAT and the sending through the physical
interface, is what doesn't happen.
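
One thing I could also check on the hypervisor is whether conntrack
creates a GRE entry for that flow at all. A sketch, assuming the
conntrack-tools package is installed:

# list GRE entries in the connection tracking table; a NATted flow
# should show the 192.168.10.62 <-> public-IP translation
conntrack -L -p gre    # or "-p 47" if the protocol name isn't accepted
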
>> One or two tunnels usually work. For the others, the GRE packets are
>> just not NATted but dropped. First example, which shows the expected
>> behavior:
>
> Are you saying that one or two tunnels at a time work?  As if it may
> be a load / state cache related problem?  Or do some specific
> tunnels seem to work?
>
> Do the tunnels that seem to work do so all the time?
Funnily, after each reboot a different tunnel seemed to work. All
tunnels do the same thing; they just go to different backbone upstream
servers for redundancy.

That's why we're not sure when the problem first occurred. Because
everything seemed to work fine (one tunnel is enough), the problem
wasn't discovered directly. Now it has stopped working completely.
>
>> root# tcpdump -ni any host 185.66.195.1 and \( host 176.9.38.150 or
>> host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \(
>> ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol
>> decode
>> listening on any, link-type LINUX_SLL (Linux cooked), capture size
>> 262144 bytes
>> 04:06:41.322914 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1,
>> length 64
>> 04:06:41.322922 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1,
>> length 64
>> 04:06:41.322928 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.186: ICMP echo request, id 26639, seq 1,
>> length 64
>> 04:06:41.341906 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP
>> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1,
>> length 64
>> 04:06:41.341915 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
>> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1,
>> length 64
>> 04:06:41.341918 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
>> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 26639, seq 1,
>> length 64
>
> Would you please re-capture, both working and non-working, but
> specific to one interface?  I.e. -i eth0 and -i $outGoingInterface as
> separate captures?  (Or if there is a way to get tcpdump to show the
> interface in the textual output.)
Unfortunately, I can't provide a working example: since I tested all
those different kernel versions, nothing works anymore. Not a single
tunnel, even though I went back to 4.13.0-31, with which I had captured
the packets yesterday.

(As I rebooted again, vnet4 is now vnet0.) See here the three steps
separately:

root@unimatrixzero ~ #  tcpdump -ni vnet0 host 185.66.195.1 and \( host
176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and
\( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:29:15.127873 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 59, length 64
08:29:16.151856 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 60, length 64
08:29:17.175800 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 61, length 64
08:29:18.199780 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 62, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
root@unimatrixzero ~ #  tcpdump -ni virbr1 host 185.66.195.1 and \( host
176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and
\( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
08:29:33.495592 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 77, length 64
08:29:34.519567 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 78, length 64
08:29:35.543572 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 79, length 64
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
root@unimatrixzero ~ #  tcpdump -ni eth0 host 185.66.195.1 and \( host
176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and
\( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
10 packets received by filter
0 packets dropped by kernel

The GRE packets go through the interface and through the bridge, but the
GRE packet isn't NATted and is never sent out through the physical
interface (eth0) on the hypervisor. All those tcpdumps were made on the
hypervisor.
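
A coarser cross-check, in case the inner-offset filter is too specific
after translation, would be to capture any GRE at all on eth0:

# any GRE leaving or entering eth0, regardless of the inner payload
tcpdump -ni eth0 proto 47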

In the first example, where the NAT worked, we saw those three steps as
well. And the packets went out through eth0 and got an ICMP reply, which
took the reverse path to its destination, the virtual machine, where the
GRE got decapsulated and ping got its result packet.

I made sure that the nf_nat_proto_gre and nf_conntrack_proto_gre
modules are loaded. lsmod shows them.


>> This^^ works as it should. The packet goes through the bridge port
>> interface, then the bridge through which all NATted VMs are connected,
>> then it is translated, and then it goes out through the eth0 interface
>> of the hypervisor. And the reply packets follow the reverse direction.
>> The NAT works, the address is translated. Not so in the second case:
>
> What type of bridge are you using?  Standard Linux bridging, ala brctl
> and or ip?  Or are you using Open vSwitch, or something else?
Standard Linux bridging via brctl, as virsh and virt-manager create them.
>
> Can we see a config dump of the bridge?
Virsh creates the bridge based on this xml file:
virsh # net-dumpxml ipv4-nat
<network>
  <name>ipv4-nat</name>
  <uuid>2c0daba2-1e17-4d0d-9b9e-2acf09435da6</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr1' stp='on' delay='0'/>
  <mac address='52:54:00:7b:ec:03'/>
  <ip address='192.168.10.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.10.2' end='192.168.10.254'/>
    </dhcp>
  </ip>
</network>

>
> I wonder if a sysctl (/proc) setting got changed and now IPTables is
> trying to filter bridged traffic.  I think it's
> /proc/sys/net/bridge/bridge-nf-call-iptables.  (At least that's what
> I'm seeing with a quick Google search.)
This entry doesn't exist here.

root@unimatrixzero ~ # cat /proc/sys/net/
core/             ipv6/             nf_conntrack_max 
ipv4/             netfilter/        unix/            

There is no bridge or virbr1 entry in ipv4 either. Nor did I find
anything similar in netfilter/.

>
> Can we see the output of iptables-save?
root@unimatrixzero ~ # cat /etc/iptables/rules.v4
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*raw
:PREROUTING ACCEPT [4134062347:2804377965525]
:OUTPUT ACCEPT [45794:9989552]
-A PREROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
-A PREROUTING -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
-A OUTPUT -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
-A OUTPUT -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
COMMIT
# Completed on Fri Oct 27 23:36:29 2017
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*mangle
:PREROUTING ACCEPT [4134063569:2804378696201]
:INPUT ACCEPT [48005:5510967]
:FORWARD ACCEPT [4133838276:2804349602217]
:OUTPUT ACCEPT [45797:9990176]
:POSTROUTING ACCEPT [4133884073:2804359592393]
-A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Fri Oct 27 23:36:29 2017
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*nat
:PREROUTING ACCEPT [86097:5109916]
:INPUT ACCEPT [7557:460113]
:OUTPUT ACCEPT [162:11119]
:POSTROUTING ACCEPT [78890:4669843]
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 222 -j DNAT --to-destination 192.168.10.62:22
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.10.248:80
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.10.248:443
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 223 -j DNAT --to-destination 192.168.10.248:22
-A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
-A PREROUTING -d 176.9.38.150/32 -p udp -m udp --dport 20000:20100 -j DNAT --to-destination 192.168.10.62:20000-20100
-A POSTROUTING -s 192.168.10.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.10.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -j MASQUERADE
COMMIT
# Completed on Fri Oct 27 23:36:29 2017
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*filter
:INPUT ACCEPT [47667:5451204]
:FORWARD ACCEPT [4133512236:2804145422827]
:OUTPUT ACCEPT [45662:9946618]
-A INPUT -i virbr1 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr1 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr1 -p tcp -m tcp --dport 67 -j ACCEPT
-A FORWARD -d 192.168.10.0/24 -o virbr1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.10.0/24 -i virbr1 -j ACCEPT
-A FORWARD -i virbr1 -o virbr1 -j ACCEPT
-A FORWARD -d 192.168.10.0/24 -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -d 192.168.10.0/24 -i eth0 -o virbr1 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.10.0/24 -i virbr1 -o eth0 -j ACCEPT
-A FORWARD -i virbr1 -o virbr1 -j ACCEPT
-A OUTPUT -o virbr1 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Fri Oct 27 23:36:29 2017

>
>> root# tcpdump -ni any host 185.66.195.0 and \( host 176.9.38.150 or
>> host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \(
>> ip[36:4]==0x644007B4 or ip[40:4]==0x644007B4 \)
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol
>> decode
>> listening on any, link-type LINUX_SLL (Linux cooked), capture size
>> 262144 bytes
>> 03:58:01.972551 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1,
>> length 64
>> 03:58:01.972554 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 1,
>> length 64
>> 03:58:03.001013 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2,
>> length 64
>> 03:58:03.001021 IP 192.168.10.62 > 185.66.195.0: GREv0, length 88: IP
>> 185.66.194.49 > 100.64.7.180: ICMP echo request, id 25043, seq 2,
>> length 64
>>
>> tcpdump catches the outgoing packet. But instead of being
>> translated, it's dropped.
>
> We can't tell from the above output if it's traffic coming into the
> outside interface (eth0?) or traffic leaving the inside interface
> (connected to the bridge?).
>
> What hypervisor are you using?  KVM, VirtualBox, something else?  How
> do the VMs connect to the bridge?
KVM. KVM creates the interface on the hypervisor and puts it into the
bridge.
>
> Also, if you're bridging, why are you DNATing packets?  -  Or is your
> bridge internal only and you're DNATing between the outside (eth0) and
> the internal (only) bridge where the VMs are connected?
The bridge is a NATted /24 subnet created by KVM. All VMs that don't
have a public address are connected to the bridge, which NATs the
outgoing connections just like a standard home router would do.

A bridge isn't necessary here; it just makes things easier. You could
route each virtual machine separately. It's just KVM's approach to make
things smoother.
>
> It sort of looks like you may have a one to one mapping of outside IPs
> to inside IPs.  -  Which makes me ask the question why you're DNATing
> in the first place.  Or rather why you aren't bridging the VMs to the
> outside and running the globally routed IP directly in the VMs.
Our standard configuration is to have a separate global IPv4 address
for each virtual machine. We experimented with NATting those GRE tunnels
to save one IP address per hypervisor, which had worked perfectly so far.

Freifunk is not just a WiFi network. It's about getting to know network
technology like mesh networks or software-defined networks based on GRE
tunnels. My reasons to participate are mostly to understand the
technology behind all that.

As I wrote in my other email, I looked into the source code. As far as I
understand it, GREv0 NAT has never been properly implemented. I don't
understand how this ever worked.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD

But GRE NATting is possible. Even my internet provider's 50 Euro router
can do it.

Thanks for your help!

Regards,
Matthias

>
>> Any ideas how I could analyse this? All tested kernels showed the
>> exact same behavior. It's as if only one GRE NAT connection were
>> possible.
>
> I need more details to be able to start poking further.
>
>
>



* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
                   ` (2 preceding siblings ...)
  2018-01-25  7:47 ` walther.xyz
@ 2018-01-25  7:57 ` Florian Westphal
  2018-01-25 22:57 ` Grant Taylor
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2018-01-25  7:57 UTC (permalink / raw)
  To: lartc

walther.xyz <matthias@walther.xyz> wrote:
> > I wonder if a sysctl (/proc) setting got changed and now IPTables is
> > trying to filter bridged traffic.  I think it's
> > /proc/sys/net/bridge/bridge-nf-call-iptables.  (At least that's what
> > I'm seeing with a quick Google search.)
> This entry doesn't exist here.

modprobe br_netfilter


* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
                   ` (3 preceding siblings ...)
  2018-01-25  7:57 ` Florian Westphal
@ 2018-01-25 22:57 ` Grant Taylor
  2018-01-28  3:30 ` walther.xyz
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Grant Taylor @ 2018-01-25 22:57 UTC (permalink / raw)
  To: lartc


On 01/25/2018 12:47 AM, walther.xyz wrote:
> Hello Grant,

Hi,

> thanks for your reply. I'll respond to your questions inline.

You're welcome.

> We're running gateways for an open WiFi project here in Germany; it's
> called Freifunk (freifunk.net) and it's non-commercial. We connect those
> gateways with our AS exit routers via GRE tunnels, GRE over IPv4.

Okay.

> To save money and resources, we virtualize the hardware with KVM.
> Usually we have an extra IPv4 address for each virtual machine. In two
> experimental cases I tried to save the IPv4 address and NAT the GRE
> tunnels behind the hypervisor's public IP address, giving the virtual
> machine only a private IP address (192.168....). Standard destination
> NAT with the iptables rule as mentioned.

Okay.

So are the VMs functioning as routers for clients behind them?

It sounds like the GRE tunnel is functionally used to connect the VM 
with your border routers, correct?

> The bridges are created with brctl, and the topology in this particular
> case looks as follows:
> 
> root@unimatrixzero ~ # brctl show
> bridge name    bridge id        STP enabled    interfaces
> br0        8000.fe540028664d    no             vnet2
>                                                vnet3
>                                                vnet5
>                                                vnet6
> virbr1     8000.5254007bec03    yes            virbr1-nic
>                                                vnet4
> 
> The hoster is Hetzner, a German budget hosting company. They do not
> block GRE tunnels; GRE to public IP addresses works just fine. As this
> hypervisor contains virtual machines with both public IP addresses and
> private (192.168...) IP addresses, we have two bridges. Depending on
> the configuration, the virtual machines with public IP addresses are
> in br0 and the ones with private addresses in virbr1.

Thank you for the details.

It now occurs to me to ask, are these VMs hosted within your network or 
outside in the cloud?

I'm now getting the impression that the GRE tunnel might be from your 
border router, across the Internet, and into VMs in the cloud.

> Unfortunately not. We're running unattended upgrades on the machines.
> It's a free-time project and we don't have the manpower to update all
> our hosts manually. I'm not even sure whether the kernel was updated
> or not. I tried the oldest kernel still available on the machine and a
> much older 4.4 kernel. Ubuntu automatically uninstalls unneeded older
> kernels. Maybe a security patch that got applied to 4.4 as well as
> 4.10, 4.13 and 4.14 broke this use case. Maybe I should try an older
> 4.4 kernel, not revision 113.
> 
> But I can say for sure that we had two experimental machines running
> this configuration with NATted GRE tunnels, and both stopped working
> around the same time, after this had worked stably for several months.

Okay.  That just means that it's not currently possible to revert to 
something that works as a diagnostic aid.  So the only way out is 
forward through the problem.

> I was pinging from inside the VM into the GRE tunnel. So the packet
> flow is as follows:
> 
> The ICMP packet goes into the virtual GRE interface within the virtual
> machine. There it is encapsulated, with the private IP address as
> source, and sent out through eth0 of the virtual machine.
> 
> The packet is now in the network stack of the hypervisor: coming in
> through vnet4, going through the virbr1 bridge. Then it should be
> NATted, i.e. the private source address of the GRE packet should be
> replaced by the public IP address of the hypervisor. Then the NATted
> packet is sent out to the other end of the GRE tunnel somewhere on the
> internet. The last step, the NAT and the sending through the physical
> interface, is what doesn't happen.

Okay.

What does tcpdump on the vNIC in the VM show?  I would expect to see 
encapsulated ICMP inside of the GRE tunnel w/ the VM's private IP as the 
source and the far end's IP as the GRE destination.

What does the host see on vnet4 or the virbr1 interfaces?  I would 
expect them to see the same thing as what the guest VM saw on its vNIC 
(eth0?).

> Funnily, after each reboot a different tunnel seemed to work. All
> tunnels do the same thing; they just go to different backbone upstream
> servers for redundancy.

Okay.

> That's why we're not sure when the problem first occurred. Because
> everything seemed to work fine (one tunnel is enough), the problem
> wasn't discovered directly. Now it has stopped working completely.

Oh.  I thought that something was partially working.

Can you disable both of the tunnels for 5 ~ 10 minutes, long enough for 
potentially stale state to clear, and then enable one tunnel?

> Unfortunately, I can't provide a working example: since I tested all
> those different kernel versions, nothing works anymore. Not a single
> tunnel, even though I went back to 4.13.0-31, with which I had captured
> the packets yesterday.

:-/

> (As I rebooted again, vnet4 is now vnet0.)

ACK

> See here the three steps separately:

Thank you.

> root@unimatrixzero ~ #  tcpdump -ni vnet0 host 185.66.195.1 and \( host 
> 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 08:29:15.127873 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 59, 
> length 64
> 08:29:16.151856 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 60, 
> length 64
> 08:29:17.175800 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 61, 
> length 64
> 08:29:18.199780 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 62, 
> length 64
> ^C
> 4 packets captured
> 4 packets received by filter
> 0 packets dropped by kernel

That seems reasonable enough.

> root@unimatrixzero ~ #  tcpdump -ni virbr1 host 185.66.195.1 and \( host 
> 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 
> bytes
> 08:29:33.495592 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 77, 
> length 64
> 08:29:34.519567 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 78, 
> length 64
> 08:29:35.543572 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 79, 
> length 64
> ^C
> 3 packets captured
> 3 packets received by filter
> 0 packets dropped by kernel

Likewise with this.

> root@unimatrixzero ~ #  tcpdump -ni eth0 host 185.66.195.1 and \( host 
> 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
> ^C
> 0 packets captured
> 10 packets received by filter
> 0 packets dropped by kernel

So, for some reason, your GRE packets don't seem to be leaving the 
system.  -  I'll have to look at your firewall config (I think you 
provided it below).

> The GRE packets go through the interface and through the bridge, but
> the GRE packet isn't NATted and is never sent out through the physical
> interface (eth0) on the hypervisor.

I would expect to see the GRE packets leaving the node, even if they 
aren't NATed.

> All those tcpdumps were made on the hypervisor.

ACK

> In the first example, where the NAT worked, we saw those three steps
> as well. And the packets went out through eth0 and got an ICMP reply,
> which took the reverse path to its destination, the virtual machine,
> where the GRE got decapsulated and ping got its result packet.

This is what I would expect to happen, and I suspect what you desire to 
happen.

> I made sure that the nf_nat_proto_gre and nf_conntrack_proto_gre
> modules are loaded. lsmod shows them.

ACK

> Virsh creates the bridge based on this xml file:
> 
> virsh # net-dumpxml ipv4-nat
> <network>
>   <name>ipv4-nat</name>
>   <uuid>2c0daba2-1e17-4d0d-9b9e-2acf09435da6</uuid>
>   <forward mode='nat'>
>     <nat>
>       <port start='1024' end='65535'/>
>     </nat>
>   </forward>
>   <bridge name='virbr1' stp='on' delay='0'/>
>   <mac address='52:54:00:7b:ec:03'/>
>   <ip address='192.168.10.1' netmask='255.255.255.0'>
>     <dhcp>
>       <range start='192.168.10.2' end='192.168.10.254'/>
>     </dhcp>
>   </ip>
> </network>

That looks reasonable enough.

I also feel like this may be more an IPTables problem than a bridge problem.

> This entry doesn't exist here.
> 
> root@unimatrixzero ~ # cat proc/sys/net
> core/             ipv6/             nf_conntrack_max
> ipv4/             netfilter/        unix/

Good.  I think you want it to not be there.

> There is no bridge or virbr1 entry in ipv4 either. Nor did I find
> anything similar in netfilter/.

Okay.

> root@unimatrixzero ~ # cat /etc/iptables/rules.v4
> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
> *raw
> :PREROUTING ACCEPT [4134062347:2804377965525]
> :OUTPUT ACCEPT [45794:9989552]
> -A PREROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
> -A PREROUTING -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
> -A OUTPUT -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
> -A OUTPUT -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
> COMMIT
> # Completed on Fri Oct 27 23:36:29 2017
> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
> *mangle
> :PREROUTING ACCEPT [4134063569:2804378696201]
> :INPUT ACCEPT [48005:5510967]
> :FORWARD ACCEPT [4133838276:2804349602217]
> :OUTPUT ACCEPT [45797:9990176]
> :POSTROUTING ACCEPT [4133884073:2804359592393]
> -A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM 
> --checksum-fill
> COMMIT
> # Completed on Fri Oct 27 23:36:29 2017
> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
> *nat
> :PREROUTING ACCEPT [86097:5109916]
> :INPUT ACCEPT [7557:460113]
> :OUTPUT ACCEPT [162:11119]
> :POSTROUTING ACCEPT [78890:4669843]
> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 222 -j DNAT 
> --to-destination 192.168.10.62:22
> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 80 -j DNAT 
> --to-destination 192.168.10.248:80
> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 443 -j DNAT 
> --to-destination 192.168.10.248:443
> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 223 -j DNAT 
> --to-destination 192.168.10.248:22
> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62

I think this may route all incoming GRE to a single host / VM, 
192.168.10.62.

> -A PREROUTING -d 176.9.38.150/32 -p udp -m udp --dport 20000:20100 -j 
> DNAT --to-destination 192.168.10.62:20000-20100
> -A POSTROUTING -s 192.168.10.0/24 -d 224.0.0.0/24 -j RETURN
> -A POSTROUTING -s 192.168.10.0/24 -d 255.255.255.255/32 -j RETURN
> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p tcp -j 
> MASQUERADE --to-ports 1024-65535
> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p udp -j 
> MASQUERADE --to-ports 1024-65535
> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -j MASQUERADE

This will very likely MASQUERADE all of the GRE traffic from 
192.168.10.62, which means it will be NATed to the source IP of the 
interface with the best route to 185.66.195.1.  Is that 176.9.38.150, 
the IP that you were looking for on the eth0 interface?  (You wouldn't 
see the 192.168.10.62 IP there, as it's after NATing.)
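
One way to see whether those NAT rules are matching at all would be to 
watch the rule counters (note the nat table only sees the first packet 
of each connection, so these counters move slowly):

# per-rule packet/byte counters; run once, send a ping through the
# tunnel, then run again and compare
iptables -t nat -L PREROUTING -v -n
iptables -t nat -L POSTROUTING -v -n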

> COMMIT
> # Completed on Fri Oct 27 23:36:29 2017
> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
> *filter
> :INPUT ACCEPT [47667:5451204]
> :FORWARD ACCEPT [4133512236:2804145422827]
> :OUTPUT ACCEPT [45662:9946618]
> -A INPUT -i virbr1 -p udp -m udp --dport 53 -j ACCEPT
> -A INPUT -i virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
> -A INPUT -i virbr1 -p udp -m udp --dport 67 -j ACCEPT
> -A INPUT -i virbr1 -p tcp -m tcp --dport 67 -j ACCEPT
> -A FORWARD -d 192.168.10.0/24 -o virbr1 -m conntrack --ctstate 
> RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -s 192.168.10.0/24 -i virbr1 -j ACCEPT
> -A FORWARD -i virbr1 -o virbr1 -j ACCEPT
> -A FORWARD -d 192.168.10.0/24 -m state --state NEW,RELATED,ESTABLISHED 
> -j ACCEPT
> -A FORWARD -d 192.168.10.0/24 -i eth0 -o virbr1 -m state --state 
> RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -s 192.168.10.0/24 -i virbr1 -o eth0 -j ACCEPT
> -A FORWARD -i virbr1 -o virbr1 -j ACCEPT
> -A OUTPUT -o virbr1 -p udp -m udp --dport 68 -j ACCEPT
> COMMIT
> # Completed on Fri Oct 27 23:36:29 2017

I don't see anything else that might interfere with the GRE traffic that 
you're looking for.

I also don't see where your firewall is actually blocking any traffic, 
and there are a couple of other things where I'm not quite sure why you 
did what you did.  But this discussion is for GRE issues.

> KVM. KVM creates the interface on the hypervisor and puts it into 
> the bridge.

ACK

> The bridge is a NATted /24 subnet created by KVM. All VMs that don't
> have a public address are connected to the bridge, which NATs the
> outgoing connections just like a standard home router would do.

*nod*

> A bridge isn't necessary here; it just makes things easier. You could
> route each virtual machine separately. It's just KVM's approach to
> make things smoother.

*nod*

> Our standard configuration is to have a separate global IPv4 address
> for each virtual machine. We experimented with NATting those GRE
> tunnels to save one IP address per hypervisor, which had worked
> perfectly so far.

I think I'm still missing something.

I would assign the globally routed IPs to the VMs directly, and route 
them to the eth0 IP of the machine.

Or are you talking about saving the globally routable IP address on virbr1?

Another trick would be to use private IP addresses on virbr1 and the 
vNICs of the VMs.  You use this for routing and assign the VM's globally 
routed IP address to a dummy interface in the VMs.  -  That would be 
clear channel routing all the way in.  Save for the private IPs in the 
path, which works, but is meh in a traceroute output.
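
A rough sketch of that idea, with a made-up globally routed IP 
(203.0.113.7) standing in for the real one, and assuming the dummy 
module is available:

# in the VM: the globally routed IP lives on a dummy interface
ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 203.0.113.7/32 dev dummy0

# on the host: route that IP toward the VM's private (virbr1) address
ip route add 203.0.113.7/32 via 192.168.10.62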

> Freifunk is not just a WiFi network. It's about getting to know
> network technology like mesh networks or software-defined networks
> based on GRE tunnels. My reasons to participate are mostly to
> understand the technology behind all that.

That sounds interesting and like a worthwhile cause.

> As I wrote in my other email, I looked into the source code. As far as
> I understand it, GREv0 NAT has never been properly implemented. I
> don't understand how this ever worked.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD

My read of the header is that GRE may not need a NAT helper per se.  It 
sounds like it's just a matter of altering the source / destination IP 
of the GRE encapsulation traffic.

I also don't see anything in RFC 2784 § 2.1 (GRE Header) that would 
need NAT as I understand it.

> But GRE NATting is possible. Even my internet provider's 50 Euro
> router can do it.

I've not done much with GRE, but I think it's a very simple 
encapsulation.  Which means that as long as you get the local and remote 
IPs correct on both ends, including reflecting any NATing, I think 
things will likely work.

> Thanks for your help!

You're welcome.



-- 
Grant. . . .
unix || die




* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
                   ` (4 preceding siblings ...)
  2018-01-25 22:57 ` Grant Taylor
@ 2018-01-28  3:30 ` walther.xyz
  2018-01-28  5:42 ` Grant Taylor
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: walther.xyz @ 2018-01-28  3:30 UTC (permalink / raw)
  To: lartc

Hello Grant,

I'll shorten this email to keep it clear.

On 25.01.2018 at 23:57, Grant Taylor wrote:
> So are the VMs functioning as routers for clients behind them?
Yes. Our WiFi access points, which are typically OpenWrt-based routers
like the TP-LINK TL-WR841N or the TP-LINK TL-WR1043ND, connect through
L2TP to the gateways, which then route between our network and the
upstream via GRE tunnels. Our gateways are basically a VPN provider
system. This is needed because of German law; otherwise the people who
share their internet connection could be held responsible for what their
guests do in their WiFi, if they do evil things like illegal file
sharing or worse.

With our network, we fight this stupid legal situation and have made it
quite far already. The worst part of the law, called "Störerhaftung",
which had held you responsible for anything that was sent over your
internet connection, was abolished during the last legislative period.
But people are still scared after being told for 15 years that they
should never ever share their internet connection with strangers,
because they could get cease-and-desist letters, which can easily cost
you 500 or 1000 Euros. So for a while, we'll stick with our VPN network.
:) But things are getting better now, and some of the federal states
have started supporting us financially.
>
> It sounds like the GRE tunnel is functionally used to connect the VM
> with your border routers, correct?
Correct. We use GRE tunnels between the gateways for cross traffic and
for upstream. Both kinds of tunnels are affected.
> It now occurs to me to ask, are these VMs hosted within your network
> or outside in the cloud?
They are rented servers at Hetzner or other cheap hosting companies. We
don't have physical access to them. We just rent them and configure them
to become VPN endpoints.
>
> I'm now getting the impression that the GRE tunnel might be from your
> border router, across the Internet, and into VMs in the cloud.
It's not really important where the servers are located. Most of them
are at Hetzner, but not all of them.

Typical situation:

VM (public IP) <---> Hypervisor <---the internet---> Hypervisor <---> VM
(public OR private IP)

The tunnels between public IPs work perfectly. It's just the NATted VMs
that cause trouble.

We cannot afford to host everything on bare-metal servers. We need
virtualisation to cut costs.

> Okay.  That just means that it's not currently possible to revert to
> something that works as a diagnostic aid.  So the only way out is
> forward through the problem.
Sometimes two of seven tunnels work. Sometimes none. Today two work;
here is a status report from bird, our BGP daemon:
ffrl_fra0 BGP      ffnet    up     02:16:10    Established  
ffrl_fra1 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber0 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber1 BGP      ffnet    start  02:16:10    Connect      
ffrl_dus0 BGP      ffnet    up     02:16:22    Established  
ffrl_dus1 BGP      ffnet    start  02:16:10    Connect      
ibgp_gw02 BGP      ffnet    start  02:16:10    Connect

(Only Established tunnels work correctly; "Connect" indicates that
something is wrong.)

The first six are the upstream connections. The last one is to the
partner gateway, which acts as a failover.
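For reference, that status listing comes from the bird CLI; on bird 1.x
something like:

birdc show protocols                  # the table above
birdc show protocols all ffrl_fra0    # details for a single session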

So I made you a capture of how it is supposed to look:

root@unimatrixzero ~ # tcpdump -ni any host 185.66.195.1 and \( host
176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and
\( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size
262144 bytes
02:20:16.542279 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542282 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542286 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.561304 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561313 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561315 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:17.543573 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543587 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543605 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.562563 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562585 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562590 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64

Well, while running those tests, more tunnels started working:
ffrl_fra0 BGP      ffnet    up     02:16:10    Established  
ffrl_fra1 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber0 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber1 BGP      ffnet    up     02:16:33    Established  
ffrl_dus0 BGP      ffnet    up     02:16:22    Established  
ffrl_dus1 BGP      ffnet    up     02:16:53    Established  
ibgp_gw02 BGP      ffnet    start  02:16:10    Connect   

I'll take the last one for tests:

Within the VM:
The packets are sent through the tunnel:
root@gw03:~# tcpdump -i bck-gw02 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bck-gw02, link-type LINUX_SLL (Linux cooked), capture size
262144 bytes
02:24:15.725060 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 12, length 64
02:24:16.749064 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 13, length 64
02:24:17.749033 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 14, length 64

bck-gw02 is the GRE interface.

Now eth0 of the VM:

root@gw03:~# tcpdump -i eth0 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:03:43.757089 IP gw03 > static.88-198-51-94.clients.your-server.de:
GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 5859, length 64
04:03:44.781093 IP gw03 > static.88-198-51-94.clients.your-server.de:
GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 5860, length 64
04:03:45.805110 IP gw03 > static.88-198-51-94.clients.your-server.de:
GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 5861, length 64

(Don't mind the sequence numbers; I had to do some other stuff and let
the ping run.)
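For anyone following along: the byte offsets in that filter decode as
follows, assuming a GREv0 header with no optional fields, so the inner
IP header starts 24 bytes in (20-byte outer IP plus 4-byte GRE header):

# proto 47               outer protocol is GRE
# ip[33]=0x01            inner protocol byte (24+9) is ICMP
# ip[36:4]=0xC0A80106    inner source IP (24+12) is 192.168.1.6
# ip[40:4]=0xC0A80106    inner destination IP (24+16) is 192.168.1.6
tcpdump -ni eth0 'proto 47 and ip[33]=0x01 and (ip[36:4]=0xC0A80106 or ip[40:4]=0xC0A80106)'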

Now the hypervisor:

vnet0 (the interface of the VM):
root@unimatrixzero ~ # tcpdump -i vnet0 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:05:44.496867 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5977, length 64
04:05:45.520863 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5978, length 64
04:05:46.544832 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5979, length 64
^C

root@unimatrixzero ~ # tcpdump -i virbr1 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:06:14.096209 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6006, length 64
04:06:15.120225 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6007, length 64
04:06:16.144186 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6008, length 64

And nothing on eth0 (physical interface):
root@unimatrixzero ~ # tcpdump -i eth0 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
27 packets received by filter
9 packets dropped by kernel

The NAT kernel module eats the packets :) and makes them vanish.
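One way to check whether conntrack/NAT really is the component dropping
them (a suggestion; counter names vary a little between kernel versions):

conntrack -S                        # per-CPU stats; watch insert_failed / drop
cat /proc/net/stat/nf_conntrack     # the same counters, raw per-CPU hex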


> What does tcpdump on the vNIC in the VM show?  I would expect to see
> encapsulated ICMP inside of GRE tunnel w/ the VM's private IP as the
> source and the far end's IP as the GRE destination.
See above. Looks correct.
>
> What does the host see on vnet4 or the virbr1 interfaces?  I would
> expect them to see the same thing as what the guest VM saw on its
> vNIC (eth0?).
Yes, I did a tcpdump on vnet4 and virbr1 last time. Looked correct.
Encapsulated ICMP packets.

See above. Looks correct.
> Oh.  I thought that something was partially working.
>
It was at the time I wrote my first email here. Then things
became even worse because of all my testing. Now things work
partly again.
> Can you disable both of the tunnels for 5 ~ 10 minutes, long enough
> for potentially stale state to clear, and then enable one tunnel?
Okay, I'll disable all tunnels except the one I tested above. But I can
only disable the "inside part". Encapsulated BGP packets will still be
sent from the remote hosts; I can't disable them.
> I would expect to see the GRE packets leaving the node, even if they
> aren't NATed.
Yes, me too. The packets just vanish. I tried to catch them with an
iptables LOG rule inserted after the GRE NAT rule, so that it would
catch any unmatched packets. But the GRE NAT rule matches them; there
were no log entries for unmatched packets. They just vanish during the
NAT process.
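Roughly what that diagnostic looked like. Since DNAT is a terminating
target, anything reaching the LOG rule was not translated:

iptables -t nat -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
iptables -t nat -A PREROUTING -i eth0 -p gre -j LOG --log-prefix "gre-not-natted: "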
> root@unimatrixzero ~ # cat /etc/iptables/rules.v4
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *raw
>> :PREROUTING ACCEPT [4134062347:2804377965525]
>> :OUTPUT ACCEPT [45794:9989552]
>> -A PREROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
>> -A PREROUTING -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
>> -A OUTPUT -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
>> -A OUTPUT -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
>> COMMIT
>> # Completed on Fri Oct 27 23:36:29 2017
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *mangle
>> :PREROUTING ACCEPT [4134063569:2804378696201]
>> :INPUT ACCEPT [48005:5510967]
>> :FORWARD ACCEPT [4133838276:2804349602217]
>> :OUTPUT ACCEPT [45797:9990176]
>> :POSTROUTING ACCEPT [4133884073:2804359592393]
>> -A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM
>> --checksum-fill
>> COMMIT
>> # Completed on Fri Oct 27 23:36:29 2017
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *nat
>> :PREROUTING ACCEPT [86097:5109916]
>> :INPUT ACCEPT [7557:460113]
>> :OUTPUT ACCEPT [162:11119]
>> :POSTROUTING ACCEPT [78890:4669843]
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 222 -j DNAT
>> --to-destination 192.168.10.62:22
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 80 -j DNAT
>> --to-destination 192.168.10.248:80
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 443 -j DNAT
>> --to-destination 192.168.10.248:443
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 223 -j DNAT
>> --to-destination 192.168.10.248:22
>> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
>
> I think this may route all incoming GRE to a single host / VM,
> 192.168.10.62.
Exactly. That is what I intend to do. In this case 176.9.38.150 is the
public IP and 192.168.10.62 the private IP.
>
>> -A PREROUTING -d 176.9.38.150/32 -p udp -m udp --dport 20000:20100 -j
>> DNAT --to-destination 192.168.10.62:20000-20100
>> -A POSTROUTING -s 192.168.10.0/24 -d 224.0.0.0/24 -j RETURN
>> -A POSTROUTING -s 192.168.10.0/24 -d 255.255.255.255/32 -j RETURN
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p tcp -j
>> MASQUERADE --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p udp -j
>> MASQUERADE --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -j MASQUERADE
>
> This will very likely MASQUERADE all of the GRE traffic from
> 192.168.10.62, which means it will be NATed to the source IP of the
> interface with the best route to 185.66.195.1  Is that 176.9.38.150,
> the IP that you were looking for on the eth0 interface?  (You wouldn't
> see the 192.168.10.62 IP there as it's after NATing.)
Yes, that is what I intend to do.
> I also don't see where your firewall is actually blocking any traffic,
> and a couple of other things that I'm not quite sure why you did what
> you did.  But, this discussion is for GRE issues.
The firewall configuration should be correct. We create and track it
with Ansible. This is definitely the iptables configuration that has
worked before.

What are you unsure about? This is a public project; we have nothing to
hide ;) Just ask.
>
>> Our standard configuration is to have a separate global IPv4 for each
>> virtual machine. We experimented with NATing those GRE tunnels to
>> save one IP address per hypervisor, which worked perfectly so far.
>
> I think I'm still missing something.
>
> I would assign the globally routed IPs to the VMs directly, and route
> them to the eth0 IP of the machine.
Yes, that works. I'm just trying to save an IP address here, and
requesting an IP change at our upstream provider takes time. So I once
NATed the tunnels through to a VM when we virtualized the system for the
first time. Originally the gateway ran directly on the host; now it runs
within a VM.

Requesting public IPs for the VMs would solve the problem. But Linux
should be capable of NATing the tunnels through. And it used to work,
at high performance and reliably, for several months.
>
> Or are you talking about saving the globally routable IP address on
> virbr1?
Yes.
>
> Another trick would be to use private IP addresses on virbr1 and the
> vNICs of the VMs.  You use this for routing and assign the VM's
> globally routed IP address to a dummy interface in the VMs.  -  That
> would be clear channel routing all the way in.  Save for the private
> IPs in the path, which works, but is meh in a traceroute output.
The global IP address of the host is used to access it in the first
place. It's just a rented server somewhere on the internet.
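For reference, Grant's dummy-interface trick would look roughly like
this (203.0.113.10 is a documentation placeholder, not one of our
addresses):

# inside the VM: the globally routed IP lives on a dummy interface
ip link add dum0 type dummy
ip addr add 203.0.113.10/32 dev dum0
ip link set dum0 up
# on the hypervisor: route that IP to the VM's private address
ip route add 203.0.113.10/32 via 192.168.10.62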
>
>> Freifunk is not just a wifi network. It's about getting to know
>> network stuff like mesh networks or software defined networks based
>> on GRE tunnels. My reasons to participate are mostly to understand
>> the technology behind all that.
>
> That sounds interesting and like a worth while cause.
Yeah, it's fun :)
>
>> As I wrote in my other email, I looked into the source code. As far
>> as I understand it, the GREv0 Nat has never been properly
>> implemented. I don't understand how this ever worked.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD
>>
>
> My read of the header is that GRE may not need a NAT helper per se. 
> It sounds like it's just a matter of altering the source / destination
> IP of the GRE encapsulation traffic.
Okay. From my understanding as well, GRE is quite a simple technology.
It just encapsulates IP packets and decapsulates them at their
destination. GRE is even simpler than UDP, as it doesn't have ports.
>
> I also don't see anything in RFC 2784 § 2.1. GRE Header that would
> need NAT as I understand it.
But the destination IP address needs to be replaced. Nothing more and
nothing less.

All those new tests didn't give me any new information.

So I tested another kernel on the hypervisor:
root@unimatrixzero ~ # uname -a
Linux unimatrixzero 4.4.9-040409-generic #201605041832 SMP Wed May 4
22:34:16 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I think this was one of the first kernels for Ubuntu 16.04. And this
didn't work either. From my understanding, the kernel version is not the
problem. There must be another problem.

Are there any debugging modes for the kernel modules? I'd like to
understand why the packets are dropped. The kernel has nothing to do
but replace the source IP, and that should be it.

Regards,
Matthias


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
                   ` (5 preceding siblings ...)
  2018-01-28  3:30 ` walther.xyz
@ 2018-01-28  5:42 ` Grant Taylor
  2018-01-29 11:11 ` Matthias Walther
  2018-01-30 22:37 ` Grant Taylor
  8 siblings, 0 replies; 10+ messages in thread
From: Grant Taylor @ 2018-01-28  5:42 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 16017 bytes --]

On 01/27/2018 08:30 PM, walther.xyz wrote:
> Hello Grant,

Hi Matthias,

> I'll shorten this email for clarity.

Okay.

> Yes. Our wifi access points, which are typically OpenWrt-based routers 
> like the TP-LINK TL-WR841N or the TP-LINK TL-WR1043ND, connect through L2TP 
> to the gateways, which then route between our network and the upstream 
> via GRE tunnels. Our gateways are basically a VPN provider system. This is 
> needed because of German law; otherwise the people who share their internet 
> connection could be held responsible for what their guests do in their 
> wifi, if the guests do evil things like illegal file sharing or worse.

Okay.

It sounds like you are working under some restrictions that I'm 
completely ignorant to.  Please keep that in mind if I accidentally 
suggest something that you shouldn't do.

> With our network, we fight this stupid legal situation and have made it 
> quite far already. The worst part of the law, called "Störerhaftung", 
> which held you responsible for anything that was sent over your internet 
> connection, was abolished during the last legislative period. But 
> people are still scared after being told for 15 years that 
> they should never ever share their internet connection with strangers, 
> because they could get cease and desist letters, which can easily cost 
> you 500 or 1000 Euros. So for a while, we'll stick with our VPN network.

Ouch.  That sounds serious.  -  On my side of the pond, cease and desist 
letters usually are a friendly "stop, or else" type.  In fact, most ISPs 
over here need to send three before they can actually terminate your 
connection.  (RIAA and MPAA are notorious for causing ISPs to send such 
letters.)

> But things are getting better now, and some of the federal states have 
> started supporting us financially.

:-)

> Correct. We use GRE tunnels between the gateways for cross traffic and 
> for upstream. Both kinds of tunnel are affected.

ACK

> They are rented servers at Hetzner or other cheap hosting companies. We 
> don't have physical access to them. We just rent them and configure them 
> to become VPN endpoints.

ACK

> It's not really important, where the servers are located. Most of them 
> are at Hetzner, but not all of them.

The thing that is important, at least for me to understand, is whether 
the GRE tunnels run between the CPE (OpenWrt routers) and your VMs 
across the internet.  Versus, what I thought when this thread started, 
from your border routers across your internal LAN to your VMs on your 
own hosts.

The pertinent part is whether the GRE is crossing the internet vs your 
own LAN.

I don't really care where they are hosted (as in which provider), just 
which side of your internet border router they are on, inside or outside.

> Typical situation:
> 
> VM (public IP) <---> Hypervisor <---the internet---> Hypervisor <---> 
> VM (public OR private IP)

Please confirm the GRE tunnels are between the VM on the left (with a 
public IP) and the VM on the right (with a public OR private IP).

> The tunnels between public IPs work perfectly. It's just the NAT VMs 
> that cause trouble.

Okay.

> We cannot afford to host everything on bare metal servers. We need 
> virtualisation to cut costs.

Sorry if I gave the impression that bare metal was necessary.  I think 
VMs are perfectly fine.

I am somewhat questioning the need for private IPs vs public IPs on the 
VMs.  (I'm still trying to wrap my head around things.)

> Sometimes two of seven tunnels work. Sometimes none. Today two work; 
> here is a status report from bird, our BGP daemon:
> 
> ffrl_fra0 BGP      ffnet    up     02:16:10    Established
> ffrl_fra1 BGP      ffnet    start  02:16:10    Connect
> ffrl_ber0 BGP      ffnet    start  02:16:10    Connect
> ffrl_ber1 BGP      ffnet    start  02:16:10    Connect
> ffrl_dus0 BGP      ffnet    up     02:16:22    Established
> ffrl_dus1 BGP      ffnet    start  02:16:10    Connect
> ibgp_gw02 BGP      ffnet    start  02:16:10    Connect
> 
> (Only Established tunnels work correctly; "Connect" indicates that 
> something is wrong.)
> 
> The first six are the upstream connections. The last one is to the 
> partner gateway, which acts as a failover.

Okay.

> So I made you a capture of how it is supposed to look:
> 
> root@unimatrixzero ~ # tcpdump -ni any host 185.66.195.1 and \( host 
> 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 
> 262144 bytes
> 02:20:16.542279 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, 
> length 64
> 02:20:16.542282 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, 
> length 64
> 02:20:16.542286 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, 
> length 64
> 02:20:16.561304 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
> 02:20:16.561313 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
> 02:20:16.561315 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
> 02:20:17.543573 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, 
> length 64
> 02:20:17.543587 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, 
> length 64
> 02:20:17.543605 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: 
> IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, 
> length 64
> 02:20:17.562563 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
> 02:20:17.562585 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
> 02:20:17.562590 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 
> 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
> 
> Well, while running those tests, more tunnels started working:
> 
> ffrl_fra0 BGP      ffnet    up     02:16:10    Established
> ffrl_fra1 BGP      ffnet    start  02:16:10    Connect
> ffrl_ber0 BGP      ffnet    start  02:16:10    Connect
> ffrl_ber1 BGP      ffnet    up     02:16:33    Established
> ffrl_dus0 BGP      ffnet    up     02:16:22    Established
> ffrl_dus1 BGP      ffnet    up     02:16:53    Established
> ibgp_gw02 BGP      ffnet    start  02:16:10    Connect
> 
> I'll take the last one for tests:
> 
> Within the VM:
> 
> The packets are sent through the tunnel:
> 
> root@gw03:~# tcpdump -i bck-gw02 icmp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on bck-gw02, link-type LINUX_SLL (Linux cooked), capture size 
> 262144 bytes
> 02:24:15.725060 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, 
> seq 12, length 64
> 02:24:16.749064 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, 
> seq 13, length 64
> 02:24:17.749033 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, 
> seq 14, length 64
> 
> bck-gw02 is the GRE interface.
> 
> Now eth0 of the VM:
> 
> root@gw03:~# tcpdump -i eth0 proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 04:03:43.757089 IP gw03 > static.88-198-51-94.clients.your-server.de: 
> GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, 
> id 11563, seq 5859, length 64
> 04:03:44.781093 IP gw03 > static.88-198-51-94.clients.your-server.de: 
> GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, 
> id 11563, seq 5860, length 64
> 04:03:45.805110 IP gw03 > static.88-198-51-94.clients.your-server.de: 
> GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, 
> id 11563, seq 5861, length 64
> 
> (Don't mind the sequence numbers; I had to do some other stuff and let 
> the ping run.)
> 
> Now the hypervisor:
> 
> vnet0 (the interface of the vm)
> root@unimatrixzero ~ # tcpdump -i vnet0 proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 04:05:44.496867 IP 192.168.10.62 > 
> static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 
> 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5977, 
> length 64
> 04:05:45.520863 IP 192.168.10.62 > 
> static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 
> 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5978, 
> length 64
> 04:05:46.544832 IP 192.168.10.62 > 
> static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 
> 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5979, 
> length 64
> ^C
> root@unimatrixzero ~ # tcpdump -i virbr1 proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 
> bytes
> 04:06:14.096209 IP 192.168.10.62 > 
> static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 
> 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6006, 
> length 64
> 04:06:15.120225 IP 192.168.10.62 > 
> static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 
> 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6007, 
> length 64
> 04:06:16.144186 IP 192.168.10.62 > 
> static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 
> 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6008, 
> length 64
> 
> And nothing on eth0 (physical interface):
> 
> root@unimatrixzero ~ # tcpdump -i eth0 proto 47 and ip[33]=0x01 and \( 
> ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
> ^C
> 0 packets captured
> 27 packets received by filter
> 9 packets dropped by kernel
> 
> The NAT kernel module eats the packets and makes them vanish.

That's odd.

NAT shouldn't eat anything.

NAT should alter IP addresses or not alter them.  But NAT itself should 
not cause traffic to disappear.  That sounds like something else.  I 
just don't know what.

> See above. Looks correct.
> 
> Yes, I did a tcpdump on vnet4 and virbr1 last time. Looked correct. 
> Encapsulated ICMP packets.
> 
> See above. Looks correct.

Agreed.

> It was at the time I wrote my first email here. Then things 
> became even worse because of all my testing. Now things work 
> partly again.

Weird.

Does dmesg give any output that might shed some light on things?

> Okay, I'll disable all tunnels except the one I tested above. But I can 
> only disable the "inside part". Encapsulated BGP packets will still be 
> sent from the remote hosts; I can't disable them.

Fair.

The idea is to minimize complicating factors for a few minutes.  Do what 
you can, test, and then restore normal functionality.

> Yes, me too. The packets just vanish. I tried to catch them with an 
> iptables LOG rule inserted after the GRE NAT rule, so that it would catch 
> any unmatched packets. But the GRE NAT rule matches them; there were no 
> log entries for unmatched packets. They just vanish during the NAT process.

Strange.

What if you change your tcpdump filter to just look for GRE (protocol 
47) traffic?  It will likely match more than what is necessary.  But 
hopefully it will show that packets are being NATed in an unexpected 
way.  In a way that doesn't function on the other end.

> Exactly. That is what I intend to do. In this case 176.9.38.150 is the 
> public IP and 192.168.10.62 the private IP.

Okay.

That will work for one VM, but I don't see how it will work for a second 
VM.  Or are you wanting to route all GRE tunnels to one VM and rely on 
the far end IP to differentiate them?

> Yes, that is what I intend to do.

Okay.

> The firewall configuration should be correct. We create and track 
> it with Ansible. This is definitely the iptables configuration that has 
> worked before.

I don't doubt that it functions to allow what you want through.  Aside 
from the GRE NAT issue.

I'm thinking that it might not block traffic that you want filtered. 
Though I'm assuming that you do want to filter / block some traffic. 
(All of the firewalls that I've written were to filter all but the 
desired traffic.)

> What are you unsure about? This is a public project; we have nothing to 
> hide. Just ask.

The uncertainty is just my perception of how you wrote your firewall vs 
how I would write a firewall.

I'm not afraid to ask.  I was more trying to keep the conversation on 
the topic of GRE NAT instead of going down a side distraction that is 
not germane to GRE NAT.

> Yes, that works. I'm just trying to save an IP address here, and requesting 
> an IP change at our upstream provider takes time. So I once NATed the 
> tunnels through to a VM when we virtualized the system for the first 
> time. Originally the gateway ran directly on the host; now it runs within a VM.

Okay.

> Requesting public IPs for the VMs would solve the problem. But Linux 
> should be capable of NATing the tunnels through. And it used to work, at 
> high performance and reliably, for several months.

Fair.

> Yes.

Understood.

> The global IP address of the host is used to access it in the first 
> place. It's just a rented server somewhere on the internet.

*nod*

> Yeah, it's fun

:-)

> Okay. From my understanding as well, GRE is quite a simple technology. It 
> just encapsulates IP packets and decapsulates them at their 
> destination. GRE is even simpler than UDP, as it doesn't have ports.

Agreed.

> But the destination IP address needs to be replaced. Nothing more and 
> nothing less.

That sounds like standard DNAT to me.

Along with SNAT or MASQUERADE going the other direction.

> All those new tests didn't give me any new information.

:-(

> So I tested another kernel on the hypervisor:
> 
> root@unimatrixzero ~ # uname -a
> Linux unimatrixzero 4.4.9-040409-generic #201605041832 SMP Wed May 4 
> 22:34:16 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> I think this was one of the first kernels for Ubuntu 16.04. And this 
> didn't work either. From my understanding, the kernel version is not 
> the problem. There must be another problem.

:-(

> Are there any debugging modes for the kernel modules? I'd like to 
> understand why the packets are dropped. The kernel has nothing to do 
> but replace the source IP, and that should be it.

There's a way to get the connection tracking information (which is what 
NATing uses) out of the kernel.

Look into conntrackd and the conntrack command.  You can use the command 
to get information about what the connection tracking subsystem is doing.
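For example (conntrack-tools 1.4.x syntax; adjust to taste):

conntrack -L -p gre     # list tracked GRE flows
conntrack -E -p gre     # watch conntrack events live
conntrack -D -p gre     # delete GRE entries to clear stale state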

> Regards,

Likewise.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
                   ` (6 preceding siblings ...)
  2018-01-28  5:42 ` Grant Taylor
@ 2018-01-29 11:11 ` Matthias Walther
  2018-01-30 22:37 ` Grant Taylor
  8 siblings, 0 replies; 10+ messages in thread
From: Matthias Walther @ 2018-01-29 11:11 UTC (permalink / raw)
  To: lartc

Hello,

> Ouch.  That sounds serious.  -  On my side of the pond, cease and
> desist letters usually are a friendly "stop, or else" type.  In fact,
> most ISPs over here need to send three before they can actually
> terminate your connection.  (RIAA and MPAA are notorious for causing
> ISPs to send such letters.)
Yeah, they'd never shut you down here. Lawyers will just send you one
expensive letter after another ;). But that's another topic. Things are
almost resolved; just some trials are needed to prove the new laws.
>
> The thing that is important, at least for me to understand, is whether
> the GRE tunnels run between the CPE (OpenWrt routers) and your VMs
> across the internet.  Versus, what I thought when this thread started,
> from your border routers across your internal LAN to your VMs on your
> own hosts.
>
> The pertinent part is whether the GRE is crossing the internet vs your
> own LAN.
>
>
The GRE tunnels always run between VMs on the internet, never in
my private LAN.
>
>> Typical situation:
>>
>> VM (pubilc IP) <---> Hypervisor <---the internet---> Hypervisor <--->
>> VM (public OR private IP)
>
> Please confirm the GRE tunnels are between the VM on the left (with a
> public IP) and the VM on the right (with a public OR private IP).
Exactly, from the very left to the very right. From one VM to another.
Some hosts aren't virtualized, but that doesn't make a difference.
>
> I am somewhat questioning the need for private IPs vs public IPs on
> the VMs.  (I'm still trying to wrap my head around things.)
It is not needed. We could book another public IP, assign it to the VM
and request a tunnel endpoint change to the new IP. But I'd like to
understand how to diagnose these kinds of problems, and what once worked
flawlessly should still work. :)
>
> That's odd.
>
> NAT shouldn't eat anything.
>
> NAT should alter IP addresses or not alter them.  But NAT itself
> should not cause traffic to disappear.  That sounds like something
> else.  I just don't know what.
Yep, completely weird.
>
>
> Does dmesg give any output that might shed some light on things?
Nothing in dmesg, nothing in syslog.
>
> What if you change your tcpdump filter to just look for GRE (protocol
> 47) traffic?  It will likely match more than what is necessary.  But
> hopefully it will show that packets are being NATed in an unexpected
> way.  In a way that doesn't function on the other end.
There is nothing there. I ran the ping at a hundred packets per second
to have enough packets to find among the hundreds of packets going
through here.

There were just the two un-NATed ICMP request packets we've seen
before, followed by the next two with the next sequence number.

Nothing in between. Nothing that could be a wrongly NATed or broken packet.
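With iputils ping, that rate is something like this (sub-second
intervals need root):

ping -i 0.01 192.168.1.6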
>
> I'm thinking that it might not block traffic that you want filtered.
> Though I'm assuming that you do want to filter / block some traffic.
> (All of the firewalls that I've written were to filter all but the
> desired traffic.)
No, we have very strict network neutrality. We filter absolutely
nothing: no traffic shaping, no blocked ports.
>
> That sounds like standard DNAT to me.
Yes.
>
> Along with SNAT or MASQUERADE going the other direction.
Yes.
>
> There's a way to get connection tracking which is used by NATing
> information out of the kernel.
>
> Look into conntrackd and the conntrack command.  You can use the
> command to get information about what the connection tracking
> subsystem is doing.

I tried this one. The broken tunnels are marked with "UNREPLIED". Well,
that sounds reasonable, as there's nothing coming back.

root@unimatrixzero ~ # conntrack -L|grep gre
conntrack v1.4.3 (conntrack-tools): 97 flow entries have been shown.
3:gre 47 176 src=185.66.193.1 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=185.66.193.1 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
9:gre 47 29 src=185.66.194.1 dst=176.9.38.150 srckey=0x0 dstkey=0x0
[UNREPLIED] src=176.9.38.150 dst=185.66.194.1 srckey=0x0 dstkey=0x0
mark=0 use=1
14:gre 47 29 src=185.66.194.0 dst=176.9.38.150 srckey=0x0 dstkey=0x0
[UNREPLIED] src=176.9.38.150 dst=185.66.194.0 srckey=0x0 dstkey=0x0
mark=0 use=1
29:gre 47 179 src=46.4.80.131 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=46.4.80.131 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
30:gre 47 177 src=185.66.193.0 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=185.66.193.0 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
40:gre 47 29 src=185.66.195.0 dst=176.9.38.150 srckey=0x0 dstkey=0x0
[UNREPLIED] src=176.9.38.150 dst=185.66.195.0 srckey=0x0 dstkey=0x0
mark=0 use=1
60:gre 47 26 src=88.198.51.94 dst=176.9.38.150 srckey=0x0 dstkey=0x0
[UNREPLIED] src=176.9.38.150 dst=88.198.51.94 srckey=0x0 dstkey=0x0
mark=0 use=1
62:gre 47 179 src=185.66.194.0 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=185.66.194.0 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
69:gre 47 177 src=185.66.193.1 dst=176.9.38.150 srckey=0x0 dstkey=0x0
src=192.168.10.62 dst=185.66.193.1 srckey=0x0 dstkey=0x0 [ASSURED]
mark=0 use=1
74:gre 47 174 src=192.168.10.62 dst=185.66.193.0 srckey=0x0 dstkey=0x0
src=185.66.193.0 dst=176.9.38.150 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
75:gre 47 169 src=185.66.194.1 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=185.66.194.1 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
80:gre 47 179 src=176.9.38.158 dst=176.9.38.156 srckey=0x0 dstkey=0x0
src=176.9.38.156 dst=176.9.38.158 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
82:gre 47 179 src=185.66.195.1 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=185.66.195.1 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
85:gre 47 179 src=46.4.80.131 dst=176.9.38.156 srckey=0x0 dstkey=0x0
src=176.9.38.156 dst=46.4.80.131 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
91:gre 47 179 src=185.66.195.0 dst=176.9.38.158 srckey=0x0 dstkey=0x0
src=176.9.38.158 dst=185.66.195.0 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1
95:gre 47 177 src=192.168.10.62 dst=185.66.195.1 srckey=0x0 dstkey=0x0
src=185.66.195.1 dst=176.9.38.150 srckey=0x0 dstkey=0x0 [ASSURED] mark=0
use=1

Do you have any conntrack tricks to look into this further?

Maybe we should start looking into the code the packet goes through.
Are you familiar with that part of the kernel? So far I have only found
that one function I quoted a few days earlier.

Maybe the "standard" NAT code fails here because GRE is simpler than a
UDP packet. Or because it's stateless. Just a wild guess.

Maybe it has something to do with whether the first packet in this
tunnel is incoming or outgoing. This is random, because you never know
which of the two BGP daemons tries to get a connection first.

Bye,
Matthias

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GRE-NAT broken
  2018-01-24 19:54 GRE-NAT broken Matthias Walther
                   ` (7 preceding siblings ...)
  2018-01-29 11:11 ` Matthias Walther
@ 2018-01-30 22:37 ` Grant Taylor
  8 siblings, 0 replies; 10+ messages in thread
From: Grant Taylor @ 2018-01-30 22:37 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: text/plain, Size: 7367 bytes --]

On 01/29/2018 04:11 AM, Matthias Walther wrote:
> Hello,

Hi Matthias,

> Yeah, they'd never shut you down here. Lawyers will just send you one 
> expensive letter after another. But that's another topic. Things are 
> almost resolved; just some trials are needed to prove the new laws.

It sounds like the situation is improving.

> The GRE tunnels always run between VMs on the internet, never in 
> my private LAN.

Thank you for the confirmation.

> Exactly, from the very left to the very right. From one VM to another. 
> Some hosts aren't virtualized, but that doesn't make a difference.

*nod*

> It is not needed. We could book another public IP, assign it to the VM 
> and request a tunnel endpoint change to the new IP. But I'd like to 
> understand how to diagnose these kinds of problems, and what once worked 
> flawlessly should still work.

Fair enough.

> Nothing in dmesg, nothing in syslog.

:-/

It seems as if something is intercepting the packets.  -  I doubt that 
it's the NAT module, but I can't rule it out.

> There is nothing there. I ran the ping at a hundred packets per second 
> to have enough packets to find among the hundreds of packets going 
> through here.

Wait.  tcpdump shows that packets are entering one network interface but 
they aren't leaving another network interface?

That sounds like something is filtering the packets.

I assume that packet forwarding is enabled for the interface(s) in 
question, correct?
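Something like this would confirm it; interface names assumed from the
earlier captures, and the bridge-nf check only applies if br_netfilter
is loaded:

sysctl net.ipv4.ip_forward                   # global IPv4 forwarding
sysctl net.ipv4.conf.eth0.forwarding         # per-interface forwarding
sysctl net.ipv4.conf.virbr1.forwarding
sysctl net.bridge.bridge-nf-call-iptables    # bridged traffic hitting iptables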

> There were just the two un-NATed ICMP request packets we've seen before, 
> followed by the next two with the next sequence number.

I assume that you're talking about the packets entering the inside 
interface.  Or is one of the two that you're talking about possibly the 
same packet leaving the outside interface, without NAT having been applied?

This is why I like to sniff on specific interfaces.  Purportedly PCAP-NG 
has the ability to record interface names / numbers, but I've never 
needed it or messed with it.

> Nothing in between. Nothing that could be a wrongly NATed or broken 
> packet.

:-/

> No, we have very strict network neutrality. We filter absolutely nothing: 
> no traffic shaping, no blocked ports.

Okay.

I'm used to filtering things like NetBIOS and SMTP from end user prefixes.

> I tried this one. The broken tunnels are marked with 
> "UNREPLIED". Well, that sounds reasonable, as there's nothing 
> coming back.

I feel like the kicker is that the traffic is never making it out of the 
local system to the far side.  As such the far side never gets anything, 
much less replies.

Can you do some checking on the far side to see if it's receiving the 
requests?  I suspect that it is not.

> root@unimatrixzero ~ # conntrack -L|grep gre
> conntrack v1.4.3 (conntrack-tools): 97 flow entries have been shown.
> 3:gre   47  176  src=185.66.193.1   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=185.66.193.1  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 9:gre   47  29   src=185.66.194.1   dst=176.9.38.150  srckey=0x0 
> dstkey=0x0  [UNREPLIED]  src=176.9.38.150   dst=185.66.194.1  srckey=0x0 
> dstkey=0x0             mark=0  use=1
> 14:gre  47  29   src=185.66.194.0   dst=176.9.38.150  srckey=0x0 
> dstkey=0x0  [UNREPLIED]  src=176.9.38.150   dst=185.66.194.0  srckey=0x0 
> dstkey=0x0             mark=0  use=1
> 29:gre  47  179  src=46.4.80.131    dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=46.4.80.131   srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 30:gre  47  177  src=185.66.193.0   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=185.66.193.0  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 40:gre  47  29   src=185.66.195.0   dst=176.9.38.150  srckey=0x0 
> dstkey=0x0  [UNREPLIED]  src=176.9.38.150   dst=185.66.195.0  srckey=0x0 
> dstkey=0x0             mark=0  use=1
> 60:gre  47  26   src=88.198.51.94   dst=176.9.38.150  srckey=0x0 
> dstkey=0x0  [UNREPLIED]  src=176.9.38.150   dst=88.198.51.94  srckey=0x0 
> dstkey=0x0             mark=0  use=1
> 62:gre  47  179  src=185.66.194.0   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=185.66.194.0  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 69:gre  47  177  src=185.66.193.1   dst=176.9.38.150  srckey=0x0 
> dstkey=0x0               src=192.168.10.62  dst=185.66.193.1  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 74:gre  47  174  src=192.168.10.62  dst=185.66.193.0  srckey=0x0 
> dstkey=0x0               src=185.66.193.0   dst=176.9.38.150  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 75:gre  47  169  src=185.66.194.1   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=185.66.194.1  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 80:gre  47  179  src=176.9.38.158   dst=176.9.38.156  srckey=0x0 
> dstkey=0x0               src=176.9.38.156   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 82:gre  47  179  src=185.66.195.1   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=185.66.195.1  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 85:gre  47  179  src=46.4.80.131    dst=176.9.38.156  srckey=0x0 
> dstkey=0x0               src=176.9.38.156   dst=46.4.80.131   srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1
> 91:gre  47  179  src=185.66.195.0   dst=176.9.38.158  srckey=0x0 
> dstkey=0x0               src=176.9.38.158   dst=185.66.195.0  srckey=0x0 
> dstkey=0x0  [ASSURED]  mark=0  use=1

Ya, the [UNREPLIED] bothers me.  As does the fact that you aren't seeing 
the traffic leaving the host's external interface.

> Do you have any conntrack tricks to look into this further?

I'd look more into the TRACE option (target) that you seem to have 
enabled in the raw table.  That should give you more information about 
the packets flowing through the kernel.
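A sketch of extending the existing raw-table TRACE rules to GRE.  The
rule itself is illustrative; on kernels of this vintage the trace output
lands in the kernel log once an nf_log backend is bound:

iptables -t raw -A PREROUTING -i eth0 -p gre -j TRACE
modprobe nf_log_ipv4
sysctl net.netfilter.nf_log.2=nf_log_ipv4    # bind the IPv4 logger
dmesg | grep TRACE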

My hunch is that the packets aren't making it out onto the wire for some 
reason.  Thus the lack of reply.

> Maybe we should start looking into the code the packet goes through. 
> Are you familiar with that part of the kernel? So far I have only found 
> that one function I quoted a few days earlier.

No, I am not.

I'll see if I can't throw together a PoC in Network namespaces this 
evening to evaluate if NATing GRE works.  -  I'd like to test NATing 
different sets of endpoints (1:1) and NATing multiple remote endpoints 
to one local endpoint (many:1).
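A rough sketch of what I have in mind; all names and the 198.51.100.0/24
"public" range are invented:

ip netns add vm; ip netns add natbox; ip netns add far
ip link add v0 type veth peer name v1
ip link add v2 type veth peer name v3
ip link set v0 netns vm;     ip link set v1 netns natbox
ip link set v2 netns natbox; ip link set v3 netns far
ip -n vm     addr add 192.168.10.62/24 dev v0; ip -n vm     link set v0 up
ip -n natbox addr add 192.168.10.1/24  dev v1; ip -n natbox link set v1 up
ip -n natbox addr add 198.51.100.1/24  dev v2; ip -n natbox link set v2 up
ip -n far    addr add 198.51.100.2/24  dev v3; ip -n far    link set v3 up
ip -n vm route add default via 192.168.10.1
modprobe nf_conntrack_proto_gre; modprobe nf_nat_proto_gre
ip netns exec natbox sysctl -w net.ipv4.ip_forward=1
# the same two NAT rules as on the hypervisor, reduced to the minimum
ip netns exec natbox iptables -t nat -A PREROUTING -i v2 -p gre \
        -j DNAT --to-destination 192.168.10.62
ip netns exec natbox iptables -t nat -A POSTROUTING -o v2 -j MASQUERADE
# GRE tunnel; "far" only ever sees the natbox public address as its peer
ip -n vm  tunnel add t0 mode gre local 192.168.10.62 remote 198.51.100.2
ip -n far tunnel add t0 mode gre local 198.51.100.2 remote 198.51.100.1
ip -n vm  addr add 10.99.0.1/30 dev t0; ip -n vm  link set t0 up
ip -n far addr add 10.99.0.2/30 dev t0; ip -n far link set t0 up
ip netns exec vm ping -c 3 10.99.0.2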

> Maybe the "standard" NAT code fails here because GRE is simpler than 
> a UDP packet. Or because it's stateless. Just a wild guess.

I have no idea.

I work with the tools that others build, like Lego bricks, putting them 
together in new and interesting ways.  -  I don't have the skills to 
create the bricks themselves.

> Maybe it has something to do with whether the first packet in this 
> tunnel is incoming or outgoing. This is random, because you never know 
> which of the two BGP daemons tries to get a connection first.

You might be onto something about the first packet.  At least as far as 
what connection tracking sees.

> Bye,

:-)



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-01-30 22:37 UTC | newest]

Thread overview: 10+ messages
-- links below jump to the message on this page --
2018-01-24 19:54 GRE-NAT broken Matthias Walther
2018-01-25  0:34 ` Grant Taylor
2018-01-25  5:34 ` walther.xyz
2018-01-25  7:47 ` walther.xyz
2018-01-25  7:57 ` Florian Westphal
2018-01-25 22:57 ` Grant Taylor
2018-01-28  3:30 ` walther.xyz
2018-01-28  5:42 ` Grant Taylor
2018-01-29 11:11 ` Matthias Walther
2018-01-30 22:37 ` Grant Taylor
