* "Carrier Grade" NAT44 setup
@ 2020-06-05 16:23 Maximilian Wilhelm
  2020-06-11  5:51 ` Trent W. Buck
  2020-06-15  5:05 ` n3ph
  0 siblings, 2 replies; 4+ messages in thread
From: Maximilian Wilhelm @ 2020-06-05 16:23 UTC (permalink / raw)
  To: netfilter

Hi,

I have to set up a highly available and scalable NAT44 solution for 10k
(up to 20k) users at my university and am looking for options to
implement such a setup.

The easy way out would be to throw money at some vendor and, for example,
get a pair of ASR1k boxes. But as I like Linux a lot and am already
using it in other production setups, I'd like to explore what folks deem
possible, or maybe have built, with Linux and netfilter in that regard.

My idea would be to set up two boxes with some 10G interfaces and some
decent CPUs/RAM, and write a few lines of nftables config (most likely
Debian buster with a backports 5.5.x kernel). All traffic that has to
be NATed would be routed through those boxes.

I drew a topology diagram to explain what I have in mind [0].

The primary focus for NATing is our wifi users. Within the university
network, any connections should be made via IPv6 or the RFC1918 IPs; only
traffic for external destinations should be NATed. This could be
achieved by policy routing on the Nexus 7000 routers in the DC, which
would only route traffic for external targets to the NAT boxes. My
preference would be to set up BGP sessions to our DC routers and be able
to set up ECMP that way. Each box will get its own pool of external
addresses, so return packets will be routed to the correct NAT box to
"de-NAT".  So far, so straightforward.

What I'm wondering about is:

Did anyone here already build such a setup? If so, did you build it as I
described or differently? Would you do it again? :)

What resources would be required on the Linux boxes? I would assume any
decent server CPU with 6+ cores will be fine, and that 16-32GB of RAM
would suffice for storing the conntrack mappings?

According to the nft man pages

  snat to address - address [:port - port] [persistent, random, fully-random]

SNATing to a pool of addresses should be possible and I guess
"persistent" would be a good idea in this case.
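
A minimal ruleset along those lines could look like this (table/chain
names, the internal range and the external pool are all placeholders;
an untested sketch, not a drop-in config):

```shell
# nftables sketch: SNAT wifi clients to this box's external pool.
# "persistent" keeps a given client on the same external address.
table ip cgn {
    chain postrouting {
        type nat hook postrouting priority 100;
        ip saddr 10.0.0.0/8 oifname "uplink0" snat to 192.0.2.0-192.0.2.63 persistent
    }
}
```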

Does anyone have thoughts on whether active/active or active/passive,
most likely with conntrackd, would be the better move?
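
For the active/passive case I'd presumably sync state between the two
boxes with conntrackd; a bare-bones sketch (multicast address, group and
interfaces are placeholders along the lines of the shipped example
configs, untested):

```shell
# conntrackd.conf sketch for active/passive failover:
# replicate conntrack state to the standby box via multicast.
Sync {
    Mode FTFW {
    }
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.100.1
        Interface eth1
    }
}
```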

Thanks for any input, stay safe!

Best
Max

[0] https://homepages.uni-paderborn.de/mwilhelm/NAT-Topology.png


* Re: "Carrier Grade" NAT44 setup
  2020-06-05 16:23 "Carrier Grade" NAT44 setup Maximilian Wilhelm
@ 2020-06-11  5:51 ` Trent W. Buck
  2020-06-14 19:27   ` Maximilian Wilhelm
  2020-06-15  5:05 ` n3ph
  1 sibling, 1 reply; 4+ messages in thread
From: Trent W. Buck @ 2020-06-11  5:51 UTC (permalink / raw)
  To: netfilter

Maximilian Wilhelm <max@rfc2324.org> writes:
> Did anyone here already build such a setup [linux as CGNAT router]?

I have some derpy non-expert comments, below.

> What resources would be required on the Linux box? I would assume any
> decent server CPU with 6+ cores will be fine and 16-32GB of RAM would
> suffice for storing the conntrack mappings?

Obligatory question whenever CGNAT comes up:
Can you just use IPv6 instead? ;-)


When I was doing NAT for up to 1000 desktops,
I looked into conntrack table size, and
concluded it was not worth even worrying about.

From first principles, the NAT record is basically a struct like

    (orig_ip, orig_port, nat_ip, nat_port)

Which for IPv4 is only like 10 bytes or something.
So in 10 MiB you could remember about 1 Mi concurrent flows.

I looked for a quick sanity-check of that and I found this old post
which reckons 32K concurrent flows in 512MB:

    https://wiki.khnet.info/index.php/Conntrack_tuning

Another old post estimates about 350 B/flow, so about 10 MB ≈ 28K flows:

    https://www.cyberciti.biz/faq/ip_conntrack-table-ful-dropping-packet-error/

Obviously those numbers don't line up too well.
Next step is probably to dig through the kernel's Documentation/ tree
for notes about conntrack limits.
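
As a rough cross-check (using the ~350 B/flow figure from the second
post; the flow count is a made-up example):

```shell
# Back-of-the-envelope conntrack memory estimate.
# entry_bytes is the ~350 B/flow guess from the post above;
# the real per-entry size depends on kernel version and config.
entry_bytes=350
flows=1000000                # hypothetical concurrent flows
echo "$(( entry_bytes * flows / 1024 / 1024 )) MiB"
```

By that guess a million concurrent flows costs on the order of a few
hundred MiB, so even 16 GB of RAM leaves plenty of headroom for the
table itself.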



* Re: "Carrier Grade" NAT44 setup
  2020-06-11  5:51 ` Trent W. Buck
@ 2020-06-14 19:27   ` Maximilian Wilhelm
  0 siblings, 0 replies; 4+ messages in thread
From: Maximilian Wilhelm @ 2020-06-14 19:27 UTC (permalink / raw)
  To: netfilter

Anno domini 2020 Trent W. Buck scripsit:

> Maximilian Wilhelm <max@rfc2324.org> writes:
> > Did anyone here already build such a setup [linux as CGNAT router]?
> 
> I have some derpy non-expert comments, below.
> 
> > What resources would be required on the Linux box? I would assume any
> > decent server CPU with 6+ cores will be fine and 16-32GB of RAM would
> > suffice for storing the conntrack mappings?
> 
> Obligatory question whenever CGNAT comes up:
> Can you just use IPv6 instead? ;-)

"Instead" as in IPv6-only won't work, as people are using all sorts of
applications we can't control. The final setup will be real dual-stack
of course, so we are "only" talking about the bits of traffic which are
IPv4 and will leave our network. I would expect this to be a lot less
than today's full traffic volume, but that's only a guesstimate.

> When I was doing NAT for up to 1000 desktops,
> I looked into conntrack table size, and
> concluded it was not worth even worrying about.
> 
> From first principles, the NAT record is basically a struct like
> 
>     (orig_ip, orig_port, nat_ip, nat_port)
> 
> Which for IPv4 is only like 10 bytes or something.
> So in 10 MiB you could remember about 1 Mi concurrent flows.
> 
> I looked for a quick sanity-check of that and I found this old post
> which reckons 32K concurrent flows in 512MB:
> 
>     https://wiki.khnet.info/index.php/Conntrack_tuning
> 
> Another old post estimates about 350 B/flow, so about 10 MB ≈ 28K flows:
> 
>     https://www.cyberciti.biz/faq/ip_conntrack-table-ful-dropping-packet-error/
> 
> Obviously those numbers don't line up too well.
> Next step is probably to dig through the kernel's Documentation/ tree
> for notes about conntrack limits.

I would expect the conntrack entry for NAT to be more like two
5-tuples (L4 proto, src/dst IP, src/dst port). I guess I'll really have
to dig through Documentation and/or the code in the next days :)
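
Just the raw tuples from that guess (sizes assumed for IPv4; the
kernel's actual conntrack entry carries timers, refcounts and more
around them, so the real per-entry cost is considerably larger):

```shell
# Two bare 5-tuples: L4 proto (1 B) + src/dst IPv4 (4 B each)
# + src/dst port (2 B each), once per direction.
tuple=$(( 1 + 4 + 4 + 2 + 2 ))   # 13 bytes per direction
echo "$(( 2 * tuple )) bytes"    # both directions
```

`grep nf_conntrack /proc/slabinfo` on a live box shows the object size
the kernel actually allocates per entry.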

We're currently looking at boxes with 64GB of RAM, as RAM is rather cheap
compared with CPUs and decent NICs. On the latter we're going for
Mellanox ConnectX-4 with 2x 10/25G ports as of now; they seem to provide
a lot of queues, and I don't hear people complaining about them the way
I lately do about Broadcom or Intel X7xx NICs.

Best
Max
-- 
     "really soon now":      an unspecified period of time, likely to
                             be greater than any reasonable definition
                             of "soon".


* Re: "Carrier Grade" NAT44 setup
  2020-06-05 16:23 "Carrier Grade" NAT44 setup Maximilian Wilhelm
  2020-06-11  5:51 ` Trent W. Buck
@ 2020-06-15  5:05 ` n3ph
  1 sibling, 0 replies; 4+ messages in thread
From: n3ph @ 2020-06-15  5:05 UTC (permalink / raw)
  To: Maximilian Wilhelm, netfilter

Hi There,

HE once wrote a recommendation to use 1 public IPv4 address per /24
(up to 254 clients) [1]. Therefore you would need at least 40 public
IPv4 addresses for 10k clients.

It is easy to statically link every /24 to one of these NAT IPs, which
makes me think of a more sophisticated pool management mechanism to get
connections dynamically mapped to "free" ones. Does this make sense?
ATM I have no idea how to approach this.
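
The static variant at least is straightforward with an nftables map
(addresses are examples, untested sketch); the dynamic pool management
would need something on top of this:

```shell
# nftables sketch: pin each client /24 to a fixed NAT address via a map
table ip cgn {
    chain postrouting {
        type nat hook postrouting priority 100;
        snat to ip saddr map { 10.10.0.0/24 : 198.51.100.1,
                               10.10.1.0/24 : 198.51.100.2 }
    }
}
```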

[1]
https://services.geant.net/sites/cbp/Knowledge_Base/Campus_Networking/Documents/CBP-16_NAT44_address_translation.pdf

-- 
best regards,

n3ph

