Running an active/active firewall/router (xt_cluster?)

* Running an active/active firewall/router (xt_cluster?)
@ 2021-05-09 17:52 Oliver Freyermuth
  2021-05-10 16:57 ` Paul Robert Marino
  2021-05-10 22:19 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 11+ messages in thread
From: Oliver Freyermuth @ 2021-05-09 17:52 UTC (permalink / raw)
  To: netfilter

[-- Attachment #1: Type: text/plain, Size: 3326 bytes --]

Dear netfilter experts,

we are trying to setup an active/active firewall, making use of "xt_cluster".
We can configure the switch to act like a hub, i.e. both machines can share the same MAC and IP and get the same packets without additional ARPtables tricks.

So we set rules like:

  iptables -I PREROUTING -t mangle -i external_interface -m cluster --cluster-total-nodes 2 --cluster-local-node 1 --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
  iptables -A PREROUTING -t mangle -i external_interface -m mark ! --mark 0xffff -j DROP

Ideally, it we'd love to have the possibility to scale this to more than two nodes, but let's stay with two for now.

Basic tests show that this works as expected, but the details get messy.

1. Certainly, conntrackd is needed to synchronize connection states.
    But is it always "fast enough"?
    xt_cluster seems to match by the src_ip of the original direction of the flow[0] (if I read the code correctly),
    but what happens if the reply to an outgoing packet arrives at both firewalls before state is synchronized?
    We are currently using conntrackd in FTFW mode with a direct link, set "DisableExternalCache", and additonally set "PollSecs 15" since without that it seems
    only new and destroyed connections are synced, but lifetime updates for existing connections do not propagate without polling.
    Maybe another way which e.g. may use XOR(src,dst) might work around tight synchronization requirements, or is it possible to always uses the "internal" source IP?
    Is anybody doing that with a custom BPF?

2. How to do failover in such cases?
    For failover we'd need to change these rules (if one node fails, the total-nodes will change).
    As an alternative, I found [1] which states multiple rules can be used and enabled / disabled,
    but does somebody know of a cleaner (and easier to read) way, also not costing extra performance?

3. We have several internal networks, which need to talk to each other (partially with firewall rules and NATting),
    so we'd also need similar rules there, complicating things more. That's why a cleaner way would be very welcome :-).

4. Another point is how to actually perform the failover. Classical cluster suites (corosync + pacemaker)
    are rather used to migrate services, but not to communicate node ids and number of total active nodes.
    They can probably be tricked into doing that somehow, but they are not designed this way.
    TIPC may be something to use here, but I found nothing "ready to use".

You may also tell me there's a better way to do this than use xt_cluster (custom BPF?) — we've up to now only done "classic" active/passive setups,
but maybe someone on this list has already done active/active without commercial hardware, and can share experience from this?

Cheers and thanks in advance,
	Oliver

PS: Please keep me in CC, I'm not subscribed to the list. Thanks!

[0] https://github.com/torvalds/linux/blob/10a3efd0fee5e881b1866cf45950808575cb0f24/net/netfilter/xt_cluster.c#L16-L19
[1] https://lore.kernel.org/netfilter-devel/499BEBBF.7080705@netfilter.org/

-- 
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5432 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread