[v9 PATCH 0/3] NETFILTER new target module, HMARK

From: Hans Schillstrom @ 2012-02-16 10:21 UTC
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which set the netfilter mark (nfmark) field to a value within a given range.
First a 32-bit hash value is generated, then it is reduced modulo <limit>,
and finally an offset is added before the result is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.
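
As a rough illustration of the three steps above (hash, modulus, offset),
the arithmetic can be sketched in shell. The function name hmark_value is
made up for this sketch, and the hash is simply passed in as a number; the
real hash is computed inside the kernel module.

```shell
# Sketch only: hmark_value is a hypothetical helper, not part of the patch.
# Usage: hmark_value HASH LIMIT OFFSET
# Prints (HASH mod LIMIT) + OFFSET, i.e. the value that would land in nfmark.
hmark_value() {
    echo $(( $1 % $2 + $3 ))
}

# With a limit of 4 and an offset of 100 the result is always 100..103.
hmark_value 1234567 4 100
```

Whatever the hash is, the result stays within offset .. offset + <limit> - 1,
which is what makes it usable as a small, fixed set of fwmark values.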

The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.

REVISION
Version 9
        Simplified NAT handling in IPv4, some formatting.
        checkentry() used in kernel.
        Most changes are based on Pablo's review.

Version 8
        Methods L3 / L3-4 added, i.e. fragment handling changed to:
        - don't handle fragments in "method L3-4"
        Syntax change in user mode to be more NF compatible.
        Most changes are based on Pablo's review.

Version 7
        ahuum, IPv6 descending into the ICMP error header didn't work as
        expected with ipv6_find_hdr(). Now it works as expected.

Version 6
        Removed ipv6_find_hdr() wrapper (Pablo)
	NAT / Conntrack compilation switches.

Version 5
	Use length of mask an smask and dmask and whole IPv6 addr (Jan E)
	Modify ipv6_find_hdr() and use it while traversing the IPv6 header.
        Manual changes.
	More or less all comments implemented.

Version 4
	Split of IPv6 and IPv4, use IP_CT_IS_REPLY, as Pablo suggested.
	removed one pskb_may_pull()
	xtoption parse used in the user space part.

Version 3
        Handling of SCTP for IPv6 added.

Version 2
	NAT Added for IPv4
	IPv6 ICMP handling enhanced.
	Usage example added

Version 1
	Initial RFC

We (Ericsson) use HMARK in front of IPVS as a pre-load-balancer, and it
handles up to 70 IPVS instances running in parallel in clusters.
However, HMARK is not restricted to running in front of IPVS; it can also
be used as a "poor man's" load balancer.
With this version NAT is also supported as an option, since with very high
flows you might not want to use conntrack.

The idea is to generate a direction-independent fwmark range to use as input
to the routing (i.e. ip rule add fwmark ...).
Pretty straightforward and simple.
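
To make "direction independent" concrete, here is a toy sketch in shell.
It is NOT the hash used by the module; sym_hash and its mixing step are
invented for illustration, and addresses are given as plain integers. The
only point is that ordering the two endpoints before hashing gives the same
value for a packet and for its reply.

```shell
# Toy illustration only; the name sym_hash and the mixing formula are
# assumptions for this sketch, not the kernel module's hash.
# Usage: sym_hash SRC DST   (addresses as plain integers)
sym_hash() {
    # Order the endpoints so that (SRC, DST) and (DST, SRC) look identical.
    if [ "$1" -le "$2" ]; then lo=$1; hi=$2; else lo=$2; hi=$1; fi
    echo $(( (lo * 31 + hi) % 4294967296 ))
}

# 10.0.0.1 -> 10.0.0.2 and the reply direction print the same value.
sym_hash 167772161 167772162
sym_hash 167772162 167772161
```

Since both directions yield the same hash, they also yield the same fwmark
after the modulus and offset, so both directions select the same routing table.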

Example:
                                      App Server (Real Server)

                                           +---------+
                                        -->| Service |
     Gateway A                             +---------+
                          /
            +----------+ /     +----+      +---------+
--- if-A ---| selector | --->  |ipvs|  --->| Service |
            +----------+ \     +----+      +---------+
                          \
                               +----+      +---------+
                               |ipvs|   -->| Service |
                               +----+      +---------+
      Gateway C
            +----------+ /     +----+
--- if-B ---| selector | --->  |ipvs|
            +----------+ \     +----+      +---------+
                                           | Service |
                                           +---------+
                          /
            +----------+ /     +----+     ..
--- if-B ---| selector | --->  |ipvs|      +---------+
            +----------+ \     +----+      | Service |
                          \                +---------+
#
# Example with four ipvs loadbalancers
#
iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100

ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip rule add fwmark 102 table 102
ip rule add fwmark 103 table 103

ip route add table 100 default via x.y.z.1 dev bond1
ip route add table 101 default via x.y.z.2 dev bond1
ip route add table 102 default via x.y.z.3 dev bond1
ip route add table 103 default via x.y.z.4 dev bond1
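
For a larger number of balancers the rule and route lines above follow a
regular pattern, so they could be generated with a small loop. The sketch
below only prints the commands; gen_hmark_routes is a hypothetical helper,
and x.y.z.<n> is the same placeholder gateway address as in the example.

```shell
# Sketch only: prints the ip commands instead of running them.
# Usage: gen_hmark_routes COUNT OFFSET DEV
gen_hmark_routes() {
    n=$1; offset=$2; dev=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        mark=$(( offset + i ))
        # One rule and one routing table per fwmark value.
        echo "ip rule add fwmark $mark table $mark"
        echo "ip route add table $mark default via x.y.z.$(( i + 1 )) dev $dev"
        i=$(( i + 1 ))
    done
}

gen_hmark_routes 4 100 bond1
```

COUNT and OFFSET should match the modulus and offset given to the HMARK rule,
otherwise some marks will have no matching routing table.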

If conntrack doesn't handle the return path,
do the opposite with HMARK and send it back to the right ipvs.

Another example of usage could be if you have cluster-originated connections
and want to spread the connections over a number of interfaces
(NAT will complicate things for you in this case).

                     \  Blade 1
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /
   +------+
-- | Gw-A |          \  Blade 2
   +------+           \ +----------+      +---------+
   +------+         <-- | selector | <--- | Service |
-- | Gw-B |           / +----------+      +---------+
   +------+          /
   +------+
-- | Gw-C |          \
   +------+           \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

                     \  Blade n
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>
