All of lore.kernel.org
 help / color / mirror / Atom feed
* [v12 PATCH 0/3] NETFILTER new target module, HMARK
@ 2012-04-23 13:35 Hans Schillstrom
  2012-04-23 13:35 ` [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Hans Schillstrom @ 2012-04-23 13:35 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.

REVISION
Version 12
	Conntrack support for IPv6, and a buf fix.

Version 11
	Changed two comments

Version 10
        Even more simplified NAT handling just one switch --hmark-ct
        some renaming and some minor changes.
        Renaming of vars in xt_hmark_info
        Adding helptext and updated man due to --hmark-ct switch
        Changes are based on Pablos review.

Version 9
        Simpliefied nat handling in IPv4, some formating
        checkentry() used in kernel. 
        Most changes are based on Pablos review.

Version 8
        method L3 / L3-4 added i.e. Fragment handling changed to:
        - don't handle in "method L3-4"
        Syntax change in user mode to be more NF compatible.
        Most changes are based on Pablos review.

Version 7
	ahuum, IPv6 descending into icmp error hdr didn't work as expected
        with ipv6_find_hdr() Now it works as expected.

Version 6
        Removed ipv6_find_hdr() wrapper (Pablo)
	NAT / Conntrack compilation switches.

Version 5
	Use length of mask an smask and dmask and whole IPv6 addr (Jan E)
	Modify ipv6_find_hdr() and use it while traversing the IPv6 header.
        Manual changes.
	More or less all comments implemented.

Version 4
	Split of IPv6 and IPv4, use IP_CT_IS_REPLY, as Pablo suggested.
	removed one pskb_may_pull()
	xtoption parse used in the user space part.

Version 3
        Handling of SCTP for IPv6 added.

Version 2
	NAT Added for IPv4
	IPv6 ICMP handling enhanced.
	Usage example added

Version 1
	Initial RFC


We (Ericsson) use hmark in-front of ipvs as a pre-loadbalancer and
handles up to 70 ipvs running in parallel in clusters.
However hmark is not restricted to run in front of IPVS it can also be used as
"poor mans" load balancer.
With this version is also NAT supported as an option, with very high flows
you might not want to use conntrack.

The idea is to generate a direction independent fw mark range to use as input to
the routing (i.e. ip rule add fwmark ...).
Pretty straight forward and simple.


Example:
                                      App Server (Real Server)

                                           +---------+
                                        -->| Service |
     Gateway A                             +---------+
                          /
            +----------+ /     +----+      +---------+
--- if -A---| selector |---->  |ipvs|  --->| Service |
            +----------+ \     +----+      +---------+
                          \
                               +----+      +---------+
                               |ipvs|   -->| Service |
                               +----+      +---------+
      Gateway C
            +----------+ /     +----+
--- if-B ---| selector | --->  |ipvs|
            +----------+ \     +----+      +---------+
                                           | Service |
                                           +---------+
                          /
            +----------+ /     +----+     ..
--- if-B ---| selector | --->  |ipvs|      +---------+
            +----------+ \     +----+      | Service |
                          \                +---------+
#
# Example with four ipvs loadbalancers
#
iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100

ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip rule add fwmark 102 table 102
ip rule add fwmark 103 table 103

ip ro ad table 100 default via x.y.z.1 dev bond1
ip ro ad table 101 default via x.y.z.2 dev bond1
ip ro ad table 102 default via x.y.z.3 dev bond1
ip ro ad table 103 default via x.y.z.4 dev bond1


If conntrack doesn't handle the return path,
do the oposite with HMARK and send it back right to ipvs.

Another exmaple of usage could be if you have cluster originated connections
and want to spread the connections over a number of interfaces
(NAT will complpicate things for you in this case)



                     \  Blade 1
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /
   +------+
-- | Gw-A |          \  Blade 2
   +------+           \ +----------+      +---------+
   +------+         <-- | selector | <--- | Service |
-- | Gw-B |           / +----------+      +---------+
   +------+          /
   +------+
-- | Gw-C |          \
   +------+           \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

                     \  Blande -n
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /


Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr()
  2012-04-23 13:35 [v12 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
@ 2012-04-23 13:35 ` Hans Schillstrom
  2012-05-09 11:01   ` Pablo Neira Ayuso
  2012-04-23 13:35 ` [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
  2012-04-23 13:35 ` [v12 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
  2 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-04-23 13:35 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

Two new flags to ipv6_find_hdr,
One that tells us that this is a fragment.
One that stops at AH if any i.e. treat it like a transport header.
i.e. make handling of ESP and AH the same.
Param offset can now point to an inner icmp ipv5 header.

Version 3:
    offset param into ipv6_find_hdr set to zero.

Version 2:
    wrapper removed and changes made at every call.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter_ipv6/ip6_tables.h |   12 +++++++++-
 net/ipv6/netfilter/ip6_tables.c           |   35 ++++++++++++++++++++++++----
 net/ipv6/netfilter/ip6t_ah.c              |    4 +-
 net/ipv6/netfilter/ip6t_frag.c            |    4 +-
 net/ipv6/netfilter/ip6t_hbh.c             |    4 +-
 net/ipv6/netfilter/ip6t_rt.c              |    4 +-
 net/netfilter/xt_TPROXY.c                 |    4 +-
 net/netfilter/xt_socket.c                 |    4 +-
 8 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
index 1bc898b..d96a39d 100644
--- a/include/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/linux/netfilter_ipv6/ip6_tables.h
@@ -287,6 +287,7 @@ extern unsigned int ip6t_do_table(struct sk_buff *skb,
 				  struct xt_table *table);
 
 /* Check for an extension */
+
 static inline int
 ip6t_ext_hdr(u8 nexthdr)
 {	return (nexthdr == IPPROTO_HOPOPTS) ||
@@ -298,9 +299,18 @@ ip6t_ext_hdr(u8 nexthdr)
 	       (nexthdr == IPPROTO_DSTOPTS);
 }
 
+
+extern int ip6t_ext_hdr(u8 nexthdr);
+enum {
+	IP6T_FH_FRAG,
+	IP6T_FH_AUTH,
+	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
+	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
+};
+
 /* find specified header and get offset to it */
 extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-			 int target, unsigned short *fragoff);
+			 int target, unsigned short *fragoff, int *fragflg);
 
 #ifdef CONFIG_COMPAT
 #include <net/compat.h>
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index d4e350f..1f18662 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -133,7 +133,7 @@ ip6_packet_match(const struct sk_buff *skb,
 		int protohdr;
 		unsigned short _frag_off;
 
-		protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off);
+		protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off, NULL);
 		if (protohdr < 0) {
 			if (_frag_off == 0)
 				*hotdrop = true;
@@ -362,6 +362,7 @@ ip6t_do_table(struct sk_buff *skb,
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
+		acpar.thoff = 0;
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
  no_match:
@@ -2277,6 +2278,8 @@ static void __exit ip6_tables_fini(void)
  * find the offset to specified header or the protocol number of last header
  * if target < 0. "last header" is transport protocol header, ESP, or
  * "No next header".
+ * Note, *offset is used as input param. an if != 0
+ * it must be an offset to an inner ipv6 header ex. icmp error
  *
  * If target header is found, its offset is set in *offset and return protocol
  * number. Otherwise, return -1.
@@ -2289,17 +2292,34 @@ static void __exit ip6_tables_fini(void)
  * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
  * isn't NULL.
  *
+ * if flags != NULL AND
+ *    it's a fragment the frag flag "IP6T_FH_F_FRAG" will be set
+ *    it's an AH header and IP6T_FH_F_AUTH is set and target < 0
+ *      stop at AH (i.e. treat is as a transport header)
  */
 int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-		  int target, unsigned short *fragoff)
+		  int target, unsigned short *fragoff, int *flags)
 {
 	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
 	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-	unsigned int len = skb->len - start;
+	unsigned int len;
 
 	if (fragoff)
 		*fragoff = 0;
 
+	if (*offset) {
+		struct ipv6hdr _ip6, *ip6;
+
+		ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6);
+		if (!ip6 || (ip6->version != 6)) {
+			printk(KERN_ERR "IPv6 header not found\n");
+			return -EBADMSG;
+		}
+		start = *offset + sizeof(struct ipv6hdr);
+		nexthdr = ip6->nexthdr;
+	}
+	len = skb->len - start;
+
 	while (nexthdr != target) {
 		struct ipv6_opt_hdr _hdr, *hp;
 		unsigned int hdrlen;
@@ -2316,6 +2336,9 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		if (nexthdr == NEXTHDR_FRAGMENT) {
 			unsigned short _frag_off;
 			__be16 *fp;
+
+			if (flags)	/* Indicate that this is a fragment */
+				*flags |= IP6T_FH_F_FRAG;
 			fp = skb_header_pointer(skb,
 						start+offsetof(struct frag_hdr,
 							       frag_off),
@@ -2336,9 +2359,11 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 				return -ENOENT;
 			}
 			hdrlen = 8;
-		} else if (nexthdr == NEXTHDR_AUTH)
+		} else if (nexthdr == NEXTHDR_AUTH) {
+			if (flags && (*flags & IP6T_FH_F_AUTH) && (target < 0))
+				break;
 			hdrlen = (hp->hdrlen + 2) << 2;
-		else
+		} else
 			hdrlen = ipv6_optlen(hp);
 
 		nexthdr = hp->nexthdr;
diff --git a/net/ipv6/netfilter/ip6t_ah.c b/net/ipv6/netfilter/ip6t_ah.c
index 89cccc5..04099ab 100644
--- a/net/ipv6/netfilter/ip6t_ah.c
+++ b/net/ipv6/netfilter/ip6t_ah.c
@@ -41,11 +41,11 @@ static bool ah_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	struct ip_auth_hdr _ah;
 	const struct ip_auth_hdr *ah;
 	const struct ip6t_ah *ahinfo = par->matchinfo;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	unsigned int hdrlen = 0;
 	int err;
 
-	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_AUTH, NULL);
+	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_AUTH, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/ipv6/netfilter/ip6t_frag.c b/net/ipv6/netfilter/ip6t_frag.c
index eda898f..3b5735e 100644
--- a/net/ipv6/netfilter/ip6t_frag.c
+++ b/net/ipv6/netfilter/ip6t_frag.c
@@ -40,10 +40,10 @@ frag_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	struct frag_hdr _frag;
 	const struct frag_hdr *fh;
 	const struct ip6t_frag *fraginfo = par->matchinfo;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	int err;
 
-	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_FRAGMENT, NULL);
+	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_FRAGMENT, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/ipv6/netfilter/ip6t_hbh.c b/net/ipv6/netfilter/ip6t_hbh.c
index 59df051..01df142 100644
--- a/net/ipv6/netfilter/ip6t_hbh.c
+++ b/net/ipv6/netfilter/ip6t_hbh.c
@@ -50,7 +50,7 @@ hbh_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct ipv6_opt_hdr *oh;
 	const struct ip6t_opts *optinfo = par->matchinfo;
 	unsigned int temp;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	unsigned int hdrlen = 0;
 	bool ret = false;
 	u8 _opttype;
@@ -62,7 +62,7 @@ hbh_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 
 	err = ipv6_find_hdr(skb, &ptr,
 			    (par->match == &hbh_mt6_reg[0]) ?
-			    NEXTHDR_HOP : NEXTHDR_DEST, NULL);
+			    NEXTHDR_HOP : NEXTHDR_DEST, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/ipv6/netfilter/ip6t_rt.c b/net/ipv6/netfilter/ip6t_rt.c
index d8488c5..2c99b94 100644
--- a/net/ipv6/netfilter/ip6t_rt.c
+++ b/net/ipv6/netfilter/ip6t_rt.c
@@ -42,14 +42,14 @@ static bool rt_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct ipv6_rt_hdr *rh;
 	const struct ip6t_rt *rtinfo = par->matchinfo;
 	unsigned int temp;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	unsigned int hdrlen = 0;
 	bool ret = false;
 	struct in6_addr _addr;
 	const struct in6_addr *ap;
 	int err;
 
-	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_ROUTING, NULL);
+	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_ROUTING, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 35a959a..146033a 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -282,10 +282,10 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
 	struct sock *sk;
 	const struct in6_addr *laddr;
 	__be16 lport;
-	int thoff;
+	int thoff = 0;
 	int tproto;
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL);
+	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
 	if (tproto < 0) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
 		return NF_DROP;
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 72bb07f..9ea482d 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -263,10 +263,10 @@ socket_mt6_v1(const struct sk_buff *skb, struct xt_action_param *par)
 	struct sock *sk;
 	struct in6_addr *daddr, *saddr;
 	__be16 dport, sport;
-	int thoff, tproto;
+	int thoff = 0, tproto;
 	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL);
+	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
 	if (tproto < 0) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
 		return NF_DROP;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-04-23 13:35 [v12 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
  2012-04-23 13:35 ` [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
@ 2012-04-23 13:35 ` Hans Schillstrom
  2012-05-02  0:34   ` Pablo Neira Ayuso
  2012-04-23 13:35 ` [v12 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
  2 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-04-23 13:35 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

man page
   HMARK
       This  module  does  the  same  as MARK, i.e. set an fwmark, but the mark
       is based on a hash value.  The hash is based on saddr, daddr, sport,
       dport and proto. The same mark will be produced independent of direction
       if no masks is set or the same masks is used for src and dest.
       The hash mark could be adjusted by modulus and finally an offset could
       be added, i.e the final mark will be within a range. ICMP error will use
       the the original message for hash calculation not the icmp it self.

       Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
             IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
             Default behavior is to completely ignore any fragment if it reach hmark.
             --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
             None of the parameters effect the packet it self only the calculated hash value.

       Parameters: Short hand methods

       --hmark-method L3
              Do not use L4 protocol field, ports or spi, only Layer 3 addresses,
              mask length of L3 addresses can still be used. Fragment or not
              does not matter in this case since only L3 address can be used in
              calc. of hash value.

       --hmark-method L3-4 (Default)
              Include  L4  in  calculation. of hash value i.e. all masks below are valid.
              Fragments will be ignored. (i.e no hash value produced)

       For all masks default is all "1:s", to disable a field use mask 0

       --hmark-src-mask length
              The length of the mask to AND the source address with (saddr & value).

       --hmark-dst-mask length
              The length of the mask to AND the dest. address with (daddr & value).

       --hmark-sport-mask value
              A 16 bit value to AND the src port with (sport & value).

       --hmark-dport-mask value
              A 16 bit value to AND the dest port with (dport & value).

       --hmark-sport-set value
              A 16 bit value to OR the src port with (sport | value).

       --hmark-dport-set value
              A 16 bit value to OR the dest port with (dport | value).

       --hmark-spi-mask value
              Value to AND the spi field with (spi & value) valid for proto esp or ah.

       --hmark-spi-set value
              Value to OR the spi field with (spi | value) valid for proto esp or ah.

       --hmark-proto-mask value
              An 8 bit value to AND the L4 proto field with (proto & value).

       --hmark-ct
              When flag is set, conntrack data should be used. Useful when NAT internal
              addressed should be used in calculation.  Be careful when using DNAT
              since mangle table is handled before nat table. I.e it will not work as
              expected to put HMARK in table mangle and PREROUTING chain. The  initial
              packet will have it's hash based on the original address,
              while the rest of the flow will use the NAT:ed address.

       --hmark-rnd value
              A 32 bit initial value for hash calc, default is 0xc175a3b8.

       Final processing of the mark in order of execution.

       --hmark-mod value (must be > 0)
              The easiest way to describe this is:  hash = hash mod <value>

       --hmark-offset value
              The easiest way to describe this is:  hash = hash + <value>

       Examples:

       Default rule handles all TCP, UDP, SCTP, ESP & AH

              iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED
               -j HMARK --hmark-offset 10000 --hmark-mod 10

       Handle SCTP and hash dest port only and produce a nfmark between 100-119.

              iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0
               --sp-mask 0 --offset 100 --mod 20

       Fragment safe Layer 3 only, that keep a class C network flow together

              iptables -t mangle -A PREROUTING -j HMARK --method L3 --src-mask 24 --mod 20 --offset 100

Rev 12
    Conntrack support for IPv6 added, and a minor restructing due to that.
    Bug fix in L3 mode, protocol had a uninitialized value.

Rev 11
    Two comments changed

Rev 10
     Even more simplified NAT handling just one switch --hmark-ct
     some renaming and some minor changes.
     Changes are based on Pablos review.

Rev 9
      Simplified NAT selections, cleanup of comments, added checkentry()
      change of #ifdef to #if IS_ENABLED and dependency.
      Some minor formating.
      Most changes are based on Pablos review.

Rev 8
      method L3 / L3-4 added i.e. Fragment handling changed to
      don't handle in "method L3-4"
      Syntax change in user mode more NF compatible.
      Most changes are based on Pablos review.

Rev 7
      IPv6 descending into icmp error hdr didn't work as expected
      with ipv6_find_hdr() Now it works as expected.

Rev 6
      Compile options with or without conntrack fixed.
      __ipv6_find_hdr() replaced by ipv6_find_hdr()

Rev 5
      IPv6 rewritten uses __ipv6_find_hdr() (P. Mc Hardy)
      Full mask and address used for IPv6 smask and dmask (J.Engelhart)
      Changes due to comments by Pablo Neira Ayuso  and Eric Dumazet
      i.e uses of skb_header_pointer() and Null check of info->hmod
      Man page changes

Rev 4
      different targets for IPv4 and IPv6
      Changes based on review by Pablo.

Rev 3
      Support added to SCTP for IPv6
Rev 2
      IPv6 header scan changed to follow RFC 2640
      IPv4 icmp echo fragmented does now use proto as ipv6
      IPv6 pskb_may_pull() check is done in every time in header loop.
      IPv4 nat support added.
      default added in IPv6 loop and null check of hp

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   62 +++++++
 net/netfilter/Kconfig              |   18 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  319 ++++++++++++++++++++++++++++++++++++
 4 files changed, 400 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..cdf4a8f
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
+	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
+	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
+	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
+};
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..7b59dd0 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,24 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which alter the netfilter mark (nfmark) field within a given range.
+	First a 32 bit hash value is generated then modulus by <limit> and
+	finally an offset is added before it's written to nfmark.
+
+	Prior to routing, the nfmark can influence the routing method (see
+	"Use netfilter MARK value as routing key") and can also be used by
+	other subsystems to change their behavior.
+
+	The mark match can also be used to match nfmark produced by this module.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..d90549d
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,322 @@
+/*
+ * xt_hmark - Netfilter module to set mark as hash value
+ *
+ * (C) 2012 Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *
+ *Description:
+ *	This module calculates a hash value that can be modified by modulus
+ *	and an offset, i.e. it is possible to produce a skb->mark within a range
+ *	The hash value is based on a direction independent five tuple:
+ *	src & dst addr src & dst ports and protocol.
+ *	There is two distinct modes for hash calculation:
+ *
+ *	This program is free software; you can redistribute it and/or modify
+ *	it under the terms of the GNU General Public License version 2 as
+ *	published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/xt_HMARK.h>
+#include <linux/netfilter/x_tables.h>
+if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: Packet range mark operations by Hash value");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+/*
+ * ICMP, get header offset if icmp error
+ */
+static int get_inner_hdr(struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+/*
+ * Get ipv6 header offset if icmp error
+ */
+static int get_inner6_hdr(struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+/*
+ * Calculate hash based fw-mark, on the five tuple if possible.
+ * special cases :
+ *  - Fragments do not use ports not even on the first fragment,
+ *    nf_defrag_ipv6.ko don't defrag for us like it do in ipv4.
+ *    This might be changed in the future.
+ *  - On ICMP errors the inner header will be used.
+ *  - Tunnels no ports
+ *  - ESP & AH uses SPI
+ * @returns XT_CONTINUE
+ */
+static unsigned int
+hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct ipv6hdr *ip6, _ip6;
+	int poff, flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	union hmark_ports uports;
+	u32 addr_src, addr_dst, hash, nhoffs = 0;
+	u16 fragoff = 0;
+	u8 nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return XT_CONTINUE;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoffs)) {
+		ip6 = skb_header_pointer(skb, nhoffs, sizeof(_ip6), &_ip6);
+		if (!ip6)
+			return XT_CONTINUE;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return XT_CONTINUE;
+	}
+noicmp:
+	addr_src = (__force u32)
+		(ip6->saddr.s6_addr32[0] & info->src_mask.in6.s6_addr32[0]) ^
+		(ip6->saddr.s6_addr32[1] & info->src_mask.in6.s6_addr32[1]) ^
+		(ip6->saddr.s6_addr32[2] & info->src_mask.in6.s6_addr32[2]) ^
+		(ip6->saddr.s6_addr32[3] & info->src_mask.in6.s6_addr32[3]);
+	addr_dst = (__force u32)
+		(ip6->daddr.s6_addr32[0] & info->dst_mask.in6.s6_addr32[0]) ^
+		(ip6->daddr.s6_addr32[1] & info->dst_mask.in6.s6_addr32[1]) ^
+		(ip6->daddr.s6_addr32[2] & info->dst_mask.in6.s6_addr32[2]) ^
+		(ip6->daddr.s6_addr32[3] & info->dst_mask.in6.s6_addr32[3]);
+
+	uports.v32 = 0;
+	if ((info->flags & XT_F_HMARK_METHOD_L3) ||
+	    (nexthdr == IPPROTO_ICMPV6))
+		goto no_ports;
+	/* Is next header valid for port or SPI calculation ? */
+	poff = proto_ports_offset(nexthdr);
+	if ((flag & IP6T_FH_F_FRAG) || poff < 0)
+		return XT_CONTINUE;
+
+	nhoffs += poff;
+	if (skb_copy_bits(skb, nhoffs, &uports, sizeof(uports)) < 0)
+		return XT_CONTINUE;
+
+	if ((nexthdr == IPPROTO_ESP) || (nexthdr == IPPROTO_AH))
+		uports.v32 = (uports.v32 & info->spi_mask) | info->spi_set;
+	else {
+		uports.v32 = (uports.v32 & info->port_mask.v32) |
+			      info->port_set.v32;
+		/* get a consistent hash (same value in any flow dirs.) */
+		if (uports.p16.dst < uports.p16.src)
+			swap(uports.p16.dst, uports.p16.src);
+	}
+
+no_ports:
+	nexthdr &= info->proto_mask;
+	/* get a consistent hash (same value in any flow direction) */
+	if (addr_dst < addr_src)
+		swap(addr_src, addr_dst);
+
+	hash = jhash_3words(addr_src, addr_dst, uports.v32, info->hashrnd)
+	       ^ nexthdr;
+	skb->mark = (hash % info->hmodulus) + info->hoffset;
+	return XT_CONTINUE;
+}
+#endif
+/*
+ * Calculate hash based fw-mark, on the five tuple if possible.
+ * special cases :
+ *  - Fragments do not use ports not even on the first fragment,
+ *    unless nf_defrag_xx.ko is used.
+ *  - On ICMP errors the inner header will be used.
+ *  - Tunnels no ports
+ *  - ESP & AH uses SPI
+ * @returns XT_CONTINUE
+ */
+static unsigned int
+hmark_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct iphdr *ip, _ip;
+	int nhoff, poff, frag = 0;
+	union hmark_ports uports;
+	u32 addr_src, addr_dst, hash;
+	u8 ip_proto;
+
+	nhoff = skb_network_offset(skb);
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* if an icmp error, calc hash on inner header */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (!ip)
+				return XT_CONTINUE;
+		}
+	}
+
+	ip_proto = ip->protocol;
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		frag = 1;
+
+	addr_src = (__force u32) ip->saddr;
+	addr_dst = (__force u32) ip->daddr;
+	uports.v32 = 0;
+/* todo: Check conntrack ICMP relation */
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	if (info->flags & XT_F_HMARK_CT) {
+		struct nf_conntrack_tuple *otuple;
+		struct nf_conntrack_tuple *rtuple;
+		enum ip_conntrack_info ctinfo;
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+
+		if (!ct || nf_ct_is_untracked(ct))
+			return XT_CONTINUE;
+
+		otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+		rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+		addr_src       = (__force u32)otuple->src.u3.in.s_addr;
+		uports.p16.src = otuple->src.u.udp.port;
+		addr_dst       = (__force u32)rtuple->src.u3.in.s_addr;
+		uports.p16.dst = rtuple->src.u.udp.port;
+	}
+#endif
+	addr_src &= info->src_mask.ip;
+	addr_dst &= info->dst_mask.ip;
+
+	if ((info->flags & XT_F_HMARK_METHOD_L3) ||
+	    (ip_proto == IPPROTO_ICMP)) {
+		uports.v32 = 0;
+		goto noports;
+	}
+	/* Check if ports can be used in hash calculation. */
+	poff = proto_ports_offset(ip_proto);
+	if (frag || poff < 0)
+		return XT_CONTINUE;
+
+	/* if no ports from conntrack try to get ports from skb */
+	if (!uports.v32) {
+		nhoff += (ip->ihl * 4) + poff;
+		if (skb_copy_bits(skb, nhoff, &uports, sizeof(uports)) < 0)
+			return XT_CONTINUE;
+	}
+
+	if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH)
+		uports.v32 = (uports.v32 & info->spi_mask) | info->spi_set;
+	else {
+		uports.v32 = (uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+		/* get a consistent hash (same value in any flow dirs.) */
+		if (uports.p16.dst < uports.p16.src)
+			swap(uports.p16.src, uports.p16.dst);
+	}
+
+noports:
+	/* get a consistent hash (same value in any flow direction) */
+	if (addr_dst < addr_src)
+		swap(addr_src, addr_dst);
+
+	hash = jhash_3words(addr_src, addr_dst, uports.v32, info->hashrnd);
+	hash = hash ^ (ip_proto & info->proto_mask);
+	skb->mark = (hash % info->hmodulus) + info->hoffset;
+	return XT_CONTINUE;
+}
+
+static int hmark_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("HMARK: hmark-mod can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask && (info->flags & XT_F_HMARK_METHOD_L3)) {
+		pr_info("HMARK: When method L3 proto mask must be zero\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV4,
+		.target         = hmark_v4,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.checkentry     = hmark_check,
+		.me             = THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV6,
+		.target         = hmark_v6,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.checkentry     = hmark_check,
+		.me             = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_mt_init(void)
+{
+	int ret;
+
+	ret = xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static void __exit hmark_mt_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_mt_init);
+module_exit(hmark_mt_exit);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [v12 PATCH 3/3] NETFILTER userspace part for target HMARK
  2012-04-23 13:35 [v12 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
  2012-04-23 13:35 ` [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
  2012-04-23 13:35 ` [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2012-04-23 13:35 ` Hans Schillstrom
  2 siblings, 0 replies; 21+ messages in thread
From: Hans Schillstrom @ 2012-04-23 13:35 UTC (permalink / raw)
  To: kaber, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32 bit hash value is generated then modulus by <limit> and
    finally an offset is added before it's written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.

Ver 12
    Reset option flag in some cases, where option is disabled by value.

Ver 10
    conntrack reduced to --hmark-ct switch
    renaming of vars in xt_hmark_info
    Adding helptext and updated man due to --hmark-ct switc

Ver 9
    Formating changes.

Ver 8
    Syntax changes more descriptive options
    --hmark-method added.

Ver 6-7 -

Ver 5
      smask and dmask changed to length

Ver 4
      xtoptions used for parsing.

Ver 3
       -

Ver 2
      IPv4 NAT added
      iptables ver 1.4.12.1 adaptions.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  506 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   84 ++++++
 include/linux/netfilter/xt_HMARK.h |   62 +++++
 3 files changed, 652 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_HMARK.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..c4d6efd
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,506 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "xtables.h"
+#include <linux/netfilter/xt_HMARK.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+#define XT_F_HMARK_L4_OPTS  (XT_F_HMARK_SPI_AND | XT_F_HMARK_SPI_OR\
+			     | XT_F_HMARK_SPORT_AND | XT_F_HMARK_SPORT_OR\
+			     | XT_F_HMARK_DPORT_AND | XT_F_HMARK_DPORT_OR\
+			     | XT_F_HMARK_PROTO_AND)
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-method <method>          Overall L3/L4 and fragment behavior\n"
+"                 L3                Fragment safe, do not use ports or proto\n"
+"                                   i.e. Fragments don't need special care.\n"
+"                 L3-4 (Default)    Fragment unsafe, use ports and proto\n"
+"                                   if defrag off in conntrack\n"
+"                                      no hmark on any part of a fragment\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                nfmark modulus value\n"
+"  --hmark-offset value             Last action add value to nfmark\n\n"
+" Fine tuning of what will be included in hash calculation\n"
+"  --hmark-src-mask length          Source address mask length\n"
+"  --hmark-dst-mask length          Dest address mask length\n"
+"  --hmark-sport-mask value         Mask src port with value\n"
+"  --hmark-dport-mask value         Mask dst port with value\n"
+"  --hmark-spi-mask value           For esp and ah AND spi with value\n"
+"  --hmark-sport-set value          OR src port with value\n"
+"  --hmark-dport-set value          OR dst port with value\n"
+"  --hmark-spi-set value            For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value         Mask Protocol with value\n"
+"  --hmark-rnd                      Initial Random value to hash cacl.\n"
+" For NAT in IPv4: src part from original/reply tuple will always be used\n"
+" i.e. orig src part will be used as src address/port.\n"
+"     reply src part will be used as dst address/port\n"
+" Make sure to qualify the rule in a proper way when using NAT flag\n"
+" When --ct is used only tracked connections will match\n"
+"  --hmark-ct                       Force conntrack orig and rely tuples as\n"
+"                                   source and destination.\n\n"
+" In many cases hmark can be omitted i.e. --src-mask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name  = "hmark-method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "hmark-src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "hmark-dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "hmark-sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "hmark-dport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "hmark-spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "hmark-sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "hmark-dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "hmark-spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "hmark-proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "hmark-rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",
+	  .type = XTTYPE_UINT32,
+	  .id = XT_HMARK_MODULUS,
+	  .min = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "hmark-offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "hmark-ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+
+	{ .name  = "method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "dport-mask", .type = XTTYPE_UINT16,
+	  .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name  = "mod",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_MODULUS,
+	  .min   = 1,
+	  .flags = XTOPT_PUT,
+	  XTOPT_MAND, XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb, int plen)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->port_set.v32 = 0;
+		info->flags = 0;
+		info->spi_set = 0;
+		info->hoffset = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SADR_AND:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_F_HMARK_SADR_AND;
+		break;
+	case XT_HMARK_DADR_AND:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_F_HMARK_DADR_AND;
+		break;
+	case XT_HMARK_SPI_AND:
+		info->spi_mask = htonl(cb->val.u32);
+		if (cb->val.u32 == 0xffffffff)
+			cb->xflags &= ~XT_F_HMARK_SPI_AND;
+		break;
+	case XT_HMARK_SPI_OR:
+		info->spi_set = htonl(cb->val.u32);
+		if (cb->val.u32 == 0)
+			cb->xflags &= ~XT_F_HMARK_SPI_OR;
+		break;
+	case XT_HMARK_SPORT_AND:
+		info->port_mask.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_F_HMARK_SPORT_AND;
+		break;
+	case XT_HMARK_DPORT_AND:
+		info->port_mask.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_F_HMARK_DPORT_AND;
+		break;
+	case XT_HMARK_SPORT_OR:
+		info->port_set.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_F_HMARK_SPORT_OR;
+		break;
+	case XT_HMARK_DPORT_OR:
+		info->port_set.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_F_HMARK_DPORT_OR;
+		break;
+	case XT_HMARK_PROTO_AND:
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_F_HMARK_PROTO_AND;
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmodulus == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "xxx modulus 0 ? "
+				      "thats a div by 0");
+			info->hmodulus = 0xffffffff;
+		}
+		break;
+	case XT_HMARK_METHOD_L3:
+		if (strcmp(cb->arg, "L3") == 0) {
+			info->proto_mask = 0;
+			cb->xflags &= ~XT_F_HMARK_METHOD_L3_4;
+		} else if (strcmp(cb->arg, "L3-4") == 0) {
+			cb->xflags &= ~XT_F_HMARK_METHOD_L3;
+			cb->xflags |= XT_F_HMARK_METHOD_L3_4;
+		}
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_ip4_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 32);
+}
+static void HMARK_ip6_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 128);
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_F_HMARK_MODULUS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+			      "is not set, or zero wich is a div by zero");
+	/* Check for invalid options */
+	if (cb->xflags & XT_F_HMARK_METHOD_L3 &&
+	   (cb->xflags & XT_F_HMARK_L4_OPTS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-method L3, "
+			      "can not be combined by an Layer 4 options: "
+			      "port, spi or proto ");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_F_HMARK_METHOD_L3) {
+		printf("method L3 ");
+	} else {
+		if (info->flags & XT_F_HMARK_METHOD_L3_4)
+			printf("method L3-4 ");
+		if (info->flags & XT_F_HMARK_SPORT_AND)
+			printf("sport-mask 0x%x ",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_AND)
+			printf("dport-mask 0x%x ",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_AND)
+			printf("spi-mask 0x%x ", htonl(info->spi_mask));
+		if (info->flags & XT_F_HMARK_SPORT_OR)
+			printf("sport-set 0x%x ",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_OR)
+			printf("dport-set 0x%x ",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_OR)
+			printf("spi-set 0x%x ", htonl(info->spi_set));
+		if (info->flags & XT_F_HMARK_PROTO_AND)
+			printf("proto-mask 0x%x ", info->proto_mask);
+	}
+	if (info->flags & XT_F_HMARK_RND)
+		printf("rnd 0x%x ", info->hashrnd);
+
+}
+
+static void HMARK_ip6_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_F_HMARK_MODULUS)
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_F_HMARK_OFFSET)
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_F_HMARK_CT)
+		printf("ct, ");
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf("src-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf("dst-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_F_HMARK_MODULUS)
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_F_HMARK_OFFSET)
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_F_HMARK_CT)
+		printf("ct, ");
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf("src-mask %s ",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf("dst-mask %s ",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_F_HMARK_METHOD_L3) {
+		printf(" --hmark-method L3");
+	} else {
+		if (info->flags & XT_F_HMARK_METHOD_L3_4)
+			printf(" --hmark-method L3-4");
+		if (info->flags & XT_F_HMARK_SPORT_AND)
+			printf(" --hmark-sport-mask 0x%x",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_AND)
+			printf(" --hmark-dport-mask 0x%x",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_AND)
+			printf(" --hmark-spi-mask 0x%x",
+			       htonl(info->spi_mask));
+		if (info->flags & XT_F_HMARK_SPORT_OR)
+			printf(" --hmark-sport-set 0x%x",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_OR)
+			printf(" --hmark-dport-set 0x%x",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_OR)
+			printf(" --hmark-spi-set 0x%x", htonl(info->spi_set));
+		if (info->flags & XT_F_HMARK_PROTO_AND)
+			printf(" --hmark-proto-mask 0x%x", info->proto_mask);
+	}
+	if (info->flags & XT_F_HMARK_RND)
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & XT_F_HMARK_MODULUS)
+		printf(" --hmark-mod 0x%x", info->hmodulus);
+	if (info->flags & XT_F_HMARK_OFFSET)
+		printf(" --hmark-offset 0x%x", info->hoffset);
+	if (info->flags & XT_F_HMARK_CT)
+		printf(" --hmark-ct");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf(" --hmark-src-mask %s",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf(" --hmark-dst-mask %s",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf(" --hmark-src-mask %s",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf(" --hmark-dst-mask %s",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_ip4_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_ip6_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..c258e59
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,84 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on src-addr, dst-addr, sport, dport and proto. The same mark will be produced independent of direction if no masks is set or the same masks is used for src and dest.
+The hash mark could be adjusted by modulus and finally an offset could be added, i.e the final mark will be within a range.
+ICMP error will use the the original message for hash calculation not the icmp it self.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
+      Default behavior is to completely ignore any fragment if it reach hmark.
+      --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
+      None of the parameters effect the packet it self only the calculated hash value.
+
+.PP
+Parameters:
+Short hand methods
+.TP
+\fB\-\-hmark\-method\fP \fIL3\fP
+Do not use L4 protocol field, ports or spi, only Layer 3 addresses, mask length
+of L3 addresses can still be used. Fragment or not does not matter in
+this case since only L3 address can be used in calc. of hash value.
+.TP
+\fB\-\-hmark\-method\fP \fIL3-4\fP (Default)
+Include L4 in calculation. of hash value i.e. all masks below are valid.
+Fragments will be ignored. (i.e no hash value produced)
+.PP
+For all masks default is all "1:s", to disable a field use mask 0
+.TP
+\fB\-\-hmark\-src\-mask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dst\-mask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sport\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dport\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-ct\fP
+When flag is set, conntrack data should be used. Useful when NAT internal addressed should be used in calculation.
+Be careful when using DNAT since mangle table is handled before nat table. I.e it will not work as expected to put HMARK in table mangle and PREROUTING chain. The initial packet will have it's hash based on the original address, while the rest of the flow will use the NAT:ed address.
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offset\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark-offs 10000 \-\-hmark-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING -p SCTP \-j HMARK \-\-src\-mask 0 \-\-dst\-mask 0
+ \-\-sp\-mask 0 \-\-offset 100 \-\-mod 20
+.PP
+Fragment safe Layer 3 only that keep a class C network flow together
+.IP
+iptables \-t mangle \-A PREROUTING \-j HMARK \-\-method L3 \-\-src\-mask 24 \-\-mod 20 \-\-offset 100
+
diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..cdf4a8f
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
+	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
+	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
+	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
+};
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-04-23 13:35 ` [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2012-05-02  0:34   ` Pablo Neira Ayuso
  2012-05-02  7:55     ` Hans Schillstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-02  0:34 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

Hi Hans,

I have decided to take your patch and give it one spin today.

Please, find it attached. The main things I've done are:

* splitting the code into smaller functions, thus, it becomes more
  maintainable.

* try to put common code into functions, eg. the layer 4 protocol
  parsing to obtain the ports is the same for both IPv4 and IPv6.

* adding the hmark_tuple abstraction, cleaner than using several
  variables to set the address, ports, and so on. Thus, we only pass
  one single pointer to it.

* I have removed most of the comments, they bloat the file and most
  information can be extracted by reading the code. I only left the
  comments that clarify "strange" things.

Regarding ICMP traffic, I think we can use the ID field for the
hashing as well. Thus, we handle ICMP like other protocols.

Please, I'd appreciate if you can test and spot issues after my
rework. I have slightly tested here.

I may make some minor cleanup on it before submission but, in that
case, in that case, I'll post the patch. I would not expect more major
changes in it.

Let me know.

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-diff, Size: 15964 bytes --]

>From 2aaa13cb2020d7cd8fe7f30b54e083fecbff9975 Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Mon, 23 Apr 2012 03:35:27 +0000
Subject: [PATCH] netfilter: add xt_hmark target for hash-based skb marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP and hash dest port only and produce a nfmark between 100-119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment safe Layer 3 only, that keep a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ Many code of this patch has been refactorized by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   62 ++++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  391 ++++++++++++++++++++++++++++++++++++
 4 files changed, 469 insertions(+)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..cdf4a8f
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
+	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
+	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
+	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
+};
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index d3f583e..cd5668e 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -517,6 +517,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 78b8591..2f3bc0f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..df743bd
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,391 @@
+/*
+ * xt_HMARK - Netfilter module to set mark as hash value
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * Description:
+ *
+ * This module calculates a hash value that can be modified by modulus and an
+ * offset, i.e. it is possible to produce a skb->mark within a range The hash
+ * value is based on a direction independent five tuple: src & dst addr src &
+ * dst ports and protocol.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static inline u32
+hmark_hash(const struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
+		t->uports.v32 = (t->uports.v32 & info->spi_mask) |
+				info->spi_set;
+	else {
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static int
+hmark_ct_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			 const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = (__force u32)
+		(otuple->src.u3.in6.s6_addr32[0] &
+			info->src_mask.in6.s6_addr32[0]) ^
+		(otuple->src.u3.in6.s6_addr32[1] &
+			info->src_mask.in6.s6_addr32[1]) ^
+		(otuple->src.u3.in6.s6_addr32[2] &
+			info->src_mask.in6.s6_addr32[2]) ^
+		(otuple->src.u3.in6.s6_addr32[3] &
+			info->src_mask.in6.s6_addr32[3]);
+	t->dst = (__force u32)
+		(otuple->src.u3.in6.s6_addr32[0] &
+			info->dst_mask.in6.s6_addr32[0]) ^
+		(otuple->src.u3.in6.s6_addr32[1] &
+			info->dst_mask.in6.s6_addr32[1]) ^
+		(otuple->src.u3.in6.s6_addr32[2] &
+			info->dst_mask.in6.s6_addr32[2]) ^
+		(otuple->src.u3.in6.s6_addr32[3] &
+			info->dst_mask.in6.s6_addr32[3]);
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = (otuple->src.u.all & info->port_mask.v32) |
+					info->port_set.v32;
+		t->uports.p16.dst = (rtuple->src.u.all & info->port_mask.v32) |
+					info->port_set.v32;
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	u8 nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return XT_CONTINUE;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return XT_CONTINUE;
+	}
+noicmp:
+	t->src = (__force u32)
+		(ip6->saddr.s6_addr32[0] & info->src_mask.in6.s6_addr32[0]) ^
+		(ip6->saddr.s6_addr32[1] & info->src_mask.in6.s6_addr32[1]) ^
+		(ip6->saddr.s6_addr32[2] & info->src_mask.in6.s6_addr32[2]) ^
+		(ip6->saddr.s6_addr32[3] & info->src_mask.in6.s6_addr32[3]);
+	t->dst = (__force u32)
+		(ip6->daddr.s6_addr32[0] & info->dst_mask.in6.s6_addr32[0]) ^
+		(ip6->daddr.s6_addr32[1] & info->dst_mask.in6.s6_addr32[1]) ^
+		(ip6->daddr.s6_addr32[2] & info->dst_mask.in6.s6_addr32[2]) ^
+		(ip6->daddr.s6_addr32[3] & info->dst_mask.in6.s6_addr32[3]);
+
+	t->proto = nexthdr;
+
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return 0;
+
+	if (!(info->flags & XT_F_HMARK_METHOD_L3))
+		hmark_set_tuple_ports(skb, nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_F_HMARK_CT) {
+		if (hmark_ct_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_ct_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			 const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = (__force u32) otuple->src.u3.in.s_addr &
+			info->src_mask.in.s_addr;
+	t->dst = (__force u32) rtuple->src.u3.in.s_addr &
+			info->dst_mask.in.s_addr;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = (otuple->src.u.all & info->port_mask.v32) |
+					info->port_set.v32;
+		t->uports.p16.dst = (rtuple->src.u.all & info->port_mask.v32) |
+					info->port_set.v32;
+	}
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return 0;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	/* this ensures consistent hashing for both directions */
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return 0;
+
+	if (!(info->flags & XT_F_HMARK_METHOD_L3))
+		hmark_set_tuple_ports(skb, ip->ihl * 4, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_F_HMARK_CT) {
+		if (hmark_ct_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask && (info->flags & XT_F_HMARK_METHOD_L3)) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-02  0:34   ` Pablo Neira Ayuso
@ 2012-05-02  7:55     ` Hans Schillstrom
  2012-05-02  8:09       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-02  7:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

Hello Pablo
(Sorry for spamming some of you, kmail started to send HTML mail)

On Wednesday 02 May 2012 02:34:14 Pablo Neira Ayuso wrote:
> Hi Hans,
> 
> I have decided to take your patch and give it one spin today.
> 
> Please, find it attached. The main things I've done are:
> 
> * splitting the code into smaller functions, thus, it becomes more
>   maintainable.
> 
> * try to put common code into functions, eg. the layer 4 protocol
>   parsing to obtain the ports is the same for both IPv4 and IPv6.
> 
> * adding the hmark_tuple abstraction, cleaner than using several
>   variables to set the address, ports, and so on. Thus, we only pass
>   one single pointer to it.
> 
> * I have removed most of the comments, they bloat the file and most
>   information can be extracted by reading the code. I only left the
>   comments that clarify "strange" things.
> 
> Regarding ICMP traffic, I think we can use the ID field for the
> hashing as well. Thus, we handle ICMP like other protocols.

Yes why not, I can give it a try.

> 
> Please, I'd appreciate if you can test and spot issues after my
> rework. I have slightly tested here.

OK I found some minor things, I'll send an updated version back later today.
I will run all my tests it will take a couple of hours.

This is what I have founf so far (before testing)

+	t->dst = (__force u32)
+		(otuple->src.u3.in6.s6_addr32[0] &
+			info->dst_mask.in6.s6_addr32[0]) ^
+		(otuple->src.u3.in6.s6_addr32[1] &
+			info->dst_mask.in6.s6_addr32[1]) ^
+		(otuple->src.u3.in6.s6_addr32[2] &
+			info->dst_mask.in6.s6_addr32[2]) ^
+		(otuple->src.u3.in6.s6_addr32[3] &
+			info->dst_mask.in6.s6_addr32[3]);

Should be rtuple 

+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = (otuple->src.u.all & info->port_mask.v32) |
+					info->port_set.v32;
+		t->uports.p16.dst = (rtuple->src.u.all & info->port_mask.v32) |
+					info->port_set.v32;
+	}

in hmark_ct_set_htuple_ipv4() and hmark_ct_set_htuple_ipv6()
Wrong port_mask and port_set, this will work better..

		if (t->proto != IPPROTO_ICMP) {
                t->uports.p16.src = otuple->src.u.all;
                t->uports.p16.dst = rtuple->src.u.all;
                t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
                                info->port_set.v32;


> 
> I may make some minor cleanup on it before submission but, in that
> case, in that case, I'll post the patch. I would not expect more major
> changes in it.
> 
> Let me know.
Thanks Pablo
I realized that I sent wrong version as v12 (v11 with updated comments only), sorry for the confusion.
Basically the changes are the same but you have split it up a little bit more.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-02  7:55     ` Hans Schillstrom
@ 2012-05-02  8:09       ` Pablo Neira Ayuso
  2012-05-02 17:49         ` Hans Schillstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-02  8:09 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Wed, May 02, 2012 at 09:55:00AM +0200, Hans Schillstrom wrote:
> Hello Pablo
> (Sorry for spamming some of you, kmail started to send HTML mail)
> 
> On Wednesday 02 May 2012 02:34:14 Pablo Neira Ayuso wrote:
> > Hi Hans,
> > 
> > I have decided to take your patch and give it one spin today.
> > 
> > Please, find it attached. The main things I've done are:
> > 
> > * splitting the code into smaller functions, thus, it becomes more
> >   maintainable.
> > 
> > * try to put common code into functions, eg. the layer 4 protocol
> >   parsing to obtain the ports is the same for both IPv4 and IPv6.
> > 
> > * adding the hmark_tuple abstraction, cleaner than using several
> >   variables to set the address, ports, and so on. Thus, we only pass
> >   one single pointer to it.
> > 
> > * I have removed most of the comments, they bloat the file and most
> >   information can be extracted by reading the code. I only left the
> >   comments that clarify "strange" things.
> > 
> > Regarding ICMP traffic, I think we can use the ID field for the
> > hashing as well. Thus, we handle ICMP like other protocols.
> 
> Yes why not, I can give it a try.
> 
> > 
> > Please, I'd appreciate if you can test and spot issues after my
> > rework. I have slightly tested here.
> 
> OK I found some minor things, I'll send an updated version back later today.
> I will run all my tests it will take a couple of hours.

Please, go ahead.

> This is what I have founf so far (before testing)
> 
> +	t->dst = (__force u32)
> +		(otuple->src.u3.in6.s6_addr32[0] &
> +			info->dst_mask.in6.s6_addr32[0]) ^
> +		(otuple->src.u3.in6.s6_addr32[1] &
> +			info->dst_mask.in6.s6_addr32[1]) ^
> +		(otuple->src.u3.in6.s6_addr32[2] &
> +			info->dst_mask.in6.s6_addr32[2]) ^
> +		(otuple->src.u3.in6.s6_addr32[3] &
> +			info->dst_mask.in6.s6_addr32[3]);
> 
> Should be rtuple 
> 
> +	if (t->proto != IPPROTO_ICMP) {
> +		t->uports.p16.src = (otuple->src.u.all & info->port_mask.v32) |
> +					info->port_set.v32;
> +		t->uports.p16.dst = (rtuple->src.u.all & info->port_mask.v32) |
> +					info->port_set.v32;
> +	}
> 
> in hmark_ct_set_htuple_ipv4() and hmark_ct_set_htuple_ipv6()
> Wrong port_mask and port_set, this will work better..
> 
> 		if (t->proto != IPPROTO_ICMP) {
>                 t->uports.p16.src = otuple->src.u.all;
>                 t->uports.p16.dst = rtuple->src.u.all;
>                 t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
>                                 info->port_set.v32;

Fine, thanks.

> > 
> > I may make some minor cleanup on it before submission but, in that
> > case, in that case, I'll post the patch. I would not expect more major
> > changes in it.
> > 
> > Let me know.
> Thanks Pablo
> I realized that I sent wrong version as v12 (v11 with updated comments only), sorry for the confusion.

Yes, I noticed that.

> Basically the changes are the same but you have split it up a little bit more.

Exactly, my idea was to split it up to make it more maintainable and
to try to re-use code as much as possible.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-02  8:09       ` Pablo Neira Ayuso
@ 2012-05-02 17:49         ` Hans Schillstrom
  2012-05-06 22:57           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-02 17:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

[-- Attachment #1: Type: text/plain, Size: 2271 bytes --]

On Wednesday 02 May 2012 10:09:44 Pablo Neira Ayuso wrote:
> On Wed, May 02, 2012 at 09:55:00AM +0200, Hans Schillstrom wrote:
> > Hello Pablo
> > (Sorry for spamming some of you, kmail started to send HTML mail)
> > 
> > On Wednesday 02 May 2012 02:34:14 Pablo Neira Ayuso wrote:
> > > Hi Hans,
> > > 
> > > I have decided to take your patch and give it one spin today.
> > > 
> > > Please, find it attached. The main things I've done are:
> > > 
> > > * splitting the code into smaller functions, thus, it becomes more
> > >   maintainable.
> > > 
> > > * try to put common code into functions, eg. the layer 4 protocol
> > >   parsing to obtain the ports is the same for both IPv4 and IPv6.
> > > 
> > > * adding the hmark_tuple abstraction, cleaner than using several
> > >   variables to set the address, ports, and so on. Thus, we only pass
> > >   one single pointer to it.
> > > 
> > > * I have removed most of the comments, they bloat the file and most
> > >   information can be extracted by reading the code. I only left the
> > >   comments that clarify "strange" things.
> > > 
> > > Regarding ICMP traffic, I think we can use the ID field for the
> > > hashing as well. Thus, we handle ICMP like other protocols.
> > 
> > Yes why not, I can give it a try.
> > 

I think we wait with this one..

> > > 
> > > Please, I'd appreciate if you can test and spot issues after my
> > > rework. I have slightly tested here.
> > 
> > OK I found some minor things, I'll send an updated version back later today.
> > I will run all my tests it will take a couple of hours.
> 
> Please, go ahead.

Done, all my tests passed

[snip]
This is what I have done.

- I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
  by adding a hmark_addr6_mask() and hmark_addr_any_mask()
  Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
  (it's not set in the rtuple)
- Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller. 
- Moved the L3 check a little bit earlier.
- changed return values for fragments.
- Added nhoffs to: hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
  to get icmp working


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-patch, Size: 15382 bytes --]

From 55b47c7a3f7ab6a9d0430c6b753ccf3cc3cac7b8 Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Wed, 2 May 2012 18:59:28 +0200
Subject: [PATCH 1/1] netfilter: add xt_hmark target for hash-based skb marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP and hash dest port only and produce a nfmark between 100-119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment safe Layer 3 only, that keep a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ A big part of this patch has been refactorized by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   62 ++++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  371 ++++++++++++++++++++++++++++++++++++
 4 files changed, 449 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..cdf4a8f
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
+	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
+	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
+	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
+};
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..76a3fa7
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,371 @@
+/*
+ * xt_HMARK - Netfilter module to set mark as hash value
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * Description:
+ *
+ * This module calculates a hash value that can be modified by modulus and an
+ * offset, i.e. it is possible to produce a skb->mark within a range The hash
+ * value is based on a direction independent five tuple: src & dst addr src &
+ * dst ports and protocol.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+			 const struct xt_hmark_info *info);
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
+		t->uports.v32 = (t->uports.v32 & info->spi_mask) |
+				info->spi_set;
+	else {
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+	return  (addr32[0] & mask[0]) ^
+		(addr32[1] & mask[1]) ^
+		(addr32[2] & mask[2]) ^
+		(addr32[3] & mask[3]);
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	int nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return -1;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return -1;
+	}
+noicmp:
+	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+	if (info->flags & XT_F_HMARK_METHOD_L3)
+		return 0;
+
+	t->proto = nexthdr;
+
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return -1;
+
+	hmark_set_tuple_ports(skb, nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_F_HMARK_CT) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static inline u32
+hmark_addr_any_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	if (l3num == AF_INET)
+		return *addr32 & *mask;
+
+	return hmark_addr6_mask(addr32, mask);
+}
+#else
+static inline u32
+hmark_addr_any_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	return *addr32 & *mask;
+}
+
+#endif
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+			 const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = hmark_addr_any_mask(otuple->src.l3num, otuple->src.u3.all,
+				     info->src_mask.all);
+	t->dst = hmark_addr_any_mask(otuple->src.l3num, rtuple->src.u3.all,
+				     info->dst_mask.all);
+
+	if (info->flags & XT_F_HMARK_METHOD_L3)
+		return 0;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = otuple->src.u.all;
+		t->uports.p16.dst = rtuple->src.u.all;
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return -1;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	if (info->flags & XT_F_HMARK_METHOD_L3)
+		return 0;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return -1;
+
+	hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_F_HMARK_CT) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask && (info->flags & XT_F_HMARK_METHOD_L3)) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-02 17:49         ` Hans Schillstrom
@ 2012-05-06 22:57           ` Pablo Neira Ayuso
  2012-05-07  8:20             ` Hans Schillstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-06 22:57 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

[-- Attachment #1: Type: text/plain, Size: 9477 bytes --]

Hi Hans,

[...]
> > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > 
> > > Yes why not, I can give it a try.
> > > 
> 
> I think we wait with this one..

I see. This is easy to add for the conntrack side, but it will require
some extra code for the packet-based solution.

Not directly related to this but, I know that your intention is to
make this as flexible as possible. However, I still don't find how I
would use the port mask feature in any of my setups.  Basically, I
don't come up with any useful example for this situation.

I'm also telling this because I think that ICMP support will be
easier to add if port masking is removed.

[...]
> This is what I have done.
> 
> - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
>   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
>   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
>   (it's not set in the rtuple)

Good one, this made the code even smaller.

> - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.

Not really, you don't need for the conntrack part. The original tuple
is always the same, not matter where the packet is coming from. I have
removed this again so it only affects packet-based hashing.

> - Moved the L3 check a little bit earlier.

good.

> - changed return values for fragments.

With this, you're giving up on trying to classify fragments. Do you
really want this?

>From my point of view, if your firewalls (assuming they are the HMARK
classification) are stateless, it still makes sense to me to classify
fragments using the XT_HMARK_METHOD_L3_4.

> - Added nhoffs to: hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
>   to get icmp working

good catch.

Below, some minor changes that I made to your patch (you can find a
new version enclosed to this email).

[...]
> +#ifndef XT_HMARK_H_
> +#define XT_HMARK_H_
> +
> +#include <linux/types.h>
> +
> +enum {
> +	XT_HMARK_NONE,
> +	XT_HMARK_SADR_AND,
> +	XT_HMARK_DADR_AND,
> +	XT_HMARK_SPI_AND,
> +	XT_HMARK_SPI_OR,
> +	XT_HMARK_SPORT_AND,
> +	XT_HMARK_DPORT_AND,
> +	XT_HMARK_SPORT_OR,
> +	XT_HMARK_DPORT_OR,
> +	XT_HMARK_PROTO_AND,
> +	XT_HMARK_RND,
> +	XT_HMARK_MODULUS,
> +	XT_HMARK_OFFSET,
> +	XT_HMARK_CT,
> +	XT_HMARK_METHOD_L3,
> +	XT_HMARK_METHOD_L3_4,
> +	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
> +	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
> +	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
> +	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
> +	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
> +	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
> +	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
> +	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
> +	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
> +	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
> +	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
> +	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
> +	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
> +	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
> +	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,

I've defined:

#define XT_HMARK_FLAG(flag) (1 << flag)

So we save all those extra _F_ defintions, they look redundant.

[...]
> diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
> new file mode 100644
> index 0000000..76a3fa7
> --- /dev/null
> +++ b/net/netfilter/xt_HMARK.c
> +/*
> + * xt_HMARK - Netfilter module to set mark as hash value
> + *
> + * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
> + * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * Description:
> + *
> + * This module calculates a hash value that can be modified by modulus and an
> + * offset, i.e. it is possible to produce a skb->mark within a range The hash
> + * value is based on a direction independent five tuple: src & dst addr src &
> + * dst ports and protocol.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/icmp.h>
> +
> +#include <linux/netfilter/x_tables.h>
> +#include <linux/netfilter/xt_HMARK.h>
> +
> +#include <net/ip.h>
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +#include <net/netfilter/nf_conntrack.h>
> +#endif
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +#include <net/ipv6.h>
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +#endif
> +
> +

I removed this extra blank line above.

> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
> +MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
> +MODULE_ALIAS("ipt_HMARK");
> +MODULE_ALIAS("ip6t_HMARK");
> +
> +struct hmark_tuple {
> +	u32			src;
> +	u32			dst;
> +	union hmark_ports	uports;
> +	uint8_t			proto;
> +};
> +
> +static int
> +hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
> +			 const struct xt_hmark_info *info);
> +static inline u32
> +hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
> +{
> +	u32 hash;
> +
> +	if (t->dst < t->src)
> +		swap(t->src, t->dst);
> +
> +	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
> +	hash = hash ^ (t->proto & info->proto_mask);
> +
> +	return (hash % info->hmodulus) + info->hoffset;
> +}
> +
> +static void
> +hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
> +		      struct hmark_tuple *t, const struct xt_hmark_info *info)
> +{
> +	int protoff;
> +
> +	protoff = proto_ports_offset(t->proto);
> +	if (protoff < 0)
> +		return;
> +
> +	nhoff += protoff;
> +	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
> +		return;
> +
> +	if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
> +		t->uports.v32 = (t->uports.v32 & info->spi_mask) |
> +				info->spi_set;
> +	else {
> +		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
> +				info->port_set.v32;
> +
> +		if (t->uports.p16.dst < t->uports.p16.src)
> +			swap(t->uports.p16.dst, t->uports.p16.src);
> +	}
> +}
> +
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
> +{
> +	struct icmp6hdr *icmp6h, _ih6;
> +
> +	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
> +	if (icmp6h == NULL)
> +		return 0;
> +
> +	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
> +		*offset += sizeof(struct icmp6hdr);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
> +{
> +	return  (addr32[0] & mask[0]) ^
> +		(addr32[1] & mask[1]) ^
> +		(addr32[2] & mask[2]) ^
> +		(addr32[3] & mask[3]);
> +}
> +
> +static int
> +hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
> +			  const struct xt_hmark_info *info)
> +{
> +	struct ipv6hdr *ip6, _ip6;
> +	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
> +	unsigned int nhoff = 0;
> +	u16 fragoff = 0;
> +	int nexthdr;
> +
> +	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
> +	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
> +	if (nexthdr < 0)
> +		return 0;
> +	/* No need to check for icmp errors on fragments */
> +	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
> +		goto noicmp;
> +	/* if an icmp error, use the inner header */
> +	if (get_inner6_hdr(skb, &nhoff)) {
> +		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
> +		if (ip6 == NULL)
> +			return -1;
> +		/* Treat AH as ESP, use SPI nothing else. */
> +		flag = IP6T_FH_F_AUTH;
> +		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
> +		if (nexthdr < 0)
> +			return -1;
> +	}
> +noicmp:
> +	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
> +	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
> +
> +	if (info->flags & XT_F_HMARK_METHOD_L3)
> +		return 0;
> +
> +	t->proto = nexthdr;
> +
> +	if (t->proto == IPPROTO_ICMPV6)
> +		return 0;
> +
> +	if (flag & IP6T_FH_F_FRAG)
> +		return -1;
> +
> +	hmark_set_tuple_ports(skb, nhoff, t, info);
> +
> +	return 0;
> +}
> +
> +static unsigned int
> +hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	const struct xt_hmark_info *info = par->targinfo;
> +	struct hmark_tuple t;
> +
> +	memset(&t, 0, sizeof(struct hmark_tuple));
> +
> +	if (info->flags & XT_F_HMARK_CT) {
> +		if (hmark_ct_set_htuple(skb, &t, info) < 0)
> +			return XT_CONTINUE;
> +	} else {
> +		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
> +			return XT_CONTINUE;
> +	}
> +
> +	skb->mark = hmark_hash(&t, info);
> +	return XT_CONTINUE;
> +}
> +
> +static inline u32
> +hmark_addr_any_mask(int l3num, const __u32 *addr32, const __u32 *mask)
> +{
> +	if (l3num == AF_INET)
> +		return *addr32 & *mask;
> +
> +	return hmark_addr6_mask(addr32, mask);
> +}
> +#else
> +static inline u32
> +hmark_addr_any_mask(int l3num, const __u32 *addr32, const __u32 *mask)
> +{
> +	return *addr32 & *mask;
> +}
> +
> +#endif

This is ugly. I think you will not find any section of the Netfilter
code with something similar. I have declared this function out of the
#ifdef section, those are static inline, the compiler will put them
out if unused with no further complain.

Please, find a new takeover patch enclosed.

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-diff, Size: 13655 bytes --]

>From d5065af3988cc7561a02f30bae8342e1a89126a4 Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Wed, 2 May 2012 07:49:47 +0000
Subject: netfilter: add xt_hmark target for hash-based skb
 marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP and hash dest port only and produce a nfmark between 100-119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment safe Layer 3 only, that keep a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ A big part of this patch has been refactorized by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   48 +++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  358 ++++++++++++++++++++++++++++++++++++
 4 files changed, 422 insertions(+)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << flag)
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..b4aa912
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,358 @@
+/*
+ * xt_HMARK - Netfilter module to set mark by means of hashing
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+	return (addr32[0] & mask[0]) ^
+	       (addr32[1] & mask[1]) ^
+	       (addr32[2] & mask[2]) ^
+	       (addr32[3] & mask[3]);
+}
+
+static inline u32
+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	switch(l3num) {
+	case AF_INET:
+		return *addr32 & *mask;
+	case AF_INET6:
+		return hmark_addr6_mask(addr32, mask);
+	}
+	return 0;
+}
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+		    const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = hmark_addr_mask(otuple->src.l3num, otuple->src.u3.all,
+				 info->src_mask.all);
+	t->dst = hmark_addr_mask(otuple->src.l3num, rtuple->src.u3.all,
+				 info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = otuple->src.u.all;
+		t->uports.p16.dst = rtuple->src.u.all;
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
+		t->uports.v32 = (t->uports.v32 & info->spi_mask) |
+				info->spi_set;
+	else {
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	int nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return -1;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return -1;
+	}
+noicmp:
+	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nexthdr;
+
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return -1;
+
+	hmark_set_tuple_ports(skb, nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return -1;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return -1;
+
+	hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask &&
+	    (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-06 22:57           ` Pablo Neira Ayuso
@ 2012-05-07  8:20             ` Hans Schillstrom
  2012-05-07  9:03               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-07  8:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

[-- Attachment #1: Type: text/plain, Size: 4731 bytes --]

On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> Hi Hans,
> 
> [...]
> > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > >
> > > > Yes why not, I can give it a try.
> > > >
> >
> > I think we wait with this one..
> 
> I see. This is easy to add for the conntrack side, but it will require
> some extra code for the packet-based solution.

Actually I think there is very little gain to spread with type 
and then we must add a user mode possibility to turn it off 
i.e. a --hmark-icmp-type-mask 

> Not directly related to this but, I know that your intention is to
> make this as flexible as possible. However, I still don't find how I
> would use the port mask feature in any of my setups.  Basically, I
> don't come up with any useful example for this situation.

We have plenty of rules where just source port mask is zero.
and the dest-port-mask is 0xfffc (or 0xffff)


> I'm also telling this because I think that ICMP support will be
> easier to add if port masking is removed.
> 
> [...]
> > This is what I have done.
> >
> > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> >   (it's not set in the rtuple)
> 
> Good one, this made the code even smaller.
> 
> > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> 
> Not really, you don't need for the conntrack part. The original tuple
> is always the same, not matter where the packet is coming from. I have
> removed this again so it only affects packet-based hashing.

Yes original tuple is always the same but not always less than the rtuple.
If you have two nodes that should produce the same hmark,
one with conntrack an one without you must make a compare to make it consistent.

> 
> > - Moved the L3 check a little bit earlier.
> 
> good.
> 
> > - changed return values for fragments.
> 
> With this, you're giving up on trying to classify fragments. Do you
> really want this?
> 
> From my point of view, if your firewalls (assuming they are the HMARK
> classification) are stateless, it still makes sense to me to classify
> fragments using the XT_HMARK_METHOD_L3_4.

I do agree, it is back to "return 0" again.

> 
> > - Added nhoffs to: hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
> >   to get icmp working
> 
> good catch.
> 
> Below, some minor changes that I made to your patch (you can find a
> new version enclosed to this email).
> 
> [...]
> > +#ifndef XT_HMARK_H_
> > +#define XT_HMARK_H_
> > +
> > +#include <linux/types.h>
> > +
> > +enum {
> > +     XT_HMARK_NONE,
> > +     XT_HMARK_SADR_AND,
> > +     XT_HMARK_DADR_AND,
> > +     XT_HMARK_SPI_AND,
> > +     XT_HMARK_SPI_OR,
> > +     XT_HMARK_SPORT_AND,
> > +     XT_HMARK_DPORT_AND,
> > +     XT_HMARK_SPORT_OR,
> > +     XT_HMARK_DPORT_OR,
> > +     XT_HMARK_PROTO_AND,
> > +     XT_HMARK_RND,
> > +     XT_HMARK_MODULUS,
> > +     XT_HMARK_OFFSET,
> > +     XT_HMARK_CT,
> > +     XT_HMARK_METHOD_L3,
> > +     XT_HMARK_METHOD_L3_4,
> > +     XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
> > +     XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
> > +     XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
> > +     XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
> > +     XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
> > +     XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
> > +     XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
> > +     XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
> > +     XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
> > +     XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
> > +     XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
> > +     XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
> > +     XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
> > +     XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
> > +     XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
> 
> I've defined:
> 
> #define XT_HMARK_FLAG(flag) (1 << flag)
> 
> So we save all those extra _F_ defintions, they look redundant.

OK, I had to change the user mode code to keep up with this change...
The user code part is also included now.

[snip]

>+static inline u32
>+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
>+{
>+       switch (l3num) {
              ^
Added a space here

>+       case AF_INET:
>+               return *addr32 & *mask;
>+       case AF_INET6:
>+               return hmark_addr6_mask(addr32, mask);


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-patch, Size: 14126 bytes --]

From 04cc88b2eec677fd8eab3fbf620ed9209b883b8c Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Mon, 7 May 2012 08:33:08 +0200
Subject: [PATCH 1/1] netfilter: add xt_hmark target for hash-based skb marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP and hash dest port only and produce a nfmark between 100-119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment safe Layer 3 only, that keep a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ A big part of this patch has been refactorized by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   48 +++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  355 ++++++++++++++++++++++++++++++++++++
 4 files changed, 419 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << flag)
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..6954d40
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,355 @@
+/*
+ * xt_HMARK - Netfilter module to set mark by means of hashing
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+	return (addr32[0] & mask[0]) ^
+	       (addr32[1] & mask[1]) ^
+	       (addr32[2] & mask[2]) ^
+	       (addr32[3] & mask[3]);
+}
+
+static inline u32
+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	switch (l3num) {
+	case AF_INET:
+		return *addr32 & *mask;
+	case AF_INET6:
+		return hmark_addr6_mask(addr32, mask);
+	}
+	return 0;
+}
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+		    const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = hmark_addr_mask(otuple->src.l3num, otuple->src.u3.all,
+				 info->src_mask.all);
+	t->dst = hmark_addr_mask(otuple->src.l3num, rtuple->src.u3.all,
+				 info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = otuple->src.u.all;
+		t->uports.p16.dst = rtuple->src.u.all;
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
+		t->uports.v32 = (t->uports.v32 & info->spi_mask) |
+				info->spi_set;
+	else {
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	int nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return -1;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return -1;
+	}
+noicmp:
+	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nexthdr;
+
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return 0;
+
+	hmark_set_tuple_ports(skb, nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return -1;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return 0;
+
+	hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask &&
+	    (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.2.3


[-- Attachment #3: 0001-netfilter-userspace-part-for-target-HMARK.patch --]
[-- Type: text/x-patch, Size: 24699 bytes --]

From edcb596187a50172481d1e9fa11ae062337c69eb Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Mon, 7 May 2012 09:46:38 +0200
Subject: [PATCH 1/1] netfilter: userspace part for target HMARK

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32 bit hash value is generated then modulus by <limit> and
    finally an offset is added before it's written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.
Ver 13
    Name change of defines.

Ver 12
    Reset option flag in some cases, where option is disabled by value.

Ver 10
    conntrack reduced to --hmark-ct switch
    renaming of vars in xt_hmark_info
    Adding helptext and updated man due to --hmark-ct switc

Ver 9
    Formating changes.

Ver 8
    Syntax changes more descriptive options
    --hmark-method added.

Ver 6-7 -

Ver 5
      smask and dmask changed to length

Ver 4
      xtoptions used for parsing.

Ver 3
       -

Ver 2
      IPv4 NAT added
      iptables ver 1.4.12.1 adaptions.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  510 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   84 ++++++
 include/linux/netfilter/xt_HMARK.h |   48 ++++
 3 files changed, 642 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_HMARK.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..4b13cd3
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,510 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "xtables.h"
+#include <linux/netfilter/xt_HMARK.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+#define XT_F_HMARK_L4_OPTS \
+		(XT_HMARK_FLAG(XT_HMARK_SPI_AND) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPI_OR) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT_AND) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT_OR) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT_AND) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT_OR) |\
+		 XT_HMARK_FLAG(XT_HMARK_PROTO_AND))
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-method <method>          Overall L3/L4 and fragment behavior\n"
+"                 L3                Fragment safe, do not use ports or proto\n"
+"                                   i.e. Fragments don't need special care.\n"
+"                 L3-4 (Default)    Fragment unsafe, use ports and proto\n"
+"                                   if defrag off in conntrack\n"
+"                                      no hmark on any part of a fragment\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                nfmark modulus value\n"
+"  --hmark-offset value             Last action add value to nfmark\n\n"
+" Fine tuning of what will be included in hash calculation\n"
+"  --hmark-src-mask length          Source address mask length\n"
+"  --hmark-dst-mask length          Dest address mask length\n"
+"  --hmark-sport-mask value         Mask src port with value\n"
+"  --hmark-dport-mask value         Mask dst port with value\n"
+"  --hmark-spi-mask value           For esp and ah AND spi with value\n"
+"  --hmark-sport-set value          OR src port with value\n"
+"  --hmark-dport-set value          OR dst port with value\n"
+"  --hmark-spi-set value            For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value         Mask Protocol with value\n"
+"  --hmark-rnd                      Initial Random value to hash cacl.\n"
+" For NAT in IPv4: src part from original/reply tuple will always be used\n"
+" i.e. orig src part will be used as src address/port.\n"
+"     reply src part will be used as dst address/port\n"
+" Make sure to qualify the rule in a proper way when using NAT flag\n"
+" When --ct is used only tracked connections will match\n"
+"  --hmark-ct                       Force conntrack orig and rely tuples as\n"
+"                                   source and destination.\n\n"
+" In many cases hmark can be omitted i.e. --src-mask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name  = "hmark-method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "hmark-src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "hmark-dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "hmark-sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "hmark-dport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "hmark-spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "hmark-sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "hmark-dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "hmark-spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "hmark-proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "hmark-rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",
+	  .type = XTTYPE_UINT32,
+	  .id = XT_HMARK_MODULUS,
+	  .min = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "hmark-offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "hmark-ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+
+	{ .name  = "method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "dport-mask", .type = XTTYPE_UINT16,
+	  .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name  = "mod",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_MODULUS,
+	  .min   = 1,
+	  .flags = XTOPT_PUT,
+	  XTOPT_MAND, XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb, int plen)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->port_set.v32 = 0;
+		info->flags = 0;
+		info->spi_set = 0;
+		info->hoffset = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SADR_AND:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SADR_AND);
+		break;
+	case XT_HMARK_DADR_AND:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DADR_AND);
+		break;
+	case XT_HMARK_SPI_AND:
+		info->spi_mask = htonl(cb->val.u32);
+		if (cb->val.u32 == 0xffffffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI_AND);
+		break;
+	case XT_HMARK_SPI_OR:
+		info->spi_set = htonl(cb->val.u32);
+		if (cb->val.u32 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI_OR);
+		break;
+	case XT_HMARK_SPORT_AND:
+		info->port_mask.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT_AND);
+		break;
+	case XT_HMARK_DPORT_AND:
+		info->port_mask.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT_AND);
+		break;
+	case XT_HMARK_SPORT_OR:
+		info->port_set.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT_OR);
+		break;
+	case XT_HMARK_DPORT_OR:
+		info->port_set.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT_OR);
+		break;
+	case XT_HMARK_PROTO_AND:
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_PROTO_AND);
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmodulus == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "xxx modulus 0 ? "
+				      "thats a div by 0");
+			info->hmodulus = 0xffffffff;
+		}
+		break;
+	case XT_HMARK_METHOD_L3:
+		if (strcmp(cb->arg, "L3") == 0) {
+			info->proto_mask = 0;
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		} else if (strcmp(cb->arg, "L3-4") == 0) {
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3);
+			cb->xflags |= XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		}
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_ip4_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 32);
+}
+static void HMARK_ip6_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 128);
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_HMARK_FLAG(XT_HMARK_MODULUS)))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+			      "is not set, or zero wich is a div by zero");
+	/* Check for invalid options */
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3) &&
+	   (cb->xflags & XT_F_HMARK_L4_OPTS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-method L3, "
+			      "can not be combined by an Layer 4 options: "
+			      "port, spi or proto ");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf("method L3 ");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf("method L3-4 ");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_AND))
+			printf("sport-mask 0x%x ",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_AND))
+			printf("dport-mask 0x%x ",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_AND))
+			printf("spi-mask 0x%x ", htonl(info->spi_mask));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_OR))
+			printf("sport-set 0x%x ",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_OR))
+			printf("dport-set 0x%x ",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_OR))
+			printf("spi-set 0x%x ", htonl(info->spi_set));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_AND))
+			printf("proto-mask 0x%x ", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+
+}
+
+static void HMARK_ip6_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf("src-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf("dst-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf("src-mask %s ",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf("dst-mask %s ",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf(" --hmark-method L3");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf(" --hmark-method L3-4");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_AND))
+			printf(" --hmark-sport-mask 0x%x",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_AND))
+			printf(" --hmark-dport-mask 0x%x",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_AND))
+			printf(" --hmark-spi-mask 0x%x",
+			       htonl(info->spi_mask));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_OR))
+			printf(" --hmark-sport-set 0x%x",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_OR))
+			printf(" --hmark-dport-set 0x%x",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_OR))
+			printf(" --hmark-spi-set 0x%x", htonl(info->spi_set));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_AND))
+			printf(" --hmark-proto-mask 0x%x", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf(" --hmark-mod 0x%x", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf(" --hmark-offset 0x%x", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf(" --hmark-ct");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf(" --hmark-src-mask %s",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf(" --hmark-src-mask %s",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_ip4_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_ip6_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..c258e59
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,84 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on src-addr, dst-addr, sport, dport and proto. The same mark will be produced independent of direction if no masks is set or the same masks is used for src and dest.
+The hash mark could be adjusted by modulus and finally an offset could be added, i.e the final mark will be within a range.
+ICMP error will use the the original message for hash calculation not the icmp it self.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
+      Default behavior is to completely ignore any fragment if it reach hmark.
+      --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
+      None of the parameters effect the packet it self only the calculated hash value.
+
+.PP
+Parameters:
+Short hand methods
+.TP
+\fB\-\-hmark\-method\fP \fIL3\fP
+Do not use L4 protocol field, ports or spi, only Layer 3 addresses, mask length
+of L3 addresses can still be used. Fragment or not does not matter in
+this case since only L3 address can be used in calc. of hash value.
+.TP
+\fB\-\-hmark\-method\fP \fIL3-4\fP (Default)
+Include L4 in calculation. of hash value i.e. all masks below are valid.
+Fragments will be ignored. (i.e no hash value produced)
+.PP
+For all masks default is all "1:s", to disable a field use mask 0
+.TP
+\fB\-\-hmark\-src\-mask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dst\-mask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sport\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dport\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-ct\fP
+When flag is set, conntrack data should be used. Useful when NAT internal addressed should be used in calculation.
+Be careful when using DNAT since mangle table is handled before nat table. I.e it will not work as expected to put HMARK in table mangle and PREROUTING chain. The initial packet will have it's hash based on the original address, while the rest of the flow will use the NAT:ed address.
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offset\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark-offs 10000 \-\-hmark-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING -p SCTP \-j HMARK \-\-src\-mask 0 \-\-dst\-mask 0
+ \-\-sp\-mask 0 \-\-offset 100 \-\-mod 20
+.PP
+Fragment safe Layer 3 only that keep a class C network flow together
+.IP
+iptables \-t mangle \-A PREROUTING \-j HMARK \-\-method L3 \-\-src\-mask 24 \-\-mod 20 \-\-offset 100
+
diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << flag)
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07  8:20             ` Hans Schillstrom
@ 2012-05-07  9:03               ` Pablo Neira Ayuso
  2012-05-07  9:14                 ` Hans Schillstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-07  9:03 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Mon, May 07, 2012 at 10:20:42AM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> > Hi Hans,
> > 
> > [...]
> > > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > > >
> > > > > Yes why not, I can give it a try.
> > > > >
> > >
> > > I think we wait with this one..
> > 
> > I see. This is easy to add for the conntrack side, but it will require
> > some extra code for the packet-based solution.
> 
> Actually I think there is very little gain to spread with type 
> and then we must add a user mode possibility to turn it off 
> i.e. a --hmark-icmp-type-mask 
> 
> > Not directly related to this but, I know that your intention is to
> > make this as flexible as possible. However, I still don't find how I
> > would use the port mask feature in any of my setups.  Basically, I
> > don't come up with any useful example for this situation.
> 
> We have plenty of rules where just source port mask is zero.
> and the dest-port-mask is 0xfffc (or 0xffff)

0xffff and 0x0000 means on/off respectively.

Still curious, how can 0xfffc be useful?

> > I'm also telling this because I think that ICMP support will be
> > easier to add if port masking is removed.
> > 
> > [...]
> > > This is what I have done.
> > >
> > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > >   (it's not set in the rtuple)
> > 
> > Good one, this made the code even smaller.
> > 
> > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > 
> > Not really, you don't need for the conntrack part. The original tuple
> > is always the same, not matter where the packet is coming from. I have
> > removed this again so it only affects packet-based hashing.
> 
> Yes original tuple is always the same but not always less than the rtuple.
> If you have two nodes that should produce the same hmark,
> one with conntrack an one without you must make a compare to make it consistent.

I see, for consistency still makes sense although this seems to me
like still strange configuration. In what scenario would you use two
different approaches?

> > > - Moved the L3 check a little bit earlier.
> > 
> > good.
> > 
> > > - changed return values for fragments.
> > 
> > With this, you're giving up on trying to classify fragments. Do you
> > really want this?
> > 
> > From my point of view, if your firewalls (assuming they are the HMARK
> > classification) are stateless, it still makes sense to me to classify
> > fragments using the XT_HMARK_METHOD_L3_4.
> 
> I do agree, it is back to "return 0" again.

OK.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07  9:03               ` Pablo Neira Ayuso
@ 2012-05-07  9:14                 ` Hans Schillstrom
  2012-05-07 11:56                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-07  9:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Monday 07 May 2012 11:03:28 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 10:20:42AM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> > > Hi Hans,
> > > 
> > > [...]
> > > > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > > > >
> > > > > > Yes why not, I can give it a try.
> > > > > >
> > > >
> > > > I think we wait with this one..
> > > 
> > > I see. This is easy to add for the conntrack side, but it will require
> > > some extra code for the packet-based solution.
> > 
> > Actually I think there is very little gain to spread with type 
> > and then we must add a user mode possibility to turn it off 
> > i.e. a --hmark-icmp-type-mask 
> > 
> > > Not directly related to this but, I know that your intention is to
> > > make this as flexible as possible. However, I still don't find how I
> > > would use the port mask feature in any of my setups.  Basically, I
> > > don't come up with any useful example for this situation.
> > 
> > We have plenty of rules where just source port mask is zero.
> > and the dest-port-mask is 0xfffc (or 0xffff)
> 
> 0xffff and 0x0000 means on/off respectively.
> 
> Still curious, how can 0xfffc be useful?

That's a special case where an appl is using 4 ports.
But in general, have not seen other than "on/off" except for above.

> 
> > > I'm also telling this because I think that ICMP support will be
> > > easier to add if port masking is removed.
> > > 
> > > [...]
> > > > This is what I have done.
> > > >
> > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > >   (it's not set in the rtuple)
> > > 
> > > Good one, this made the code even smaller.
> > > 
> > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > 
> > > Not really, you don't need for the conntrack part. The original tuple
> > > is always the same, not matter where the packet is coming from. I have
> > > removed this again so it only affects packet-based hashing.
> > 
> > Yes original tuple is always the same but not always less than the rtuple.
> > If you have two nodes that should produce the same hmark,
> > one with conntrack an one without you must make a compare to make it consistent.
> 
> I see, for consistency still makes sense although this seems to me
> like still strange configuration. In what scenario would you use two
> different approaches?

In the way that we use HMARK,
in the incomming path there is conntrack disabled in the contrainer, 
for the outgoing patch i.e. at the payloads there is conntrack used.
In that case the --hmark-ct makes life easier.

> 
> > > > - Moved the L3 check a little bit earlier.
> > > 
> > > good.
> > > 
> > > > - changed return values for fragments.
> > > 
> > > With this, you're giving up on trying to classify fragments. Do you
> > > really want this?
> > > 
> > > From my point of view, if your firewalls (assuming they are the HMARK
> > > classification) are stateless, it still makes sense to me to classify
> > > fragments using the XT_HMARK_METHOD_L3_4.
> > 
> > I do agree, it is back to "return 0" again.
> 
> OK.
> 

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07  9:14                 ` Hans Schillstrom
@ 2012-05-07 11:56                   ` Pablo Neira Ayuso
  2012-05-07 12:09                     ` Hans Schillstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-07 11:56 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > We have plenty of rules where just source port mask is zero.
> > > and the dest-port-mask is 0xfffc (or 0xffff)
> > 
> > 0xffff and 0x0000 means on/off respectively.
> > 
> > Still curious, how can 0xfffc be useful?
> 
> That's a special case where an appl is using 4 ports.
> But in general, have not seen other than "on/off" except for above.

I see. Well I'm fine with this way to switch on/off things, just
wanted some clafication.

Still one final thing I'd like to remove before inclusion:

+       union hmark_ports       port_mask;
+       union hmark_ports       port_set;
+       __u32                   spi_mask;
+       __u32                   spi_set;

the spi_mask seems redundant. The port_mask already provides u32 for
it.

In case you want to support different masks for AH/ESP and TCP, you
could do the following:

iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc

Any objection?

Yes, you'll have to change user-space again, but we have time for
that.

> > > > I'm also telling this because I think that ICMP support will be
> > > > easier to add if port masking is removed.
> > > > 
> > > > [...]
> > > > > This is what I have done.
> > > > >
> > > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > > >   (it's not set in the rtuple)
> > > > 
> > > > Good one, this made the code even smaller.
> > > > 
> > > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > > 
> > > > Not really, you don't need for the conntrack part. The original tuple
> > > > is always the same, not matter where the packet is coming from. I have
> > > > removed this again so it only affects packet-based hashing.
> > > 
> > > Yes original tuple is always the same but not always less than the rtuple.
> > > If you have two nodes that should produce the same hmark,
> > > one with conntrack an one without you must make a compare to make it consistent.
> > 
> > I see, for consistency still makes sense although this seems to me
> > like still strange configuration. In what scenario would you use two
> > different approaches?
> 
> In the way that we use HMARK,
> in the incomming path there is conntrack disabled in the contrainer, 
> for the outgoing patch i.e. at the payloads there is conntrack used.
> In that case the --hmark-ct makes life easier.

That's still not enough to guarantee that the mark will be consistent
if NAT is in user, but I don't mind recovering the swap and add some
comment on the code to explain this if this makes your life easier.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07 11:56                   ` Pablo Neira Ayuso
@ 2012-05-07 12:09                     ` Hans Schillstrom
  2012-05-07 12:22                       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-07 12:09 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > We have plenty of rules where just source port mask is zero.
> > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > 
> > > 0xffff and 0x0000 means on/off respectively.
> > > 
> > > Still curious, how can 0xfffc be useful?
> > 
> > That's a special case where an appl is using 4 ports.
> > But in general, have not seen other than "on/off" except for above.
> 
> I see. Well I'm fine with this way to switch on/off things, just
> wanted some clafication.
> 
> Still one final thing I'd like to remove before inclusion:
> 
> +       union hmark_ports       port_mask;
> +       union hmark_ports       port_set;
> +       __u32                   spi_mask;
> +       __u32                   spi_set;
> 
> the spi_mask seems redundant. The port_mask already provides u32 for
> it.

No problems, I'll remove it.

> In case you want to support different masks for AH/ESP and TCP, you
> could do the following:
> 
> iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
> iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc
> 
> Any objection?

I don't think this is a problem, but it should be written in the man page
that ports and spi share mask so they can't be used at the same time.


> Yes, you'll have to change user-space again, but we have time for
> that.

:-)

> 
> > > > > I'm also telling this because I think that ICMP support will be
> > > > > easier to add if port masking is removed.
> > > > > 
> > > > > [...]
> > > > > > This is what I have done.
> > > > > >
> > > > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > > > >   (it's not set in the rtuple)
> > > > > 
> > > > > Good one, this made the code even smaller.
> > > > > 
> > > > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > > > 
> > > > > Not really, you don't need for the conntrack part. The original tuple
> > > > > is always the same, not matter where the packet is coming from. I have
> > > > > removed this again so it only affects packet-based hashing.
> > > > 
> > > > Yes original tuple is always the same but not always less than the rtuple.
> > > > If you have two nodes that should produce the same hmark,
> > > > one with conntrack an one without you must make a compare to make it consistent.
> > > 
> > > I see, for consistency still makes sense although this seems to me
> > > like still strange configuration. In what scenario would you use two
> > > different approaches?
> > 
> > In the way that we use HMARK,
> > in the incomming path there is conntrack disabled in the contrainer, 
> > for the outgoing patch i.e. at the payloads there is conntrack used.
> > In that case the --hmark-ct makes life easier.
> 
> That's still not enough to guarantee that the mark will be consistent
> if NAT is in user, but I don't mind recovering the swap and add some
> comment on the code to explain this if this makes your life easier.

Thanks,  I will send a new patch soon.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07 12:09                     ` Hans Schillstrom
@ 2012-05-07 12:22                       ` Pablo Neira Ayuso
  2012-05-07 12:57                         ` Hans Schillstrom
  2012-05-08  7:37                         ` Hans Schillstrom
  0 siblings, 2 replies; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-07 12:22 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > We have plenty of rules where just source port mask is zero.
> > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > 
> > > > 0xffff and 0x0000 means on/off respectively.
> > > > 
> > > > Still curious, how can 0xfffc be useful?
> > > 
> > > That's a special case where an appl is using 4 ports.
> > > But in general, have not seen other than "on/off" except for above.
> > 
> > I see. Well I'm fine with this way to switch on/off things, just
> > wanted some clafication.
> > 
> > Still one final thing I'd like to remove before inclusion:
> > 
> > +       union hmark_ports       port_mask;
> > +       union hmark_ports       port_set;
> > +       __u32                   spi_mask;
> > +       __u32                   spi_set;
> > 
> > the spi_mask seems redundant. The port_mask already provides u32 for
> > it.
> 
> No problems, I'll remove it.

OK. As a nice side-effect, this will lead to removing the branch that
tests ESP/AH in hmark_set_tuple_ports.

Please, use the patch that I sent you yesterday. Recover the swap
behaviour that you need, I'll mangle the patch myself to add the
little comment to explain why we do this with CT as well.

BTW, note that you do *not* have to remove the XT_HMARK_SPI flags, we
still need those for iptables-save.

While at it:

+enum {                      
+       XT_HMARK_NONE,       
+       XT_HMARK_SADR_AND,   
+       XT_HMARK_DADR_AND,   
+       XT_HMARK_SPI_AND,    
+       XT_HMARK_SPI_OR,    

remove all trailing _OR

+       XT_HMARK_SPORT_AND,  
+       XT_HMARK_DPORT_AND,  
+       XT_HMARK_SPORT_OR,   
+       XT_HMARK_DPORT_OR,   
+       XT_HMARK_PROTO_AND,

rename all _AND by _MASK.

+       XT_HMARK_RND,        
+       XT_HMARK_MODULUS,    
+       XT_HMARK_OFFSET,     
+       XT_HMARK_CT,         
+       XT_HMARK_METHOD_L3,  
+       XT_HMARK_METHOD_L3_4,
};

What I'm asking should require very little changes in the kernel-code.

> > In case you want to support different masks for AH/ESP and TCP, you
> > could do the following:
> > 
> > iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
> > iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc
> > 
> > Any objection?
> 
> I don't think this is a problem, but it should be written in the man page
> that ports and spi share mask so they can't be used at the same time.

documentation is fine.

iptables can stop this by spotting a warning message from user-space.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07 12:22                       ` Pablo Neira Ayuso
@ 2012-05-07 12:57                         ` Hans Schillstrom
  2012-05-07 14:54                           ` Pablo Neira Ayuso
  2012-05-08  7:37                         ` Hans Schillstrom
  1 sibling, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-07 12:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > 
> > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > 
> > > > > Still curious, how can 0xfffc be useful?
> > > > 
> > > > That's a special case where an appl is using 4 ports.
> > > > But in general, have not seen other than "on/off" except for above.
> > > 
> > > I see. Well I'm fine with this way to switch on/off things, just
> > > wanted some clafication.
> > > 
> > > Still one final thing I'd like to remove before inclusion:
> > > 
> > > +       union hmark_ports       port_mask;
> > > +       union hmark_ports       port_set;
> > > +       __u32                   spi_mask;
> > > +       __u32                   spi_set;
> > > 
> > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > it.
> > 
> > No problems, I'll remove it.
> 
> OK. As a nice side-effect, this will lead to removing the branch that
> tests ESP/AH in hmark_set_tuple_ports.
>
Yes, only check if not ESP or AH to swap src/dst

+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+			info->port_set.v32;
+
+	if (t->proto != IPPROTO_ESP && t->proto != IPPROTO_AH)
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+}

> Please, use the patch that I sent you yesterday. Recover the swap
> behaviour that you need, I'll mangle the patch myself to add the
> little comment to explain why we do this with CT as well.
> 
> BTW, note that you do *not* have to remove the XT_HMARK_SPI flags, we
> still need those for iptables-save.
> 
> While at it:
> 
> +enum {                      
> +       XT_HMARK_NONE,       
> +       XT_HMARK_SADR_AND,   
> +       XT_HMARK_DADR_AND,   
> +       XT_HMARK_SPI_AND,    
> +       XT_HMARK_SPI_OR,    
> 
> remove all trailing _OR
> 
> +       XT_HMARK_SPORT_AND,  
> +       XT_HMARK_DPORT_AND,  
> +       XT_HMARK_SPORT_OR,   
> +       XT_HMARK_DPORT_OR,   
> +       XT_HMARK_PROTO_AND,
> 
> rename all _AND by _MASK.
> 
> +       XT_HMARK_RND,        
> +       XT_HMARK_MODULUS,    
> +       XT_HMARK_OFFSET,     
> +       XT_HMARK_CT,         
> +       XT_HMARK_METHOD_L3,  
> +       XT_HMARK_METHOD_L3_4,
> };
> 
> What I'm asking should require very little changes in the kernel-code.
> 

I'll send you the updates later to day

> > > In case you want to support different masks for AH/ESP and TCP, you
> > > could do the following:
> > > 
> > > iptables -I PREROUTING -t mangle -p esp -j HARK --spi-mask 0xffff0000
> > > iptables -I PREROUTING -t mangle -p tcp -j HARK --port-mask 0xfffc
> > > 
> > > Any objection?
> > 
> > I don't think this is a problem, but it should be written in the man page
> > that ports and spi share mask so they can't be used at the same time.
> 
> documentation is fine.
> 
> iptables can stop this by spotting a warning message from user-space.

If you think thats enough, I fine with that.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07 12:57                         ` Hans Schillstrom
@ 2012-05-07 14:54                           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-07 14:54 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Mon, May 07, 2012 at 02:57:30PM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> > On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > > 
> > > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > > 
> > > > > > Still curious, how can 0xfffc be useful?
> > > > > 
> > > > > That's a special case where an appl is using 4 ports.
> > > > > But in general, have not seen other than "on/off" except for above.
> > > > 
> > > > I see. Well I'm fine with this way to switch on/off things, just
> > > > wanted some clafication.
> > > > 
> > > > Still one final thing I'd like to remove before inclusion:
> > > > 
> > > > +       union hmark_ports       port_mask;
> > > > +       union hmark_ports       port_set;
> > > > +       __u32                   spi_mask;
> > > > +       __u32                   spi_set;
> > > > 
> > > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > > it.
> > > 
> > > No problems, I'll remove it.
> > 
> > OK. As a nice side-effect, this will lead to removing the branch that
> > tests ESP/AH in hmark_set_tuple_ports.
> >
> Yes, only check if not ESP or AH to swap src/dst

Do you really that branch? I mean, unless I'm missing anything, swapping
them shouldn't be a problem.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-07 12:22                       ` Pablo Neira Ayuso
  2012-05-07 12:57                         ` Hans Schillstrom
@ 2012-05-08  7:37                         ` Hans Schillstrom
  2012-05-09 10:38                           ` Pablo Neira Ayuso
  1 sibling, 1 reply; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-08  7:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

[-- Attachment #1: Type: text/plain, Size: 1517 bytes --]

On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > 
> > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > 
> > > > > Still curious, how can 0xfffc be useful?
> > > > 
> > > > That's a special case where an appl is using 4 ports.
> > > > But in general, have not seen other than "on/off" except for above.
> > > 
> > > I see. Well I'm fine with this way to switch on/off things, just
> > > wanted some clafication.
> > > 
> > > Still one final thing I'd like to remove before inclusion:
> > > 
> > > +       union hmark_ports       port_mask;
> > > +       union hmark_ports       port_set;
> > > +       __u32                   spi_mask;
> > > +       __u32                   spi_set;
> > > 
> > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > it.
> > 
> > No problems, I'll remove it.
> 

Done,

> OK. As a nice side-effect, this will lead to removing the branch that
> tests ESP/AH in hmark_set_tuple_ports.

Yes,

[snip]
> remove all trailing _OR
> rename all _AND by _MASK.
Done

[snip]
> iptables can stop this by spotting a warning message from user-space.
Done.


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-patch, Size: 14512 bytes --]

From d5065af3988cc7561a02f30bae8342e1a89126a4 Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Wed, 2 May 2012 07:49:47 +0000
Subject: netfilter: add xt_hmark target for hash-based skb
 marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP and hash dest port only and produce a nfmark between 100-119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment safe Layer 3 only, that keep a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ A big part of this patch has been refactorized by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   48 +++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  358 ++++++++++++++++++++++++++++++++++++
 4 files changed, 422 insertions(+)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,46 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_MASK,
+	XT_HMARK_DADR_MASK,
+	XT_HMARK_SPI_MASK,
+	XT_HMARK_SPI,
+	XT_HMARK_SPORT_MASK,
+	XT_HMARK_DPORT_MASK,
+	XT_HMARK_SPORT,
+	XT_HMARK_DPORT,
+	XT_HMARK_PROTO_MASK,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << flag)
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..b4aa912
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,362 @@
+/*
+ * xt_HMARK - Netfilter module to set mark by means of hashing
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+	return (addr32[0] & mask[0]) ^
+	       (addr32[1] & mask[1]) ^
+	       (addr32[2] & mask[2]) ^
+	       (addr32[3] & mask[3]);
+}
+
+static inline u32
+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	switch (l3num) {
+	case AF_INET:
+		return *addr32 & *mask;
+	case AF_INET6:
+		return hmark_addr6_mask(addr32, mask);
+	}
+	return 0;
+}
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+		    const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = hmark_addr_mask(otuple->src.l3num, otuple->src.u3.all,
+				 info->src_mask.all);
+	t->dst = hmark_addr_mask(otuple->src.l3num, rtuple->src.u3.all,
+				 info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = otuple->src.u.all;
+		t->uports.p16.dst = rtuple->src.u.all;
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+			info->port_set.v32;
+
+	if (t->uports.p16.dst < t->uports.p16.src)
+		swap(t->uports.p16.dst, t->uports.p16.src);
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	int nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return -1;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return -1;
+	}
+noicmp:
+	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nexthdr;
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return 0;
+
+	hmark_set_tuple_ports(skb, nhoff, t, info);
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return -1;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip all fragments */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return 0;
+
+	hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask &&
+	    (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK) &&
+	    (info->flags & (XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |
+			     XT_HMARK_FLAG(XT_HMARK_DPORT_MASK)))) {
+		pr_info("xt_HMARK: spi-mask and port-mask can't be combined\n");
+		return -EINVAL;
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI) &&
+	    (info->flags & (XT_HMARK_FLAG(XT_HMARK_SPORT) |
+			     XT_HMARK_FLAG(XT_HMARK_DPORT)))) {
+		pr_info("xt_HMARK: spi-set and port-set can't be combined\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.9.5


[-- Attachment #3: 0001-netfilter-userspace-part-for-target-HMARK.patch --]
[-- Type: text/x-patch, Size: 23501 bytes --]

From 6e59e43e0e275918ae2c307e46a5581d5587459b Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Tue, 8 May 2012 09:15:12 +0200
Subject: [PATCH] netfilter: userspace part for target HMARK

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32 bit hash value is generated then modulus by <limit> and
    finally an offset is added before it's written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.

Ver 13
    Name change of defines and spi / port check due to removal ov spi data.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c   |  522 ++++++++++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man |   84 +++++++
 2 files changed, 606 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..2442f05
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,522 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "xtables.h"
+#include <linux/netfilter/xt_HMARK.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+#define XT_F_HMARK_L4_OPTS \
+		(XT_HMARK_FLAG(XT_HMARK_SPI_MASK) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPI) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT_MASK) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT) |\
+		 XT_HMARK_FLAG(XT_HMARK_PROTO_MASK))
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-method <method>          Overall L3/L4 and fragment behavior\n"
+"                 L3                Fragment safe, do not use ports or proto\n"
+"                                   i.e. Fragments don't need special care.\n"
+"                 L3-4 (Default)    Fragment unsafe, use ports and proto\n"
+"                                   if defrag off in conntrack\n"
+"                                      no hmark on any part of a fragment\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                nfmark modulus value\n"
+"  --hmark-offset value             Last action add value to nfmark\n\n"
+" Fine tuning of what will be included in hash calculation\n"
+"  --hmark-src-mask length          Source address mask length\n"
+"  --hmark-dst-mask length          Dest address mask length\n"
+"  --hmark-sport-mask value         Mask src port with value\n"
+"  --hmark-dport-mask value         Mask dst port with value\n"
+"  --hmark-spi-mask value           For esp and ah AND spi with value\n"
+"  --hmark-sport-set value          OR src port with value\n"
+"  --hmark-dport-set value          OR dst port with value\n"
+"  --hmark-spi-set value            For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value         Mask Protocol with value\n"
+"  --hmark-rnd                      Initial Random value to hash cacl.\n"
+" For NAT in IPv4: src part from original/reply tuple will always be used\n"
+" i.e. orig src part will be used as src address/port.\n"
+"     reply src part will be used as dst address/port\n"
+" Make sure to qualify the rule in a proper way when using NAT flag\n"
+" When --ct is used only tracked connections will match\n"
+"  --hmark-ct                       Force conntrack orig and rely tuples as\n"
+"                                   source and destination.\n\n"
+" In many cases hmark can be omitted i.e. --src-mask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name  = "hmark-method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "hmark-src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_MASK,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "hmark-dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "hmark-sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "hmark-dport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "hmark-spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.v32)
+	},
+	{ .name  = "hmark-sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "hmark-dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "hmark-spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.v32)
+	},
+	{ .name  = "hmark-proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "hmark-rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",
+	  .type = XTTYPE_UINT32,
+	  .id = XT_HMARK_MODULUS,
+	  .min = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "hmark-offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "hmark-ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+
+	{ .name  = "method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "dport-mask", .type = XTTYPE_UINT16,
+	  .id = XT_HMARK_DPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.v32)
+	},
+	{ .name  = "sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.v32)
+	},
+	{ .name  = "proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name  = "mod",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_MODULUS,
+	  .min   = 1,
+	  .flags = XTOPT_PUT,
+	  XTOPT_MAND, XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb, int plen)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->port_set.v32 = 0;
+		info->flags = 0;
+		info->hoffset = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SADR_MASK:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SADR_MASK);
+		break;
+	case XT_HMARK_DADR_MASK:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DADR_MASK);
+		break;
+	case XT_HMARK_SPI_MASK:
+		info->port_mask.v32 = htonl(cb->val.u32);
+		if (cb->val.u32 == 0xffffffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI_MASK);
+		break;
+	case XT_HMARK_SPI:
+		info->port_set.v32 = htonl(cb->val.u32);
+		if (cb->val.u32 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI);
+		break;
+	case XT_HMARK_SPORT_MASK:
+		info->port_mask.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT_MASK);
+		break;
+	case XT_HMARK_DPORT_MASK:
+		info->port_mask.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT_MASK);
+		break;
+	case XT_HMARK_SPORT:
+		info->port_set.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT);
+		break;
+	case XT_HMARK_DPORT:
+		info->port_set.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT);
+		break;
+	case XT_HMARK_PROTO_MASK:
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_PROTO_MASK);
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmodulus == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "xxx modulus 0 ? "
+				      "thats a div by 0");
+			info->hmodulus = 0xffffffff;
+		}
+		break;
+	case XT_HMARK_METHOD_L3:
+		if (strcmp(cb->arg, "L3") == 0) {
+			info->proto_mask = 0;
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		} else if (strcmp(cb->arg, "L3-4") == 0) {
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3);
+			cb->xflags |= XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		}
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_ip4_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 32);
+}
+static void HMARK_ip6_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 128);
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_HMARK_FLAG(XT_HMARK_MODULUS)))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+			      "is not set, or zero wich is a div by zero");
+	/* Check for invalid options */
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3) &&
+	   (cb->xflags & XT_F_HMARK_L4_OPTS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-method L3, "
+			      "can not be combined by an Layer 4 options: "
+			      "port, spi or proto ");
+	/* Check invalid mix of spi and ports since thye share data */
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK) &&
+	    (cb->xflags & (XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |
+			   XT_HMARK_FLAG(XT_HMARK_DPORT_MASK))))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-spi-mask, "
+			      "can not be combined with port mask options ");
+
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_SPI) &&
+	    (cb->xflags & (XT_HMARK_FLAG(XT_HMARK_SPORT) |
+			   XT_HMARK_FLAG(XT_HMARK_DPORT))))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-spi-set, "
+			      "can not be combined with port set options ");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf("method L3 ");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf("method L3-4 ");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_MASK))
+			printf("sport-mask 0x%x ",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_MASK))
+			printf("dport-mask 0x%x ",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK))
+			printf("spi-mask 0x%x ", htonl(info->port_mask.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT))
+			printf("sport-set 0x%x ",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT))
+			printf("dport-set 0x%x ",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI))
+			printf("spi-set 0x%x ", htonl(info->port_set.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_MASK))
+			printf("proto-mask 0x%x ", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+
+}
+
+static void HMARK_ip6_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf("src-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf("dst-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf("src-mask %s ",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf("dst-mask %s ",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf(" --hmark-method L3");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf(" --hmark-method L3-4");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_MASK))
+			printf(" --hmark-sport-mask 0x%x",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_MASK))
+			printf(" --hmark-dport-mask 0x%x",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK))
+			printf(" --hmark-spi-mask 0x%x",
+			       htonl(info->port_mask.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT))
+			printf(" --hmark-sport-set 0x%x",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT))
+			printf(" --hmark-dport-set 0x%x",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI))
+			printf(" --hmark-spi-set 0x%x",
+			       htonl(info->port_set.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_MASK))
+			printf(" --hmark-proto-mask 0x%x", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf(" --hmark-mod 0x%x", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf(" --hmark-offset 0x%x", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf(" --hmark-ct");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf(" --hmark-src-mask %s",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf(" --hmark-src-mask %s",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_ip4_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_ip6_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..92bd1ed
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,84 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on src-addr, dst-addr, sport, dport and proto. The same mark will be produced independent of direction if no masks is set or the same masks is used for src and dest.
+The hash mark could be adjusted by modulus and finally an offset could be added, i.e the final mark will be within a range.
+ICMP error will use the the original message for hash calculation not the icmp it self.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
+      Default behavior is to completely ignore any fragment if it reach hmark.
+      --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
+      None of the parameters effect the packet it self only the calculated hash value.
+
+.PP
+Parameters:
+Short hand methods
+.TP
+\fB\-\-hmark\-method\fP \fIL3\fP
+Do not use L4 protocol field, ports or spi, only Layer 3 addresses, mask length
+of L3 addresses can still be used. Fragment or not does not matter in
+this case since only L3 address can be used in calc. of hash value.
+.TP
+\fB\-\-hmark\-method\fP \fIL3-4\fP (Default)
+Include L4 in calculation. of hash value i.e. all masks below are valid.
+Fragments will be ignored. (i.e no hash value produced)
+.PP
+For all masks default is all "1:s", to disable a field use mask 0
+.TP
+\fB\-\-hmark\-src\-mask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dst\-mask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sport\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dport\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-ct\fP
+When flag is set, conntrack data should be used. Useful when NAT internal addressed should be used in calculation.
+Be careful when using DNAT since mangle table is handled before nat table. I.e it will not work as expected to put HMARK in table mangle and PREROUTING chain. The initial packet will have it's hash based on the original address, while the rest of the flow will use the NAT:ed address.
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offset\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark-offs 10000 \-\-hmark-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING -p SCTP \-j HMARK \-\-src\-mask 0 \-\-dst\-mask 0
+ \-\-sp\-mask 0 \-\-offset 100 \-\-mod 20
+.PP
+Fragment safe Layer 3 only that keep a class C network flow together
+.IP
+iptables \-t mangle \-A PREROUTING \-j HMARK \-\-method L3 \-\-src\-mask 24 \-\-mod 20 \-\-offset 100
+
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-08  7:37                         ` Hans Schillstrom
@ 2012-05-09 10:38                           ` Pablo Neira Ayuso
  2012-05-09 13:36                             ` Hans Schillstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-09 10:38 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Tue, May 08, 2012 at 09:37:35AM +0200, Hans Schillstrom wrote:
> From d5065af3988cc7561a02f30bae8342e1a89126a4 Mon Sep 17 00:00:00 2001
> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> Date: Wed, 2 May 2012 07:49:47 +0000
> Subject: netfilter: add xt_hmark target for hash-based skb
>  marking
> 
> The target allows you to create rules in the "raw" and "mangle" tables
> which set the skbuff mark by means of hash calculation within a given
> range. The nfmark can influence the routing method (see "Use netfilter
> MARK value as routing key") and can also be used by other subsystems to
> change their behaviour.
> 
> Some examples:
> 
> * Default rule handles all TCP, UDP, SCTP, ESP & AH
> 
>  iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
> 	-j HMARK --hmark-offset 10000 --hmark-mod 10
> 
> * Handle SCTP and hash dest port only and produce a nfmark between 100-119.
> 
>  iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
> 	--sp-mask 0 --offset 100 --mod 20
> 
> * Fragment safe Layer 3 only, that keep a class C network flow together
> 
>  iptables -t mangle -A PREROUTING -j HMARK --method L3 \
> 	--src-mask 24 --mod 20 --offset 100

I have removed these examples. Just in case we make changes to the
user-space part. We'll have the time for this (the entire 3.5 cycle).

Some minor glitches I made on this patch:

>  include/linux/netfilter/xt_HMARK.h |   48 +++++
>  net/netfilter/Kconfig              |   15 ++
>  net/netfilter/Makefile             |    1 +
>  net/netfilter/xt_HMARK.c           |  358 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 422 insertions(+)
>  create mode 100644 include/linux/netfilter/xt_HMARK.h
>  create mode 100644 net/netfilter/xt_HMARK.c
> 
> diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
> new file mode 100644
> index 0000000..05e43ba
> --- /dev/null
> +++ b/include/linux/netfilter/xt_HMARK.h
> @@ -0,0 +1,46 @@
> +#ifndef XT_HMARK_H_
> +#define XT_HMARK_H_
> +
> +#include <linux/types.h>
> +
> +enum {
> +	XT_HMARK_NONE,

this means (1 << 0) is unused. I have removed this _NONE.

> +	XT_HMARK_SADR_MASK,
> +	XT_HMARK_DADR_MASK,
> +	XT_HMARK_SPI_MASK,
> +	XT_HMARK_SPI,
> +	XT_HMARK_SPORT_MASK,
> +	XT_HMARK_DPORT_MASK,
> +	XT_HMARK_SPORT,
> +	XT_HMARK_DPORT,
> +	XT_HMARK_PROTO_MASK,
> +	XT_HMARK_RND,
> +	XT_HMARK_MODULUS,
> +	XT_HMARK_OFFSET,
> +	XT_HMARK_CT,
> +	XT_HMARK_METHOD_L3,
> +	XT_HMARK_METHOD_L3_4,

I have also rearrange the order of the flags:

enum {
        XT_HMARK_SADDR_MASK,
        XT_HMARK_DADDR_MASK,
        XT_HMARK_SPI,       
        XT_HMARK_SPI_MASK,  
        XT_HMARK_SPORT,     
        XT_HMARK_DPORT,     
        XT_HMARK_SPORT_MASK,
        XT_HMARK_DPORT_MASK,
        XT_HMARK_PROTO_MASK,
        XT_HMARK_RND,       
        XT_HMARK_MODULUS,   
        XT_HMARK_OFFSET,    
        XT_HMARK_CT,        
        XT_HMARK_METHOD_L3, 
        XT_HMARK_METHOD_L3_4,
};

I don't want people to ask me why we where using some strange order in
the flag definition in the future (yes, you'll have to recompile your
iptables HMARK support in your setups, sorry)

> +};
> +#define XT_HMARK_FLAG(flag)	(1 << flag)
> +
> +union hmark_ports {
> +	struct {
> +		__u16	src;
> +		__u16	dst;
> +	} p16;
> +	__u32	v32;
> +};
> +
> +struct xt_hmark_info {
> +	union nf_inet_addr	src_mask;	/* Source address mask */
> +	union nf_inet_addr	dst_mask;	/* Dest address mask */
> +	union hmark_ports	port_mask;
> +	union hmark_ports	port_set;
> +	__u32			flags;		/* Print out only */
> +	__u16			proto_mask;	/* L4 Proto mask */
> +	__u32			hashrnd;
> +	__u32			hmodulus;	/* Modulus */
> +	__u32			hoffset;	/* Offset */

I've removed these comments, they provide no extra information. Still
I left the one that described hoffset, that may seem not obvious.

> +#endif /* XT_HMARK_H_ */

I have applied this, I'm going to pass it to davem.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr()
  2012-04-23 13:35 ` [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
@ 2012-05-09 11:01   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 21+ messages in thread
From: Pablo Neira Ayuso @ 2012-05-09 11:01 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

I have applied this with minor changes.

BTW, please use the following patch tagging next time, I'll save time:

netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()

note the initial netfilter, then ip6_tables, then the description.

This is useful for grepping.

More minor glitches:

On Mon, Apr 23, 2012 at 03:35:26PM +0200, Hans Schillstrom wrote:
> Two new flags to ipv6_find_hdr,
> One that tells us that this is a fragment.
> One that stops at AH if any i.e. treat it like a transport header.
> i.e. make handling of ESP and AH the same.
> Param offset can now point to an inner icmp ipv5 header.
> 
> Version 3:
>     offset param into ipv6_find_hdr set to zero.
> 
> Version 2:
>     wrapper removed and changes made at every call.
> 
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> ---
>  include/linux/netfilter_ipv6/ip6_tables.h |   12 +++++++++-
>  net/ipv6/netfilter/ip6_tables.c           |   35 ++++++++++++++++++++++++----
>  net/ipv6/netfilter/ip6t_ah.c              |    4 +-
>  net/ipv6/netfilter/ip6t_frag.c            |    4 +-
>  net/ipv6/netfilter/ip6t_hbh.c             |    4 +-
>  net/ipv6/netfilter/ip6t_rt.c              |    4 +-
>  net/netfilter/xt_TPROXY.c                 |    4 +-
>  net/netfilter/xt_socket.c                 |    4 +-
>  8 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
> index 1bc898b..d96a39d 100644
> --- a/include/linux/netfilter_ipv6/ip6_tables.h
> +++ b/include/linux/netfilter_ipv6/ip6_tables.h
> @@ -287,6 +287,7 @@ extern unsigned int ip6t_do_table(struct sk_buff *skb,
>  				  struct xt_table *table);
>  
>  /* Check for an extension */
> +

removed this extra line.

>  static inline int
>  ip6t_ext_hdr(u8 nexthdr)
>  {	return (nexthdr == IPPROTO_HOPOPTS) ||
> @@ -298,9 +299,18 @@ ip6t_ext_hdr(u8 nexthdr)
>  	       (nexthdr == IPPROTO_DSTOPTS);
>  }
>  
> +

removed double extra line.

> +extern int ip6t_ext_hdr(u8 nexthdr);
> +enum {
> +	IP6T_FH_FRAG,
> +	IP6T_FH_AUTH,

removed these two above, the are not used anywhere in the code.

> +	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
> +	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
> +};
> +
>  /* find specified header and get offset to it */
>  extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> -			 int target, unsigned short *fragoff);
> +			 int target, unsigned short *fragoff, int *fragflg);
>  
>  #ifdef CONFIG_COMPAT
>  #include <net/compat.h>
> diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
> index d4e350f..1f18662 100644
> --- a/net/ipv6/netfilter/ip6_tables.c
> +++ b/net/ipv6/netfilter/ip6_tables.c
> @@ -133,7 +133,7 @@ ip6_packet_match(const struct sk_buff *skb,
>  		int protohdr;
>  		unsigned short _frag_off;
>  
> -		protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off);
> +		protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off, NULL);
>  		if (protohdr < 0) {
>  			if (_frag_off == 0)
>  				*hotdrop = true;
> @@ -362,6 +362,7 @@ ip6t_do_table(struct sk_buff *skb,
>  		const struct xt_entry_match *ematch;
>  
>  		IP_NF_ASSERT(e);
> +		acpar.thoff = 0;
>  		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
>  		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
>   no_match:
> @@ -2277,6 +2278,8 @@ static void __exit ip6_tables_fini(void)
>   * find the offset to specified header or the protocol number of last header
>   * if target < 0. "last header" is transport protocol header, ESP, or
>   * "No next header".
> + * Note, *offset is used as input param. an if != 0
> + * it must be an offset to an inner ipv6 header ex. icmp error
>   *
>   * If target header is found, its offset is set in *offset and return protocol
>   * number. Otherwise, return -1.
> @@ -2289,17 +2292,34 @@ static void __exit ip6_tables_fini(void)
>   * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
>   * isn't NULL.
>   *
> + * if flags != NULL AND
> + *    it's a fragment the frag flag "IP6T_FH_F_FRAG" will be set
> + *    it's an AH header and IP6T_FH_F_AUTH is set and target < 0
> + *      stop at AH (i.e. treat is as a transport header)

I've cleaned up these comments. The format does not look very orthodox
(I'm not blaming your English, but the way the text is organized).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-05-09 10:38                           ` Pablo Neira Ayuso
@ 2012-05-09 13:36                             ` Hans Schillstrom
  0 siblings, 0 replies; 21+ messages in thread
From: Hans Schillstrom @ 2012-05-09 13:36 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Wednesday 09 May 2012 12:38:20 Pablo Neira Ayuso wrote:
> On Tue, May 08, 2012 at 09:37:35AM +0200, Hans Schillstrom wrote:
> > From d5065af3988cc7561a02f30bae8342e1a89126a4 Mon Sep 17 00:00:00 2001
> > From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> > Date: Wed, 2 May 2012 07:49:47 +0000
> > Subject: netfilter: add xt_hmark target for hash-based skb
> >  marking
[snip]
> 
> I have applied this, I'm going to pass it to davem.

Thanks Pablo

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-05-09 13:36 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-23 13:35 [v12 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
2012-04-23 13:35 ` [v12 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
2012-05-09 11:01   ` Pablo Neira Ayuso
2012-04-23 13:35 ` [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
2012-05-02  0:34   ` Pablo Neira Ayuso
2012-05-02  7:55     ` Hans Schillstrom
2012-05-02  8:09       ` Pablo Neira Ayuso
2012-05-02 17:49         ` Hans Schillstrom
2012-05-06 22:57           ` Pablo Neira Ayuso
2012-05-07  8:20             ` Hans Schillstrom
2012-05-07  9:03               ` Pablo Neira Ayuso
2012-05-07  9:14                 ` Hans Schillstrom
2012-05-07 11:56                   ` Pablo Neira Ayuso
2012-05-07 12:09                     ` Hans Schillstrom
2012-05-07 12:22                       ` Pablo Neira Ayuso
2012-05-07 12:57                         ` Hans Schillstrom
2012-05-07 14:54                           ` Pablo Neira Ayuso
2012-05-08  7:37                         ` Hans Schillstrom
2012-05-09 10:38                           ` Pablo Neira Ayuso
2012-05-09 13:36                             ` Hans Schillstrom
2012-04-23 13:35 ` [v12 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.