netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [v11 PATCH 0/3] NETFILTER new target module, HMARK
@ 2012-03-22 11:59 Hans Schillstrom
  2012-03-22 11:59 ` [v11 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-03-22 11:59 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.

REVISION
Version 11
	Changed two comments

Version 10
        Even more simplified NAT handling just one switch --hmark-ct
        some renaming and some minor changes.
        Renaming of vars in xt_hmark_info
        Adding helptext and updated man due to --hmark-ct switch
        Changes are based on Pablos review.

Version 9
        Simpliefied nat handling in IPv4, some formating
        checkentry() used in kernel. 
        Most changes are based on Pablos review.

Version 8
        method L3 / L3-4 added i.e. Fragment handling changed to:
        - don't handle in "method L3-4"
        Syntax change in user mode to be more NF compatible.
        Most changes are based on Pablos review.

Version 7
	ahuum, IPv6 descending into icmp error hdr didn't work as expected
        with ipv6_find_hdr() Now it works as expected.

Version 6
        Removed ipv6_find_hdr() wrapper (Pablo)
	NAT / Conntrack compilation switches.

Version 5
	Use length of mask an smask and dmask and whole IPv6 addr (Jan E)
	Modify ipv6_find_hdr() and use it while traversing the IPv6 header.
        Manual changes.
	More or less all comments implemented.

Version 4
	Split of IPv6 and IPv4, use IP_CT_IS_REPLY, as Pablo suggested.
	removed one pskb_may_pull()
	xtoption parse used in the user space part.

Version 3
        Handling of SCTP for IPv6 added.

Version 2
	NAT Added for IPv4
	IPv6 ICMP handling enhanced.
	Usage example added

Version 1
	Initial RFC


We (Ericsson) use hmark in-front of ipvs as a pre-loadbalancer and
handles up to 70 ipvs running in parallel in clusters.
However hmark is not restricted to run in front of IPVS it can also be used as
"poor mans" load balancer.
With this version is also NAT supported as an option, with very high flows
you might not want to use conntrack.

The idea is to generate a direction independent fw mark range to use as input to
the routing (i.e. ip rule add fwmark ...).
Pretty straight forward and simple.


Example:
                                      App Server (Real Server)

                                           +---------+
                                        -->| Service |
     Gateway A                             +---------+
                          /
            +----------+ /     +----+      +---------+
--- if -A---| selector |---->  |ipvs|  --->| Service |
            +----------+ \     +----+      +---------+
                          \
                               +----+      +---------+
                               |ipvs|   -->| Service |
                               +----+      +---------+
      Gateway C
            +----------+ /     +----+
--- if-B ---| selector | --->  |ipvs|
            +----------+ \     +----+      +---------+
                                           | Service |
                                           +---------+
                          /
            +----------+ /     +----+     ..
--- if-B ---| selector | --->  |ipvs|      +---------+
            +----------+ \     +----+      | Service |
                          \                +---------+
#
# Example with four ipvs loadbalancers
#
iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100

ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip rule add fwmark 102 table 102
ip rule add fwmark 103 table 103

ip ro ad table 100 default via x.y.z.1 dev bond1
ip ro ad table 101 default via x.y.z.2 dev bond1
ip ro ad table 102 default via x.y.z.3 dev bond1
ip ro ad table 103 default via x.y.z.4 dev bond1


If conntrack doesn't handle the return path,
do the oposite with HMARK and send it back right to ipvs.

Another exmaple of usage could be if you have cluster originated connections
and want to spread the connections over a number of interfaces
(NAT will complpicate things for you in this case)



                     \  Blade 1
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /
   +------+
-- | Gw-A |          \  Blade 2
   +------+           \ +----------+      +---------+
   +------+         <-- | selector | <--- | Service |
-- | Gw-B |           / +----------+      +---------+
   +------+          /
   +------+
-- | Gw-C |          \
   +------+           \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

                     \  Blande -n
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /


Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [v11 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr()
  2012-03-22 11:59 [v11 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
@ 2012-03-22 11:59 ` Hans Schillstrom
  2012-03-22 11:59 ` [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
  2012-03-22 11:59 ` [v11 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
  2 siblings, 0 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-03-22 11:59 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

Two new flags to ipv6_find_hdr,
One that tells us that this is a fragment.
One that stops at AH if any i.e. treat it like a transport header.
i.e. make handling of ESP and AH the same.
Param offset can now point to an inner icmp ipv5 header.

Version 3:
    offset param into ipv6_find_hdr set to zero.

Version 2:
    wrapper removed and changes made at every call.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter_ipv6/ip6_tables.h |    8 +++++-
 net/ipv6/netfilter/ip6_tables.c           |   35 ++++++++++++++++++++++++----
 net/ipv6/netfilter/ip6t_ah.c              |    4 +-
 net/ipv6/netfilter/ip6t_frag.c            |    4 +-
 net/ipv6/netfilter/ip6t_hbh.c             |    4 +-
 net/ipv6/netfilter/ip6t_rt.c              |    4 +-
 net/netfilter/xt_TPROXY.c                 |    4 +-
 net/netfilter/xt_socket.c                 |    4 +-
 8 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
index f549adc..e1ad013 100644
--- a/include/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/linux/netfilter_ipv6/ip6_tables.h
@@ -288,9 +288,15 @@ extern unsigned int ip6t_do_table(struct sk_buff *skb,
 
 /* Check for an extension */
 extern int ip6t_ext_hdr(u8 nexthdr);
+enum {
+	IP6T_FH_FRAG,
+	IP6T_FH_AUTH,
+	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
+	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
+};
 /* find specified header and get offset to it */
 extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-			 int target, unsigned short *fragoff);
+			 int target, unsigned short *fragoff, int *fragflg);
 
 #ifdef CONFIG_COMPAT
 #include <net/compat.h>
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 94874b0..9dab6a8 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -146,7 +146,7 @@ ip6_packet_match(const struct sk_buff *skb,
 		int protohdr;
 		unsigned short _frag_off;
 
-		protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off);
+		protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off, NULL);
 		if (protohdr < 0) {
 			if (_frag_off == 0)
 				*hotdrop = true;
@@ -375,6 +375,7 @@ ip6t_do_table(struct sk_buff *skb,
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
+		acpar.thoff = 0;
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
  no_match:
@@ -2290,6 +2291,8 @@ static void __exit ip6_tables_fini(void)
  * find the offset to specified header or the protocol number of last header
  * if target < 0. "last header" is transport protocol header, ESP, or
  * "No next header".
+ * Note, *offset is used as input param. an if != 0
+ * it must be an offset to an inner ipv6 header ex. icmp error
  *
  * If target header is found, its offset is set in *offset and return protocol
  * number. Otherwise, return -1.
@@ -2302,17 +2305,34 @@ static void __exit ip6_tables_fini(void)
  * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
  * isn't NULL.
  *
+ * if flags != NULL AND
+ *    it's a fragment the frag flag "IP6T_FH_F_FRAG" will be set
+ *    it's an AH header and IP6T_FH_F_AUTH is set and target < 0
+ *      stop at AH (i.e. treat is as a transport header)
  */
 int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-		  int target, unsigned short *fragoff)
+		  int target, unsigned short *fragoff, int *flags)
 {
 	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
 	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-	unsigned int len = skb->len - start;
+	unsigned int len;
 
 	if (fragoff)
 		*fragoff = 0;
 
+	if (*offset) {
+		struct ipv6hdr _ip6, *ip6;
+
+		ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6);
+		if (!ip6 || (ip6->version != 6)) {
+			printk(KERN_ERR "IPv6 header not found\n");
+			return -EBADMSG;
+		}
+		start = *offset + sizeof(struct ipv6hdr);
+		nexthdr = ip6->nexthdr;
+	}
+	len = skb->len - start;
+
 	while (nexthdr != target) {
 		struct ipv6_opt_hdr _hdr, *hp;
 		unsigned int hdrlen;
@@ -2329,6 +2349,9 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		if (nexthdr == NEXTHDR_FRAGMENT) {
 			unsigned short _frag_off;
 			__be16 *fp;
+
+			if (flags)	/* Indicate that this is a fragment */
+				*flags |= IP6T_FH_F_FRAG;
 			fp = skb_header_pointer(skb,
 						start+offsetof(struct frag_hdr,
 							       frag_off),
@@ -2349,9 +2372,11 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 				return -ENOENT;
 			}
 			hdrlen = 8;
-		} else if (nexthdr == NEXTHDR_AUTH)
+		} else if (nexthdr == NEXTHDR_AUTH) {
+			if (flags && (*flags & IP6T_FH_F_AUTH) && (target < 0))
+				break;
 			hdrlen = (hp->hdrlen + 2) << 2;
-		else
+		} else
 			hdrlen = ipv6_optlen(hp);
 
 		nexthdr = hp->nexthdr;
diff --git a/net/ipv6/netfilter/ip6t_ah.c b/net/ipv6/netfilter/ip6t_ah.c
index 89cccc5..04099ab 100644
--- a/net/ipv6/netfilter/ip6t_ah.c
+++ b/net/ipv6/netfilter/ip6t_ah.c
@@ -41,11 +41,11 @@ static bool ah_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	struct ip_auth_hdr _ah;
 	const struct ip_auth_hdr *ah;
 	const struct ip6t_ah *ahinfo = par->matchinfo;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	unsigned int hdrlen = 0;
 	int err;
 
-	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_AUTH, NULL);
+	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_AUTH, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/ipv6/netfilter/ip6t_frag.c b/net/ipv6/netfilter/ip6t_frag.c
index eda898f..3b5735e 100644
--- a/net/ipv6/netfilter/ip6t_frag.c
+++ b/net/ipv6/netfilter/ip6t_frag.c
@@ -40,10 +40,10 @@ frag_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	struct frag_hdr _frag;
 	const struct frag_hdr *fh;
 	const struct ip6t_frag *fraginfo = par->matchinfo;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	int err;
 
-	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_FRAGMENT, NULL);
+	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_FRAGMENT, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/ipv6/netfilter/ip6t_hbh.c b/net/ipv6/netfilter/ip6t_hbh.c
index 59df051..01df142 100644
--- a/net/ipv6/netfilter/ip6t_hbh.c
+++ b/net/ipv6/netfilter/ip6t_hbh.c
@@ -50,7 +50,7 @@ hbh_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct ipv6_opt_hdr *oh;
 	const struct ip6t_opts *optinfo = par->matchinfo;
 	unsigned int temp;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	unsigned int hdrlen = 0;
 	bool ret = false;
 	u8 _opttype;
@@ -62,7 +62,7 @@ hbh_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 
 	err = ipv6_find_hdr(skb, &ptr,
 			    (par->match == &hbh_mt6_reg[0]) ?
-			    NEXTHDR_HOP : NEXTHDR_DEST, NULL);
+			    NEXTHDR_HOP : NEXTHDR_DEST, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/ipv6/netfilter/ip6t_rt.c b/net/ipv6/netfilter/ip6t_rt.c
index d8488c5..2c99b94 100644
--- a/net/ipv6/netfilter/ip6t_rt.c
+++ b/net/ipv6/netfilter/ip6t_rt.c
@@ -42,14 +42,14 @@ static bool rt_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct ipv6_rt_hdr *rh;
 	const struct ip6t_rt *rtinfo = par->matchinfo;
 	unsigned int temp;
-	unsigned int ptr;
+	unsigned int ptr = 0;
 	unsigned int hdrlen = 0;
 	bool ret = false;
 	struct in6_addr _addr;
 	const struct in6_addr *ap;
 	int err;
 
-	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_ROUTING, NULL);
+	err = ipv6_find_hdr(skb, &ptr, NEXTHDR_ROUTING, NULL, NULL);
 	if (err < 0) {
 		if (err != -ENOENT)
 			par->hotdrop = true;
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 35a959a..146033a 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -282,10 +282,10 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
 	struct sock *sk;
 	const struct in6_addr *laddr;
 	__be16 lport;
-	int thoff;
+	int thoff = 0;
 	int tproto;
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL);
+	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
 	if (tproto < 0) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
 		return NF_DROP;
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 72bb07f..9ea482d 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -263,10 +263,10 @@ socket_mt6_v1(const struct sk_buff *skb, struct xt_action_param *par)
 	struct sock *sk;
 	struct in6_addr *daddr, *saddr;
 	__be16 dport, sport;
-	int thoff, tproto;
+	int thoff = 0, tproto;
 	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL);
+	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
 	if (tproto < 0) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
 		return NF_DROP;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-03-22 11:59 [v11 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
  2012-03-22 11:59 ` [v11 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
@ 2012-03-22 11:59 ` Hans Schillstrom
  2012-04-12 15:54   ` Pablo Neira Ayuso
  2012-03-22 11:59 ` [v11 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
  2 siblings, 1 reply; 6+ messages in thread
From: Hans Schillstrom @ 2012-03-22 11:59 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

man page
   HMARK
       This  module  does  the  same  as MARK, i.e. set an fwmark, but the mark
       is based on a hash value.  The hash is based on saddr, daddr, sport,
       dport and proto. The same mark will be produced independent of direction
       if no masks is set or the same masks is used for src and dest.
       The hash mark could be adjusted by modulus and finally an offset could
       be added, i.e the final mark will be within a range. ICMP error will use
       the the original message for hash calculation not the icmp it self.

       Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
             IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
             Default behavior is to completely ignore any fragment if it reach hmark.
             --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
             None of the parameters effect the packet it self only the calculated hash value.

       Parameters: Short hand methods

       --hmark-method L3
              Do not use L4 protocol field, ports or spi, only Layer 3 addresses,
              mask length of L3 addresses can still be used. Fragment or not
              does not matter in this case since only L3 address can be used in
              calc. of hash value.

       --hmark-method L3-4 (Default)
              Include  L4  in  calculation. of hash value i.e. all masks below are valid.
              Fragments will be ignored. (i.e no hash value produced)

       For all masks default is all "1:s", to disable a field use mask 0

       --hmark-src-mask length
              The length of the mask to AND the source address with (saddr & value).

       --hmark-dst-mask length
              The length of the mask to AND the dest. address with (daddr & value).

       --hmark-sport-mask value
              A 16 bit value to AND the src port with (sport & value).

       --hmark-dport-mask value
              A 16 bit value to AND the dest port with (dport & value).

       --hmark-sport-set value
              A 16 bit value to OR the src port with (sport | value).

       --hmark-dport-set value
              A 16 bit value to OR the dest port with (dport | value).

       --hmark-spi-mask value
              Value to AND the spi field with (spi & value) valid for proto esp or ah.

       --hmark-spi-set value
              Value to OR the spi field with (spi | value) valid for proto esp or ah.

       --hmark-proto-mask value
              An 8 bit value to AND the L4 proto field with (proto & value).

       --hmark-ct
              When flag is set, conntrack data should be used. Useful when NAT internal
              addressed should be used in calculation.  Be careful when using DNAT
              since mangle table is handled before nat table. I.e it will not work as
              expected to put HMARK in table mangle and PREROUTING chain. The  initial
              packet will have it's hash based on the original address,
              while the rest of the flow will use the NAT:ed address.

       --hmark-rnd value
              A 32 bit initial value for hash calc, default is 0xc175a3b8.

       Final processing of the mark in order of execution.

       --hmark-mod value (must be > 0)
              The easiest way to describe this is:  hash = hash mod <value>

       --hmark-offset value
              The easiest way to describe this is:  hash = hash + <value>

       Examples:

       Default rule handles all TCP, UDP, SCTP, ESP & AH

              iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED
               -j HMARK --hmark-offset 10000 --hmark-mod 10

       Handle SCTP and hash dest port only and produce a nfmark between 100-119.

              iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0
               --sp-mask 0 --offset 100 --mod 20

       Fragment safe Layer 3 only, that keep a class C network flow together

              iptables -t mangle -A PREROUTING -j HMARK --method L3 --src-mask 24 --mod 20 --offset 100

Rev 11
    Two comments changed
Rev 10
     Even more simplified NAT handling just one switch --hmark-ct
     some renaming and some minor changes.
     Changes are based on Pablos review.

Rev 9
      Simplified NAT selections, cleanup of comments, added checkentry()
      change of #ifdef to #if IS_ENABLED and dependency.
      Some minor formating.
      Most changes are based on Pablos review.

Rev 8
      method L3 / L3-4 added i.e. Fragment handling changed to
      don't handle in "method L3-4"
      Syntax change in user mode more NF compatible.
      Most changes are based on Pablos review.

Rev 7
      IPv6 descending into icmp error hdr didn't work as expected
      with ipv6_find_hdr() Now it works as expected.

Rev 6
      Compile options with or without conntrack fixed.
      __ipv6_find_hdr() replaced by ipv6_find_hdr()

Rev 5
      IPv6 rewritten uses __ipv6_find_hdr() (P. Mc Hardy)
      Full mask and address used for IPv6 smask and dmask (J.Engelhart)
      Changes due to comments by Pablo Neira Ayuso  and Eric Dumazet
      i.e uses of skb_header_pointer() and Null check of info->hmod
      Man page changes

Rev 4
      different targets for IPv4 and IPv6
      Changes based on review by Pablo.

Rev 3
      Support added to SCTP for IPv6
Rev 2
      IPv6 header scan changed to follow RFC 2640
      IPv4 icmp echo fragmented does now use proto as ipv6
      IPv6 pskb_may_pull() check is done in every time in header loop.
      IPv4 nat support added.
      default added in IPv6 loop and null check of hp

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   62 +++++++
 net/netfilter/Kconfig              |   18 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  319 ++++++++++++++++++++++++++++++++++++
 4 files changed, 400 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..cdf4a8f
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
+	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
+	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
+	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
+};
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index f8ac4ef..a775804 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,24 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which alter the netfilter mark (nfmark) field within a given range.
+	First a 32 bit hash value is generated then modulus by <limit> and
+	finally an offset is added before it's written to nfmark.
+
+	Prior to routing, the nfmark can influence the routing method (see
+	"Use netfilter MARK value as routing key") and can also be used by
+	other subsystems to change their behavior.
+
+	The mark match can also be used to match nfmark produced by this module.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 40f4c3d..2712ba0 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..d90549d
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,319 @@
+/*
+ * xt_hmark - Netfilter module to set mark as hash value
+ *
+ * (C) 2012 Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *
+ *Description:
+ *	This module calculates a hash value that can be modified by modulus
+ *	and an offset, i.e. it is possible to produce a skb->mark within a range
+ *	The hash value is based on a direction independent five tuple:
+ *	src & dst addr src & dst ports and protocol.
+ *	There is two distinct modes for hash calculation:
+ *
+ *	This program is free software; you can redistribute it and/or modify
+ *	it under the terms of the GNU General Public License version 2 as
+ *	published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/xt_HMARK.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: Packet range mark operations by Hash value");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+/*
+ * ICMP, get header offset if icmp error
+ */
+static int get_inner_hdr(struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+/*
+ * Get ipv6 header offset if icmp error
+ */
+static int get_inner6_hdr(struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+/*
+ * Calculate hash based fw-mark, on the five tuple if possible.
+ * special cases :
+ *  - Fragments do not use ports not even on the first fragment,
+ *    nf_defrag_ipv6.ko don't defrag for us like it do in ipv4.
+ *    This might be changed in the future.
+ *  - On ICMP errors the inner header will be used.
+ *  - Tunnels no ports
+ *  - ESP & AH uses SPI
+ * @returns XT_CONTINUE
+ */
+static unsigned int
+hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct ipv6hdr *ip6, _ip6;
+	int poff, flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	union hmark_ports uports;
+	u32 addr_src, addr_dst, hash, nhoffs = 0;
+	u16 fragoff = 0;
+	u8 nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return XT_CONTINUE;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoffs)) {
+		ip6 = skb_header_pointer(skb, nhoffs, sizeof(_ip6), &_ip6);
+		if (!ip6)
+			return XT_CONTINUE;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return XT_CONTINUE;
+	}
+noicmp:
+	addr_src = (__force u32)
+		(ip6->saddr.s6_addr32[0] & info->src_mask.in6.s6_addr32[0]) ^
+		(ip6->saddr.s6_addr32[1] & info->src_mask.in6.s6_addr32[1]) ^
+		(ip6->saddr.s6_addr32[2] & info->src_mask.in6.s6_addr32[2]) ^
+		(ip6->saddr.s6_addr32[3] & info->src_mask.in6.s6_addr32[3]);
+	addr_dst = (__force u32)
+		(ip6->daddr.s6_addr32[0] & info->dst_mask.in6.s6_addr32[0]) ^
+		(ip6->daddr.s6_addr32[1] & info->dst_mask.in6.s6_addr32[1]) ^
+		(ip6->daddr.s6_addr32[2] & info->dst_mask.in6.s6_addr32[2]) ^
+		(ip6->daddr.s6_addr32[3] & info->dst_mask.in6.s6_addr32[3]);
+
+	uports.v32 = 0;
+	if ((info->flags & XT_F_HMARK_METHOD_L3) ||
+	    (nexthdr == IPPROTO_ICMPV6))
+		goto no_ports;
+	/* Is next header valid for port or SPI calculation ? */
+	poff = proto_ports_offset(nexthdr);
+	if ((flag & IP6T_FH_F_FRAG) || poff < 0)
+		return XT_CONTINUE;
+
+	nhoffs += poff;
+	if (skb_copy_bits(skb, nhoffs, &uports, sizeof(uports)) < 0)
+		return XT_CONTINUE;
+
+	if ((nexthdr == IPPROTO_ESP) || (nexthdr == IPPROTO_AH))
+		uports.v32 = (uports.v32 & info->spi_mask) | info->spi_set;
+	else {
+		uports.v32 = (uports.v32 & info->port_mask.v32) |
+			      info->port_set.v32;
+		/* get a consistent hash (same value in any flow dirs.) */
+		if (uports.p16.dst < uports.p16.src)
+			swap(uports.p16.dst, uports.p16.src);
+	}
+
+no_ports:
+	nexthdr &= info->proto_mask;
+	/* get a consistent hash (same value in any flow direction) */
+	if (addr_dst < addr_src)
+		swap(addr_src, addr_dst);
+
+	hash = jhash_3words(addr_src, addr_dst, uports.v32, info->hashrnd) ^ nexthdr;
+	skb->mark = (hash % info->hmodulus) + info->hoffset;
+	return XT_CONTINUE;
+}
+#endif
+/*
+ * Calculate hash based fw-mark, on the five tuple if possible.
+ * special cases :
+ *  - Fragments do not use ports not even on the first fragment,
+ *    unless nf_defrag_xx.ko is used.
+ *  - On ICMP errors the inner header will be used.
+ *  - Tunnels no ports
+ *  - ESP & AH uses SPI
+ * @returns XT_CONTINUE
+ */
+static unsigned int
+hmark_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct iphdr *ip, _ip;
+	int nhoff, poff, frag = 0;
+	union hmark_ports uports;
+	u32 addr_src, addr_dst, hash;
+	u8 ip_proto;
+
+	nhoff = skb_network_offset(skb);
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* if an icmp error, calc hash on inner header */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (!ip)
+				return XT_CONTINUE;
+		}
+	}
+
+	ip_proto = ip->protocol;
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		frag = 1;
+
+	addr_src = (__force u32) ip->saddr;
+	addr_dst = (__force u32) ip->daddr;
+	uports.v32 = 0;
+/* conntrack take care of ICMP relation */
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	if (info->flags & XT_F_HMARK_CT) {
+		struct nf_conntrack_tuple *otuple;
+		struct nf_conntrack_tuple *rtuple;
+		enum ip_conntrack_info ctinfo;
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+
+		if (!ct || nf_ct_is_untracked(ct))
+			return XT_CONTINUE;
+
+		otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+		rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+		addr_src       = (__force u32)otuple->src.u3.in.s_addr;
+		uports.p16.src = otuple->src.u.udp.port;
+		addr_dst       = (__force u32)rtuple->src.u3.in.s_addr;
+		uports.p16.dst = rtuple->src.u.udp.port;
+	}
+#endif
+	addr_src &= info->src_mask.ip;
+	addr_dst &= info->dst_mask.ip;
+
+	if ((info->flags & XT_F_HMARK_METHOD_L3) ||
+	    (ip_proto == IPPROTO_ICMP)) {
+		uports.v32 = 0;
+		goto noports;
+	}
+	/* Check if ports can be used in hash calculation. */
+	poff = proto_ports_offset(ip_proto);
+	if (frag || poff < 0)
+		return XT_CONTINUE;
+
+	/* if --ct not given, get ports from skb */
+	if (!uports.v32) {
+		nhoff += (ip->ihl * 4) + poff;
+		if (skb_copy_bits(skb, nhoff, &uports, sizeof(uports)) < 0)
+			return XT_CONTINUE;
+	}
+
+	if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH)
+		uports.v32 = (uports.v32 & info->spi_mask) | info->spi_set;
+	else {
+		uports.v32 = (uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+		/* get a consistent hash (same value in any flow dirs.) */
+		if (uports.p16.dst < uports.p16.src)
+			swap(uports.p16.src, uports.p16.dst);
+	}
+
+noports:
+	/* get a consistent hash (same value in any flow direction) */
+	if (addr_dst < addr_src)
+		swap(addr_src, addr_dst);
+
+	hash = jhash_3words(addr_src, addr_dst, uports.v32, info->hashrnd);
+	hash = hash ^ (ip_proto & info->proto_mask);
+	skb->mark = (hash % info->hmodulus) + info->hoffset;
+	return XT_CONTINUE;
+}
+
+static int hmark_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("HMARK: hmark-mod can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask && (info->flags & XT_F_HMARK_METHOD_L3)) {
+		pr_info("HMARK: When method L3 proto mask must be zero\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV4,
+		.target         = hmark_v4,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.checkentry     = hmark_check,
+		.me             = THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV6,
+		.target         = hmark_v6,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.checkentry     = hmark_check,
+		.me             = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_mt_init(void)
+{
+	int ret;
+
+	ret = xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static void __exit hmark_mt_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_mt_init);
+module_exit(hmark_mt_exit);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [v11 PATCH 3/3] NETFILTER userspace part for target HMARK
  2012-03-22 11:59 [v11 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
  2012-03-22 11:59 ` [v11 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
  2012-03-22 11:59 ` [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2012-03-22 11:59 ` Hans Schillstrom
  2 siblings, 0 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-03-22 11:59 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32 bit hash value is generated then modulus by <limit> and
    finally an offset is added before it's written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.

Ver 10
    conntrack reduced to --hmark-ct switch
    renaming of vars in xt_hmark_info
    Adding helptext and updated man due to --hmark-ct switch

Ver 9
    Formating changes.

Ver 8
    Syntax changes more descriptive options
    --hmark-method added.

Ver 6-7 -

Ver 5
      smask and dmask changed to length

Ver 4
      xtoptions used for parsing.

Ver 3
       -

Ver 2
      IPv4 NAT added
      iptables ver 1.4.12.1 adaptions.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  469 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   84 +++++++
 include/linux/netfilter/xt_HMARK.h |   62 +++++
 3 files changed, 615 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_HMARK.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..d657d85
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,469 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "xtables.h"
+#include <linux/netfilter/xt_HMARK.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+#define XT_F_HMARK_L4_OPTS  (XT_F_HMARK_SPI_AND | XT_F_HMARK_SPI_OR\
+			     | XT_F_HMARK_SPORT_AND | XT_F_HMARK_SPORT_OR\
+			     | XT_F_HMARK_DPORT_AND | XT_F_HMARK_DPORT_OR\
+			     | XT_F_HMARK_PROTO_AND)
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-method <method>          Overall L3/L4 and fragment behavior\n"
+"                 L3                Fragment safe, do not use ports or proto\n"
+"                                   i.e. Fragments don't need special care.\n"
+"                 L3-4 (Default)    Fragment unsafe, use ports and proto\n"
+"                                   if defrag off in conntrack\n"
+"                                      no hmark on any part of a fragment\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                nfmark modulus value\n"
+"  --hmark-offset value             Last action add value to nfmark\n\n"
+" Fine tuning of what will be included in hash calculation\n"
+"  --hmark-src-mask length          Source address mask length\n"
+"  --hmark-dst-mask length          Dest address mask length\n"
+"  --hmark-sport-mask value         Mask src port with value\n"
+"  --hmark-dport-mask value         Mask dst port with value\n"
+"  --hmark-spi-mask value           For esp and ah AND spi with value\n"
+"  --hmark-sport-set value          OR src port with value\n"
+"  --hmark-dport-set value          OR dst port with value\n"
+"  --hmark-spi-set value            For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value         Mask Protocol with value\n"
+"  --hmark-rnd                      Initial Random value to hash cacl.\n"
+" For NAT in IPv4: src part from original/reply tuple will always be used\n"
+" i.e. orig src part will be used as src address/port.\n"
+"     reply src part will be used as dst address/port\n"
+" Make sure to qualify the rule in a proper way when using NAT flag\n"
+" When --ct is used only tracked connections will match\n"
+"  --hmark-ct                       Force conntrack orig and rely tuples as\n"
+"                                   source and destination.\n\n"
+" In many cases hmark can be omitted i.e. --src-mask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[]= {
+ 	{ .name  = "hmark-method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "hmark-src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "hmark-dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "hmark-sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "hmark-dport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "hmark-spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "hmark-sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "hmark-dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "hmark-spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "hmark-proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "hmark-rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",
+	  .type = XTTYPE_UINT32,
+	  .id = XT_HMARK_MODULUS,
+	  .min = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "hmark-offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "hmark-ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+
+	{ .name  = "method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "dport-mask", .type = XTTYPE_UINT16,
+	  .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name  = "mod",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_MODULUS,
+	  .min   = 1,
+	  .flags = XTOPT_PUT,
+	  XTOPT_MAND, XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->port_set.v32 = 0;
+		info->flags = 0;
+		info->spi_set = 0;
+		info->hoffset = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SPI_AND:
+		info->spi_mask = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPI_OR:
+		info->spi_set = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPORT_AND:
+		info->port_mask.p16.src = htons(cb->val.u16);
+		break;
+	case XT_HMARK_DPORT_AND:
+		info->port_mask.p16.dst = htons(cb->val.u16);
+		break;
+	case XT_HMARK_SPORT_OR:
+		info->port_set.p16.src = htons(cb->val.u16);
+		break;
+	case XT_HMARK_DPORT_OR:
+		info->port_set.p16.dst = htons(cb->val.u16);
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmodulus == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "xxx modulus 0 ? "
+				      "thats a div by 0");
+			info->hmodulus = 0xffffffff;
+		}
+		break;
+	case XT_HMARK_METHOD_L3:
+		if (strcmp(cb->arg, "L3") == 0) {
+			info->proto_mask = 0;
+			cb->xflags &= ~XT_F_HMARK_METHOD_L3_4;
+		} else if (strcmp(cb->arg, "L3-4") == 0) {
+			cb->xflags &= ~XT_F_HMARK_METHOD_L3;
+			cb->xflags |= XT_F_HMARK_METHOD_L3_4;
+		}
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_F_HMARK_MODULUS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+			      "is not set, or zero wich is a div by zero");
+	/* Check for invalid options */
+	if (cb->xflags & XT_F_HMARK_METHOD_L3 &&
+	   (cb->xflags & XT_F_HMARK_L4_OPTS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-method L3, "
+			      "can not be combined by an Layer 4 options: "
+		              "port, spi or proto ");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_F_HMARK_METHOD_L3) {
+		printf("method L3 ");
+	} else {
+		if (info->flags & XT_F_HMARK_METHOD_L3_4)
+			printf("method L3-4 ");
+		if (info->flags & XT_F_HMARK_SPORT_AND)
+			printf("sport-mask 0x%x ", htons(info->port_mask.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_AND)
+			printf("dport-mask 0x%x ", htons(info->port_mask.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_AND)
+			printf("spi-mask 0x%x ", htonl(info->spi_mask));
+		if (info->flags & XT_F_HMARK_SPORT_OR)
+			printf("sport-set 0x%x ", htons(info->port_set.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_OR)
+			printf("dport-set 0x%x ", htons(info->port_set.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_OR)
+			printf("spi-set 0x%x ", htonl(info->spi_set));
+		if (info->flags & XT_F_HMARK_PROTO_AND)
+			printf("proto-mask 0x%x ", info->proto_mask);
+	}
+	if (info->flags & XT_F_HMARK_RND)
+		printf("rnd 0x%x ", info->hashrnd);
+
+}
+
+static void HMARK_ip6_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_F_HMARK_MODULUS)
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_F_HMARK_OFFSET)
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_F_HMARK_CT)
+		printf("ct, ");
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf("src-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf("dst-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_F_HMARK_MODULUS)
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_F_HMARK_OFFSET)
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_F_HMARK_CT)
+		printf("ct, ");
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf("src-mask %s ",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf("dst-mask %s ",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_F_HMARK_METHOD_L3) {
+		printf(" --hmark-method L3");
+	} else {
+		if (info->flags & XT_F_HMARK_METHOD_L3_4)
+			printf(" --hmark-method L3-4");
+		if (info->flags & XT_F_HMARK_SPORT_AND)
+			printf(" --hmark-sport-mask 0x%x",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_AND)
+			printf(" --hmark-dport-mask 0x%x",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_AND)
+			printf(" --hmark-spi-mask 0x%x",
+			       htonl(info->spi_mask));
+		if (info->flags & XT_F_HMARK_SPORT_OR)
+			printf(" --hmark-sport-set 0x%x",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_F_HMARK_DPORT_OR)
+			printf(" --hmark-dport-set 0x%x",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_F_HMARK_SPI_OR)
+			printf(" --hmark-spi-set 0x%x", htonl(info->spi_set));
+		if (info->flags & XT_F_HMARK_PROTO_AND)
+			printf(" --hmark-proto-mask 0x%x", info->proto_mask);
+	}
+	if (info->flags & XT_F_HMARK_RND)
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & XT_F_HMARK_MODULUS)
+		printf(" --hmark-mod 0x%x", info->hmodulus);
+	if (info->flags & XT_F_HMARK_OFFSET)
+		printf(" --hmark-offset 0x%x", info->hoffset);
+	if (info->flags & XT_F_HMARK_CT)
+		printf(" --hmark-ct");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf(" --hmark-src-mask %s",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf(" --hmark-dst-mask %s",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_F_HMARK_SADR_AND)
+		printf(" --hmark-src-mask %s",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_F_HMARK_DADR_AND)
+		printf(" --hmark-dst-mask %s",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
+
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..92bd1ed
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,84 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on src-addr, dst-addr, sport, dport and proto. The same mark will be produced independent of direction if no masks is set or the same masks is used for src and dest.
+The hash mark could be adjusted by modulus and finally an offset could be added, i.e the final mark will be within a range.
+ICMP error will use the the original message for hash calculation not the icmp it self.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
+      Default behavior is to completely ignore any fragment if it reach hmark.
+      --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
+      None of the parameters effect the packet it self only the calculated hash value.
+      
+.PP
+Parameters:
+Short hand methods
+.TP
+\fB\-\-hmark\-method\fP \fIL3\fP
+Do not use L4 protocol field, ports or spi, only Layer 3 addresses, mask length 
+of L3 addresses can still be used. Fragment or not does not matter in 
+this case since only L3 address can be used in calc. of hash value.
+.TP
+\fB\-\-hmark\-method\fP \fIL3-4\fP (Default)
+Include L4 in calculation. of hash value i.e. all masks below are valid. 
+Fragments will be ignored. (i.e no hash value produced)
+.PP
+For all masks default is all "1:s", to disable a field use mask 0
+.TP
+\fB\-\-hmark\-src\-mask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dst\-mask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sport\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dport\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-ct\fP
+When flag is set, conntrack data should be used. Useful when NAT internal addressed should be used in calculation.
+Be careful when using DNAT since mangle table is handled before nat table. I.e it will not work as expected to put HMARK in table mangle and PREROUTING chain. The initial packet will have it's hash based on the original address, while the rest of the flow will use the NAT:ed address.
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offset\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark-offs 10000 \-\-hmark-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING -p SCTP \-j HMARK \-\-src\-mask 0 \-\-dst\-mask 0
+ \-\-sp\-mask 0 \-\-offset 100 \-\-mod 20
+.PP
+Fragment safe Layer 3 only that keep a class C network flow together
+.IP
+iptables \-t mangle \-A PREROUTING \-j HMARK \-\-method L3 \-\-src\-mask 24 \-\-mod 20 \-\-offset 100
+
diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..cdf4a8f
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
+	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
+	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
+	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
+};
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-03-22 11:59 ` [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2012-04-12 15:54   ` Pablo Neira Ayuso
  2012-04-23 12:33     ` Hans Schillstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2012-04-12 15:54 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

Hi Hans,

On Thu, Mar 22, 2012 at 12:59:52PM +0100, Hans Schillstrom wrote:
> The target allows you to create rules in the "raw" and "mangle" tables
> which alter the netfilter mark (nfmark) field within a given range.
> First a 32 bit hash value is generated then modulus by <limit> and
> finally an offset is added before it's written to nfmark.
> Prior to routing, the nfmark can influence the routing method (see
> "Use netfilter MARK value as routing key") and can also be used by
> other subsystems to change their behavior.
> 
> man page
>    HMARK
>        This  module  does  the  same  as MARK, i.e. set an fwmark, but the mark
>        is based on a hash value.  The hash is based on saddr, daddr, sport,
>        dport and proto. The same mark will be produced independent of direction
>        if no masks is set or the same masks is used for src and dest.
>        The hash mark could be adjusted by modulus and finally an offset could
>        be added, i.e the final mark will be within a range. ICMP error will use
>        the the original message for hash calculation not the icmp it self.
> 
>        Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
>              IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
>              Default behavior is to completely ignore any fragment if it reach hmark.
>              --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
>              None of the parameters effect the packet it self only the calculated hash value.
> 
>        Parameters: Short hand methods
> 
>        --hmark-method L3
>               Do not use L4 protocol field, ports or spi, only Layer 3 addresses,
>               mask length of L3 addresses can still be used. Fragment or not
>               does not matter in this case since only L3 address can be used in
>               calc. of hash value.
> 
>        --hmark-method L3-4 (Default)
>               Include  L4  in  calculation. of hash value i.e. all masks below are valid.
>               Fragments will be ignored. (i.e no hash value produced)
> 
>        For all masks default is all "1:s", to disable a field use mask 0
> 
>        --hmark-src-mask length
>               The length of the mask to AND the source address with (saddr & value).
> 
>        --hmark-dst-mask length
>               The length of the mask to AND the dest. address with (daddr & value).
> 
>        --hmark-sport-mask value
>               A 16 bit value to AND the src port with (sport & value).
> 
>        --hmark-dport-mask value
>               A 16 bit value to AND the dest port with (dport & value).
> 
>        --hmark-sport-set value
>               A 16 bit value to OR the src port with (sport | value).
> 
>        --hmark-dport-set value
>               A 16 bit value to OR the dest port with (dport | value).
> 
>        --hmark-spi-mask value
>               Value to AND the spi field with (spi & value) valid for proto esp or ah.
> 
>        --hmark-spi-set value
>               Value to OR the spi field with (spi | value) valid for proto esp or ah.
> 
>        --hmark-proto-mask value
>               An 8 bit value to AND the L4 proto field with (proto & value).
> 
>        --hmark-ct
>               When flag is set, conntrack data should be used. Useful when NAT internal
>               addressed should be used in calculation.  Be careful when using DNAT
>               since mangle table is handled before nat table. I.e it will not work as
>               expected to put HMARK in table mangle and PREROUTING chain. The  initial
>               packet will have it's hash based on the original address,
>               while the rest of the flow will use the NAT:ed address.
> 
>        --hmark-rnd value
>               A 32 bit initial value for hash calc, default is 0xc175a3b8.
> 
>        Final processing of the mark in order of execution.
> 
>        --hmark-mod value (must be > 0)
>               The easiest way to describe this is:  hash = hash mod <value>
> 
>        --hmark-offset value
>               The easiest way to describe this is:  hash = hash + <value>
> 
>        Examples:
> 
>        Default rule handles all TCP, UDP, SCTP, ESP & AH
> 
>               iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED
>                -j HMARK --hmark-offset 10000 --hmark-mod 10
> 
>        Handle SCTP and hash dest port only and produce a nfmark between 100-119.
> 
>               iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0
>                --sp-mask 0 --offset 100 --mod 20
> 
>        Fragment safe Layer 3 only, that keep a class C network flow together
> 
>               iptables -t mangle -A PREROUTING -j HMARK --method L3 --src-mask 24 --mod 20 --offset 100
> 
> Rev 11
>     Two comments changed
> Rev 10
>      Even more simplified NAT handling just one switch --hmark-ct
>      some renaming and some minor changes.
>      Changes are based on Pablos review.
> 
> Rev 9
>       Simplified NAT selections, cleanup of comments, added checkentry()
>       change of #ifdef to #if IS_ENABLED and dependency.
>       Some minor formating.
>       Most changes are based on Pablos review.
> 
> Rev 8
>       method L3 / L3-4 added i.e. Fragment handling changed to
>       don't handle in "method L3-4"
>       Syntax change in user mode more NF compatible.
>       Most changes are based on Pablos review.
> 
> Rev 7
>       IPv6 descending into icmp error hdr didn't work as expected
>       with ipv6_find_hdr() Now it works as expected.
> 
> Rev 6
>       Compile options with or without conntrack fixed.
>       __ipv6_find_hdr() replaced by ipv6_find_hdr()
> 
> Rev 5
>       IPv6 rewritten uses __ipv6_find_hdr() (P. Mc Hardy)
>       Full mask and address used for IPv6 smask and dmask (J.Engelhart)
>       Changes due to comments by Pablo Neira Ayuso  and Eric Dumazet
>       i.e uses of skb_header_pointer() and Null check of info->hmod
>       Man page changes
> 
> Rev 4
>       different targets for IPv4 and IPv6
>       Changes based on review by Pablo.
> 
> Rev 3
>       Support added to SCTP for IPv6
> Rev 2
>       IPv6 header scan changed to follow RFC 2640
>       IPv4 icmp echo fragmented does now use proto as ipv6
>       IPv6 pskb_may_pull() check is done in every time in header loop.
>       IPv4 nat support added.
>       default added in IPv6 loop and null check of hp
> 
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> ---
>  include/linux/netfilter/xt_HMARK.h |   62 +++++++
>  net/netfilter/Kconfig              |   18 ++
>  net/netfilter/Makefile             |    1 +
>  net/netfilter/xt_HMARK.c           |  319 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 400 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/netfilter/xt_HMARK.h
>  create mode 100644 net/netfilter/xt_HMARK.c
> 
> diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
> new file mode 100644
> index 0000000..cdf4a8f
> --- /dev/null
> +++ b/include/linux/netfilter/xt_HMARK.h
> @@ -0,0 +1,62 @@
> +#ifndef XT_HMARK_H_
> +#define XT_HMARK_H_
> +
> +#include <linux/types.h>
> +
> +enum {
> +	XT_HMARK_NONE,
> +	XT_HMARK_SADR_AND,
> +	XT_HMARK_DADR_AND,
> +	XT_HMARK_SPI_AND,
> +	XT_HMARK_SPI_OR,
> +	XT_HMARK_SPORT_AND,
> +	XT_HMARK_DPORT_AND,
> +	XT_HMARK_SPORT_OR,
> +	XT_HMARK_DPORT_OR,
> +	XT_HMARK_PROTO_AND,
> +	XT_HMARK_RND,
> +	XT_HMARK_MODULUS,
> +	XT_HMARK_OFFSET,
> +	XT_HMARK_CT,
> +	XT_HMARK_METHOD_L3,
> +	XT_HMARK_METHOD_L3_4,
> +	XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
> +	XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
> +	XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
> +	XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
> +	XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
> +	XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
> +	XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
> +	XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
> +	XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
> +	XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
> +	XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
> +	XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
> +	XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
> +	XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
> +	XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
> +};
> +
> +union hmark_ports {
> +	struct {
> +		__u16	src;
> +		__u16	dst;
> +	} p16;
> +	__u32	v32;
> +};
> +
> +struct xt_hmark_info {
> +	union nf_inet_addr	src_mask;	/* Source address mask */
> +	union nf_inet_addr	dst_mask;	/* Dest address mask */
> +	union hmark_ports	port_mask;
> +	union hmark_ports	port_set;
> +	__u32			spi_mask;
> +	__u32			spi_set;
> +	__u32			flags;		/* Print out only */
> +	__u16			proto_mask;	/* L4 Proto mask */
> +	__u32			hashrnd;
> +	__u32			hmodulus;	/* Modulus */
> +	__u32			hoffset;	/* Offset */
> +};
> +
> +#endif /* XT_HMARK_H_ */
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index f8ac4ef..a775804 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -488,6 +488,24 @@ config NETFILTER_XT_TARGET_HL
>  	since you can easily create immortal packets that loop
>  	forever on the network.
>  
> +config NETFILTER_XT_TARGET_HMARK
> +	tristate '"HMARK" target support'
> +	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)

do we really need this dependency above?

> +	depends on NETFILTER_ADVANCED
> +	---help---
> +	This option adds the "HMARK" target.
> +
> +	The target allows you to create rules in the "raw" and "mangle" tables
> +	which alter the netfilter mark (nfmark) field within a given range.
> +	First a 32 bit hash value is generated then modulus by <limit> and
> +	finally an offset is added before it's written to nfmark.
> +
> +	Prior to routing, the nfmark can influence the routing method (see
> +	"Use netfilter MARK value as routing key") and can also be used by
> +	other subsystems to change their behavior.
> +
> +	The mark match can also be used to match nfmark produced by this module.
> +
>  config NETFILTER_XT_TARGET_IDLETIMER
>  	tristate  "IDLETIMER target support"
>  	depends on NETFILTER_ADVANCED
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 40f4c3d..2712ba0 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
> +obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
> diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
> new file mode 100644
> index 0000000..d90549d
> --- /dev/null
> +++ b/net/netfilter/xt_HMARK.c
> @@ -0,0 +1,319 @@
> +/*
> + * xt_hmark - Netfilter module to set mark as hash value
> + *
> + * (C) 2012 Hans Schillstrom <hans.schillstrom@ericsson.com>
> + *
> + *Description:
> + *	This module calculates a hash value that can be modified by modulus
> + *	and an offset, i.e. it is possible to produce a skb->mark within a range
> + *	The hash value is based on a direction independent five tuple:
> + *	src & dst addr src & dst ports and protocol.
> + *	There is two distinct modes for hash calculation:
> + *
> + *	This program is free software; you can redistribute it and/or modify
> + *	it under the terms of the GNU General Public License version 2 as
> + *	published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <net/ip.h>
> +#include <linux/icmp.h>
> +
> +#include <linux/netfilter/xt_HMARK.h>
> +#include <linux/netfilter/x_tables.h>
> +#include <net/netfilter/nf_conntrack.h>
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +#include <net/ipv6.h>
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +#endif
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
> +MODULE_DESCRIPTION("Xtables: Packet range mark operations by Hash value");
> +MODULE_ALIAS("ipt_HMARK");
> +MODULE_ALIAS("ip6t_HMARK");
> +
> +/*
> + * ICMP, get header offset if icmp error
> + */
> +static int get_inner_hdr(struct sk_buff *skb, int iphsz, int *nhoff)
> +{
> +	const struct icmphdr *icmph;
> +	struct icmphdr _ih;
> +
> +	/* Not enough header? */
> +	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
> +	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
> +		return 0;
> +
> +	/* Error message? */
> +	if (icmph->type != ICMP_DEST_UNREACH &&
> +	    icmph->type != ICMP_SOURCE_QUENCH &&
> +	    icmph->type != ICMP_TIME_EXCEEDED &&
> +	    icmph->type != ICMP_PARAMETERPROB &&
> +	    icmph->type != ICMP_REDIRECT)
> +		return 0;
> +
> +	*nhoff += iphsz + sizeof(_ih);
> +	return 1;
> +}
> +
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +/*
> + * Get ipv6 header offset if icmp error
> + */
> +static int get_inner6_hdr(struct sk_buff *skb, int *offset)
> +{
> +	struct icmp6hdr *icmp6h, _ih6;
> +
> +	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
> +	if (icmp6h == NULL)
> +		return 0;
> +
> +	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
> +		*offset += sizeof(struct icmp6hdr);
> +		return 1;
> +	}
> +	return 0;
> +}
> +/*
> + * Calculate hash based fw-mark, on the five tuple if possible.
> + * special cases :
> + *  - Fragments do not use ports not even on the first fragment,
> + *    nf_defrag_ipv6.ko don't defrag for us like it do in ipv4.
> + *    This might be changed in the future.
> + *  - On ICMP errors the inner header will be used.
> + *  - Tunnels no ports
> + *  - ESP & AH uses SPI
> + * @returns XT_CONTINUE
> + */
> +static unsigned int
> +hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	const struct xt_hmark_info *info = par->targinfo;
> +	struct ipv6hdr *ip6, _ip6;
> +	int poff, flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
> +	union hmark_ports uports;
> +	u32 addr_src, addr_dst, hash, nhoffs = 0;
> +	u16 fragoff = 0;
> +	u8 nexthdr;
> +
> +	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
> +	nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
> +	if (nexthdr < 0)
> +		return XT_CONTINUE;
> +	/* No need to check for icmp errors on fragments */
> +	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
> +		goto noicmp;
> +	/* if an icmp error, use the inner header */
> +	if (get_inner6_hdr(skb, &nhoffs)) {
> +		ip6 = skb_header_pointer(skb, nhoffs, sizeof(_ip6), &_ip6);
> +		if (!ip6)
> +			return XT_CONTINUE;
> +		/* Treat AH as ESP, use SPI nothing else. */
> +		flag = IP6T_FH_F_AUTH;
> +		nexthdr = ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
> +		if (nexthdr < 0)
> +			return XT_CONTINUE;
> +	}
> +noicmp:
> +	addr_src = (__force u32)
> +		(ip6->saddr.s6_addr32[0] & info->src_mask.in6.s6_addr32[0]) ^
> +		(ip6->saddr.s6_addr32[1] & info->src_mask.in6.s6_addr32[1]) ^
> +		(ip6->saddr.s6_addr32[2] & info->src_mask.in6.s6_addr32[2]) ^
> +		(ip6->saddr.s6_addr32[3] & info->src_mask.in6.s6_addr32[3]);
> +	addr_dst = (__force u32)
> +		(ip6->daddr.s6_addr32[0] & info->dst_mask.in6.s6_addr32[0]) ^
> +		(ip6->daddr.s6_addr32[1] & info->dst_mask.in6.s6_addr32[1]) ^
> +		(ip6->daddr.s6_addr32[2] & info->dst_mask.in6.s6_addr32[2]) ^
> +		(ip6->daddr.s6_addr32[3] & info->dst_mask.in6.s6_addr32[3]);
> +
> +	uports.v32 = 0;
> +	if ((info->flags & XT_F_HMARK_METHOD_L3) ||
> +	    (nexthdr == IPPROTO_ICMPV6))
> +		goto no_ports;
> +	/* Is next header valid for port or SPI calculation ? */
> +	poff = proto_ports_offset(nexthdr);
> +	if ((flag & IP6T_FH_F_FRAG) || poff < 0)
> +		return XT_CONTINUE;
> +
> +	nhoffs += poff;
> +	if (skb_copy_bits(skb, nhoffs, &uports, sizeof(uports)) < 0)
> +		return XT_CONTINUE;
> +
> +	if ((nexthdr == IPPROTO_ESP) || (nexthdr == IPPROTO_AH))
> +		uports.v32 = (uports.v32 & info->spi_mask) | info->spi_set;
> +	else {
> +		uports.v32 = (uports.v32 & info->port_mask.v32) |
> +			      info->port_set.v32;
> +		/* get a consistent hash (same value in any flow dirs.) */
> +		if (uports.p16.dst < uports.p16.src)
> +			swap(uports.p16.dst, uports.p16.src);
> +	}
> +
> +no_ports:
> +	nexthdr &= info->proto_mask;
> +	/* get a consistent hash (same value in any flow direction) */
> +	if (addr_dst < addr_src)
> +		swap(addr_src, addr_dst);
> +
> +	hash = jhash_3words(addr_src, addr_dst, uports.v32, info->hashrnd) ^ nexthdr;
> +	skb->mark = (hash % info->hmodulus) + info->hoffset;
> +	return XT_CONTINUE;
> +}
> +#endif
> +/*
> + * Calculate hash based fw-mark, on the five tuple if possible.
> + * special cases :
> + *  - Fragments do not use ports not even on the first fragment,
> + *    unless nf_defrag_xx.ko is used.
> + *  - On ICMP errors the inner header will be used.
> + *  - Tunnels no ports
> + *  - ESP & AH uses SPI
> + * @returns XT_CONTINUE
> + */
> +static unsigned int
> +hmark_v4(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	const struct xt_hmark_info *info = par->targinfo;
> +	struct iphdr *ip, _ip;
> +	int nhoff, poff, frag = 0;
> +	union hmark_ports uports;
> +	u32 addr_src, addr_dst, hash;
> +	u8 ip_proto;
> +
> +	nhoff = skb_network_offset(skb);
> +	ip = (struct iphdr *) (skb->data + nhoff);
> +	if (ip->protocol == IPPROTO_ICMP) {
> +		/* if an icmp error, calc hash on inner header */
> +		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
> +			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
> +			if (!ip)
> +				return XT_CONTINUE;
> +		}
> +	}
> +
> +	ip_proto = ip->protocol;
> +	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
> +		frag = 1;
> +
> +	addr_src = (__force u32) ip->saddr;
> +	addr_dst = (__force u32) ip->daddr;
> +	uports.v32 = 0;
> +/* conntrack take care of ICMP relation */
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +	if (info->flags & XT_F_HMARK_CT) {
> +		struct nf_conntrack_tuple *otuple;
> +		struct nf_conntrack_tuple *rtuple;
> +		enum ip_conntrack_info ctinfo;
> +		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> +
> +		if (!ct || nf_ct_is_untracked(ct))
> +			return XT_CONTINUE;
> +
> +		otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> +		rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> +
> +		addr_src       = (__force u32)otuple->src.u3.in.s_addr;
> +		uports.p16.src = otuple->src.u.udp.port;
> +		addr_dst       = (__force u32)rtuple->src.u3.in.s_addr;
> +		uports.p16.dst = rtuple->src.u.udp.port;
> +	}

There's an inconsistency here. No conntrack support for IPv6.

I'd suggest to split hmark_v4 into two functions by checking:

... hmark_v4(...)
{
        if (info->flags & XT_F_HMARK_CT)
                ret = hmark_tg_ct_v4(...)
        else
                ret = hmark_tg_v4(...)

        return ret;
}

Same thing for IPv6. Those function will look smaller, and that's good
to make the code more maintainable.

You can define some hmark_hash_v4 and hmark_hash_v6 function that you
may want to inline.

Another suggestion in case you may need to extend HMARK in
the future. I think some info->type field to specify the 

info->type can be HMARK_T_PKT or HMARK_T_CT.

Thus, the code above would look like:

... hmark_v4(...)
{
        switch(info->type) {
        case HMARK_T_PKT:
                ret = hmark_tg_ct_v4(...)
                break;
        case HMARK_T_CT:
                ret = hmark_tg_v4(...)
                break;
        }
}

But *this is only a suggestion*, of course.

> +#endif
> +	addr_src &= info->src_mask.ip;
> +	addr_dst &= info->dst_mask.ip;
> +
> +	if ((info->flags & XT_F_HMARK_METHOD_L3) ||
> +	    (ip_proto == IPPROTO_ICMP)) {
> +		uports.v32 = 0;
> +		goto noports;
> +	}
> +	/* Check if ports can be used in hash calculation. */
> +	poff = proto_ports_offset(ip_proto);
> +	if (frag || poff < 0)
> +		return XT_CONTINUE;
> +
> +	/* if --ct not given, get ports from skb */
> +	if (!uports.v32) {
> +		nhoff += (ip->ihl * 4) + poff;
> +		if (skb_copy_bits(skb, nhoff, &uports, sizeof(uports)) < 0)
> +			return XT_CONTINUE;
> +	}
> +
> +	if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH)
> +		uports.v32 = (uports.v32 & info->spi_mask) | info->spi_set;
> +	else {
> +		uports.v32 = (uports.v32 & info->port_mask.v32) |
> +				info->port_set.v32;
> +		/* get a consistent hash (same value in any flow dirs.) */
> +		if (uports.p16.dst < uports.p16.src)
> +			swap(uports.p16.src, uports.p16.dst);
> +	}
> +
> +noports:
> +	/* get a consistent hash (same value in any flow direction) */
> +	if (addr_dst < addr_src)
> +		swap(addr_src, addr_dst);
> +
> +	hash = jhash_3words(addr_src, addr_dst, uports.v32, info->hashrnd);
> +	hash = hash ^ (ip_proto & info->proto_mask);
> +	skb->mark = (hash % info->hmodulus) + info->hoffset;
> +	return XT_CONTINUE;
> +}
> +
> +static int hmark_check(const struct xt_tgchk_param *par)
> +{
> +	const struct xt_hmark_info *info = par->targinfo;
> +
> +	if (!info->hmodulus) {
> +		pr_info("HMARK: hmark-mod can't be zero\n");
> +		return -EINVAL;
> +	}
> +	if (info->proto_mask && (info->flags & XT_F_HMARK_METHOD_L3)) {
> +		pr_info("HMARK: When method L3 proto mask must be zero\n");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static struct xt_target hmark_tg_reg[] __read_mostly = {
> +	{
> +		.name           = "HMARK",
> +		.revision       = 0,
> +		.family         = NFPROTO_IPV4,
> +		.target         = hmark_v4,
> +		.targetsize     = sizeof(struct xt_hmark_info),
> +		.checkentry     = hmark_check,
> +		.me             = THIS_MODULE,
> +	},
> +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
> +	{
> +		.name           = "HMARK",
> +		.revision       = 0,
> +		.family         = NFPROTO_IPV6,
> +		.target         = hmark_v6,
> +		.targetsize     = sizeof(struct xt_hmark_info),
> +		.checkentry     = hmark_check,
> +		.me             = THIS_MODULE,
> +	},
> +#endif
> +};
> +
> +static int __init hmark_mt_init(void)
> +{
> +	int ret;
> +
> +	ret = xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
> +	if (ret < 0)
> +		return ret;
> +	return 0;
> +}
> +
> +static void __exit hmark_mt_exit(void)
> +{
> +	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
> +}
> +
> +module_init(hmark_mt_init);
> +module_exit(hmark_mt_exit);
> -- 
> 1.7.2.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-04-12 15:54   ` Pablo Neira Ayuso
@ 2012-04-23 12:33     ` Hans Schillstrom
  0 siblings, 0 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-04-23 12:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Thursday 12 April 2012 17:54:42 Pablo Neira Ayuso wrote:
> Hi Hans,
> 
> On Thu, Mar 22, 2012 at 12:59:52PM +0100, Hans Schillstrom wrote:
[snip]

> >
> > +config NETFILTER_XT_TARGET_HMARK
> > +     tristate '"HMARK" target support'
> > +     depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
> 
> do we really need this dependency above?

Nope, I'll removed it.

[snip] 
> 
> There's an inconsistency here. No conntrack support for IPv6.

That's true, since it was intended for nat from the very begining..
I'll added.

> 
> I'd suggest to split hmark_v4 into two functions by checking:
> 
or make a common  hmark_ct for ipv6 and ipv4

> ... hmark_v4(...)
> {
>         if (info->flags & XT_F_HMARK_CT)
>                 ret = hmark_tg_ct_v4(...)
>         else
>                 ret = hmark_tg_v4(...)
> 
>         return ret;
> }
> 
> Same thing for IPv6. Those function will look smaller, and that's good
> to make the code more maintainable.
> 
> You can define some hmark_hash_v4 and hmark_hash_v6 function that you
> may want to inline.
> 
> Another suggestion in case you may need to extend HMARK in
> the future. I think some info->type field to specify the
> 
> info->type can be HMARK_T_PKT or HMARK_T_CT.
> 
> Thus, the code above would look like:
> 
> ... hmark_v4(...)
> {
>         switch(info->type) {
>         case HMARK_T_PKT:
>                 ret = hmark_tg_ct_v4(...)
>                 break;
>         case HMARK_T_CT:
>                 ret = hmark_tg_v4(...)
>                 break;
>         }
> }
> 
> But *this is only a suggestion*, of course.
> 
Thanks Pablo for your review , I will send a new patch today

Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-04-23 12:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-22 11:59 [v11 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
2012-03-22 11:59 ` [v11 PATCH 1/3] NETFILTER added flags to ipv6_find_hdr() Hans Schillstrom
2012-03-22 11:59 ` [v11 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
2012-04-12 15:54   ` Pablo Neira Ayuso
2012-04-23 12:33     ` Hans Schillstrom
2012-03-22 11:59 ` [v11 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).