* [v5 PATCH 0/3] NETFILTER new target module, HMARK
@ 2012-01-02 15:06 Hans Schillstrom
  2012-01-02 15:06 ` [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr() Hans Schillstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-01-02 15:06 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32-bit hash value is generated, then reduced modulo <limit>, and
finally an offset is added before it is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.
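
As a rough illustration of the arithmetic described above (a sketch, not
the module's code), the final mark is the hash reduced modulo the limit
with the offset added on top. The helper name hmark_range() is made up
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a 32-bit hash into [offs, offs + mod - 1]. */
static uint32_t hmark_range(uint32_t hash, uint32_t mod, uint32_t offs)
{
	assert(mod > 0);	/* a zero modulus would divide by zero */
	return (hash % mod) + offs;
}
```

With a modulus of 4 and an offset of 100, every hash lands on one of the
marks 100, 101, 102 or 103.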

REVISION
Version 5
	Use mask length for smask and dmask, and the whole IPv6 address (Jan E)
	Modify ipv6_find_hdr() and use it while traversing the IPv6 header.
	Manual changes.
	More or less all comments implemented.

Version 4
	Split of IPv6 and IPv4, use IP_CT_IS_REPLY, as Pablo suggested.
	removed one pskb_may_pull()
	xtoption parse used in the user space part.

Version 3
        Handling of SCTP for IPv6 added.

Version 2
	NAT Added for IPv4
	IPv6 ICMP handling enhanced.
	Usage example added

Version 1
	Initial RFC


We (Ericsson) use HMARK in front of IPVS as a pre-load-balancer; it
handles up to 70 IPVS instances running in parallel in clusters.
However, HMARK is not restricted to running in front of IPVS; it can also be
used as a "poor man's" load balancer.
This version also supports NAT as an option; with a very high number of flows
you might not want to use conntrack.

The idea is to generate a direction-independent fwmark range to use as input to
the routing (i.e. ip rule add fwmark ...).
Pretty straightforward and simple.
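
To make "direction independent" concrete, here is a small user-space
sketch. It assumes IPv4 addresses and ports in host byte order and uses a
toy mixer, mix32(), as a stand-in for the jhash that the kernel module
uses; struct flow and both function names are invented for the example.
The point is only the ordering step: each pair is sorted before hashing,
so a flow and its reply produce the same value.

```c
#include <assert.h>
#include <stdint.h>

struct flow {
	uint32_t saddr, daddr;	/* IPv4 addresses, host byte order */
	uint16_t sport, dport;	/* L4 ports, host byte order */
	uint8_t  proto;		/* L4 protocol number */
};

/* Stand-in mixer for illustration only (NOT the kernel's jhash). */
static uint32_t mix32(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return h ^ (h >> 15);
}

static uint32_t flow_hash_symmetric(const struct flow *f, uint32_t seed)
{
	uint32_t a1, a2, h;
	uint16_t p1, p2;

	assert(f);
	a1 = f->saddr; a2 = f->daddr;
	p1 = f->sport; p2 = f->dport;

	/* Order each pair so that A -> B and B -> A hash the same input. */
	if (a2 < a1) { uint32_t t = a1; a1 = a2; a2 = t; }
	if (p2 < p1) { uint16_t u = p1; p1 = p2; p2 = u; }

	h = mix32(seed, a1);
	h = mix32(h, a2);
	h = mix32(h, ((uint32_t)p1 << 16) | p2);
	return h ^ f->proto;
}
```

Addresses and ports are ordered independently of each other, which
mirrors the two separate swap() steps in the kernel patch.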


Example:
                                      App Server (Real Server)

                                           +---------+
                                        -->| Service |
     Gateway A                             +---------+
                          /
            +----------+ /     +----+      +---------+
--- if -A---| selector |---->  |ipvs|  --->| Service |
            +----------+ \     +----+      +---------+
                          \
                               +----+      +---------+
                               |ipvs|   -->| Service |
                               +----+      +---------+
      Gateway C
            +----------+ /     +----+
--- if-B ---| selector | --->  |ipvs|
            +----------+ \     +----+      +---------+
                                           | Service |
                                           +---------+
                          /
            +----------+ /     +----+     ..
--- if-B ---| selector | --->  |ipvs|      +---------+
            +----------+ \     +----+      | Service |
                          \                +---------+
#
# Example with four ipvs loadbalancers
#
iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100

ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip rule add fwmark 102 table 102
ip rule add fwmark 103 table 103

ip ro ad table 100 default via x.y.z.1 dev bond1
ip ro ad table 101 default via x.y.z.2 dev bond1
ip ro ad table 102 default via x.y.z.3 dev bond1
ip ro ad table 103 default via x.y.z.4 dev bond1


If conntrack doesn't handle the return path,
do the opposite with HMARK and send it right back to ipvs.

Another example of usage could be if you have cluster-originated connections
and want to spread the connections over a number of interfaces
(NAT will complicate things for you in this case).



                     \  Blade 1
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /
   +------+
-- | Gw-A |          \  Blade 2
   +------+           \ +----------+      +---------+
   +------+         <-- | selector | <--- | Service |
-- | Gw-B |           / +----------+      +---------+
   +------+          /
   +------+
-- | Gw-C |          \
   +------+           \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

                     \  Blade n
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /


Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>



* [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr()
  2012-01-02 15:06 [v5 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
@ 2012-01-02 15:06 ` Hans Schillstrom
  2012-01-04 17:37   ` Pablo Neira Ayuso
  2012-01-02 15:06 ` [v5 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
  2012-01-02 15:06 ` [v5 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
  2 siblings, 1 reply; 6+ messages in thread
From: Hans Schillstrom @ 2012-01-02 15:06 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

Two new flags for __ipv6_find_hdr():
one that tells us that this is a fragment, and
one that stops at AH, if present, i.e. treats it like a transport header,
which makes the handling of ESP and AH the same.
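
The in/out flags idea can be sketched in user space as below. This is a
simplified model, not the kernel function: it walks a plain byte buffer
that starts right after the fixed IPv6 header, ignores the non-first
fragment and NEXTHDR_NONE cases, and every name in it (FH_*, NH_*,
walk_ext_hdrs) is made up for the example. FH_F_AUTH is an input flag
("stop at AH"), FH_F_FRAG is an output flag ("a fragment header was seen").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	FH_FRAG,			/* bit index */
	FH_AUTH,
	FH_F_FRAG = 1 << FH_FRAG,	/* out: fragment header seen */
	FH_F_AUTH = 1 << FH_AUTH,	/* in: stop at AH */
};

#define NH_HOPOPTS   0
#define NH_ROUTING  43
#define NH_FRAGMENT 44
#define NH_AUTH     51
#define NH_DSTOPTS  60

static int is_ext_hdr(uint8_t nh)
{
	return nh == NH_HOPOPTS || nh == NH_ROUTING || nh == NH_FRAGMENT ||
	       nh == NH_AUTH || nh == NH_DSTOPTS;
}

/*
 * Returns the protocol number of the header the walk stopped at, or -1 if
 * the buffer is too short. *offset receives that header's offset in buf.
 */
static int walk_ext_hdrs(const uint8_t *buf, size_t len, uint8_t nexthdr,
			 size_t *offset, int *flags)
{
	size_t off = 0;

	assert(buf && offset);
	while (is_ext_hdr(nexthdr)) {
		size_t hdrlen;

		if (off + 2 > len)
			return -1;
		if (nexthdr == NH_FRAGMENT) {
			if (flags)	/* report that this is a fragment */
				*flags |= FH_F_FRAG;
			hdrlen = 8;
		} else if (nexthdr == NH_AUTH) {
			if (flags && (*flags & FH_F_AUTH))
				break;	/* treat AH like a transport header */
			hdrlen = ((size_t)buf[off + 1] + 2) << 2;
		} else {
			hdrlen = ((size_t)buf[off + 1] + 1) << 3;
		}
		nexthdr = buf[off];	/* first byte is the next-header field */
		off += hdrlen;
	}
	*offset = off;
	return nexthdr;
}
```

A caller that wants the SPI of an AH header passes FH_F_AUTH in *flags
and gets protocol 51 back; a caller that passes 0 (or NULL) walks past AH
to the real transport header, as before.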

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter_ipv6/ip6_tables.h |   16 ++++++++++++++--
 net/ipv6/netfilter/ip6_tables.c           |   19 ++++++++++++++-----
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
index f549adc..ee0c68e 100644
--- a/include/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/linux/netfilter_ipv6/ip6_tables.h
@@ -288,9 +288,21 @@ extern unsigned int ip6t_do_table(struct sk_buff *skb,
 
 /* Check for an extension */
 extern int ip6t_ext_hdr(u8 nexthdr);
+enum {
+	IP6T_FH_FRAG,
+	IP6T_FH_AUTH,
+	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
+	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
+};
 /* find specified header and get offset to it */
-extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-			 int target, unsigned short *fragoff);
+extern int __ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
+			 int target, unsigned short *fragoff, int *fragflg);
+
+static inline int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
+		  int target, unsigned short *fragoff)
+{
+	return __ipv6_find_hdr(skb, offset, target, fragoff, NULL);
+}
 
 #ifdef CONFIG_COMPAT
 #include <net/compat.h>
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 94874b0..8729bff 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -2302,9 +2302,13 @@ static void __exit ip6_tables_fini(void)
  * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
  * isn't NULL.
  *
+ * if flags != NULL AND
+ *    it's a fragment the frag flag "IP6T_FH_F_FRAG" will be set
+ *    it's an AH header and IP6T_FH_F_AUTH is set and target < 0
+ *      stop at AH (i.e. treat it as a transport header)
  */
-int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-		  int target, unsigned short *fragoff)
+int __ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
+		  int target, unsigned short *fragoff, int *flags)
 {
 	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
 	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
@@ -2329,6 +2333,9 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		if (nexthdr == NEXTHDR_FRAGMENT) {
 			unsigned short _frag_off;
 			__be16 *fp;
+
+			if (flags)	/* Indicate that this is a fragment */
+				*flags |= IP6T_FH_F_FRAG;
 			fp = skb_header_pointer(skb,
 						start+offsetof(struct frag_hdr,
 							       frag_off),
@@ -2349,9 +2356,11 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 				return -ENOENT;
 			}
 			hdrlen = 8;
-		} else if (nexthdr == NEXTHDR_AUTH)
+		} else if (nexthdr == NEXTHDR_AUTH) {
+			if (flags && (*flags & IP6T_FH_F_AUTH) && (target < 0))
+				break;
 			hdrlen = (hp->hdrlen + 2) << 2;
-		else
+		} else
 			hdrlen = ipv6_optlen(hp);
 
 		nexthdr = hp->nexthdr;
@@ -2367,7 +2376,7 @@ EXPORT_SYMBOL(ip6t_register_table);
 EXPORT_SYMBOL(ip6t_unregister_table);
 EXPORT_SYMBOL(ip6t_do_table);
 EXPORT_SYMBOL(ip6t_ext_hdr);
-EXPORT_SYMBOL(ipv6_find_hdr);
+EXPORT_SYMBOL(__ipv6_find_hdr);
 
 module_init(ip6_tables_init);
 module_exit(ip6_tables_fini);
-- 
1.7.2.3


* [v5 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
  2012-01-02 15:06 [v5 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
  2012-01-02 15:06 ` [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr() Hans Schillstrom
@ 2012-01-02 15:06 ` Hans Schillstrom
  2012-01-02 15:06 ` [v5 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
  2 siblings, 0 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-01-02 15:06 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32-bit hash value is generated, then reduced modulo <limit>, and
finally an offset is added before it is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

man page
   HMARK
       This module does the same as MARK, i.e. sets an fwmark,
       but the mark is based on a hash value.  The hash is based on
       saddr, daddr, sport, dport and proto. The same mark will be produced
       independent of direction if no masks are set or the same masks are used for
       src and dest. The hash mark can be adjusted by a modulus, and finally an
       offset can be added, i.e. the final mark will be within a range.
       ICMP errors will have the hash calculated on the original message.
       Note:
        - None of the parameters affect the packet itself,
          only the calculated hash value.
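
The ICMP rule above ("hash on the original message") needs a test for
"is this an ICMP error?". A minimal sketch of that test, assuming the
same set of ICMPv4 error types that the kernel patch checks and the
ICMPv6 convention that types 1-127 are errors; the function names are
invented here, and the 8 is the size of the ICMP header that precedes
the embedded packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ICMPv4 error types (RFC 792 numbering) */
enum {
	T_DEST_UNREACH  = 3,
	T_SOURCE_QUENCH = 4,
	T_REDIRECT      = 5,
	T_TIME_EXCEEDED = 11,
	T_PARAMETERPROB = 12,
};

static bool icmp4_is_error(uint8_t type)
{
	return type == T_DEST_UNREACH || type == T_SOURCE_QUENCH ||
	       type == T_REDIRECT || type == T_TIME_EXCEEDED ||
	       type == T_PARAMETERPROB;
}

/* ICMPv6: types below 128 are error messages (type 0 is reserved). */
static bool icmp6_is_error(uint8_t type)
{
	return type != 0 && type < 128;
}

/*
 * Offset of the header to hash on: the embedded (inner) IPv4 header for
 * an ICMP error, otherwise the unchanged outer header offset.
 */
static size_t inner_hdr_offset(size_t nhoff, size_t iphsz, uint8_t icmp_type)
{
	return icmp4_is_error(icmp_type) ? nhoff + iphsz + 8 : nhoff;
}
```

For an echo request (type 8) the outer header is used; for a Time
Exceeded (type 11) behind a 20-byte IP header the inner header starts at
offset 28.
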
       Fragmentation:
        - If a packet is fragmented, NONE of the fragments will include
          ports in the hash calculation.
          When fragments arrive on different machines, HMARK will produce the
          same fwmark for all fragments independent of machine, so they can be
          routed to the same destination.
        - If nf_defrag_ipv{4,6} is loaded, the packets will be defragmented
          before reaching HMARK, i.e. in that case ports (if any) will be used.
          (ICMP Time Exceeded will be sent if fragments are lost.)
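
The fragment rule can be sketched as follows for IPv4. This is an
illustration under stated assumptions, not the module's code: frag_off
is taken in host byte order, and the constant and function names are
made up. A packet counts as a fragment if "more fragments" is set or the
fragment offset is non-zero; in that case the port word fed to the hash
is simply 0, for the first fragment too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAG_MF          0x2000u	/* "more fragments" flag */
#define FRAG_OFFSET_MASK 0x1fffu	/* fragment offset bits */

/* frag_off: the IPv4 flags + fragment offset field, host byte order */
static bool ipv4_is_fragment(uint16_t frag_off)
{
	return (frag_off & (FRAG_MF | FRAG_OFFSET_MASK)) != 0;
}

/* Port word used as hash input: zero for any fragment. */
static uint32_t ports_for_hash(uint16_t frag_off, uint16_t sport,
			       uint16_t dport)
{
	if (ipv4_is_fragment(frag_off))
		return 0;
	return ((uint32_t)sport << 16) | dport;
}
```

Because every fragment of a datagram hashes on addresses and protocol
only, all of them get the same mark no matter which machine sees them.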

       Parameters: For all masks the default is all ones; to disable a field,
                   use mask 0. For IPv6 the masked address is folded to 32 bits
                   (XOR of its four 32-bit words) before it is hashed.

       --hmark-smask length (0-32 or 0-128)
              The prefix length of the mask to AND the source address with (saddr & mask).

       --hmark-dmask length (0-32 or 0-128)
              The prefix length of the mask to AND the dest. address with (daddr & mask).

       --hmark-sp-mask value
              A 16 bit value to AND the src port with (sport & value).

       --hmark-dp-mask value
              A 16 bit value to AND the dest port with (dport & value).

       --hmark-sp-set value
              A 16 bit value to OR the src port with (sport | value).

       --hmark-dp-set value
              A 16 bit value to OR the dest port with (dport | value).

       --hmark-spi-mask value
              Value to AND the spi field with (spi & value) valid for proto esp or ah.

       --hmark-spi-set value
              Value to OR the spi field with (spi | value) valid for proto esp or ah.

       --hmark-proto-mask value
              A 16 bit value to AND the L4 proto field with (proto & value).

       --hmark-rnd value
              A 32-bit initial value for the hash calculation; default is 0xc175a3b8.

       --hmark-dnat (only IPv4)
              Replace src addr/port with original dst addr/port before calculating the hash.

       --hmark-snat (only IPv4)
              Replace dst addr/port with original src addr/port before calculating the hash.

       Final processing of the mark in order of execution.

       --hmark-mod value (must be > 0)
              The easiest way to describe this is:  hash = hash mod <value>

       --hmark-offs value (must be > 0)
              The easiest way to describe this is:  hash = hash + <value>
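
How the mask and set parameters act on the hash input can be sketched
like this. It is a user-space illustration only: addresses are treated as
host-order 32-bit words (most significant word first for IPv6), and all
three helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Prefix length (0..32) to a 32-bit mask, host byte order. */
static uint32_t plen_to_mask4(unsigned plen)
{
	assert(plen <= 32);
	return plen == 0 ? 0 : 0xffffffffu << (32 - plen);
}

/*
 * Mask a 128-bit address (four 32-bit words, most significant first)
 * with a prefix length, then fold it to 32 bits by XOR.
 */
static uint32_t fold_masked6(const uint32_t addr[4], unsigned plen)
{
	uint32_t folded = 0;

	assert(plen <= 128);
	for (int i = 0; i < 4; i++) {
		unsigned bits = plen > 32 ? 32 : plen;

		folded ^= addr[i] & plen_to_mask4(bits);
		plen -= bits;
	}
	return folded;
}

/* Port handling: AND with the mask first, then OR in the set value. */
static uint16_t port_adjust(uint16_t port, uint16_t mask, uint16_t set)
{
	return (uint16_t)((port & mask) | set);
}
```

A mask length of 0 removes the address from the hash entirely, and a
port mask of 0 with a set value of 0 does the same for a port.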

       Examples:

       Default rule handles all TCP, UDP, SCTP, ESP & AH
Rev 5
      IPv6 rewritten to use __ipv6_find_hdr() (P. McHardy)
      Full mask and address used for IPv6 smask and dmask (J. Engelhardt)
      Changes due to comments by Pablo Neira Ayuso and Eric Dumazet,
      i.e. use of skb_header_pointer() and NULL check of info->hmod
      Man page changes

Rev 4
      different targets for IPv4 and IPv6
      Changes based on review by Pablo.

Rev 3
      SCTP support added for IPv6
Rev 2
      IPv6 header scan changed to follow RFC 2460
      IPv4 fragmented ICMP echo now uses proto, as in IPv6
      IPv6 pskb_may_pull() check is done every time in the header loop.
      IPv4 NAT support added.
      default added in IPv6 loop and NULL check of hp

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_hmark.h |   62 +++++++
 net/netfilter/Kconfig              |   17 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_hmark.c           |  332 ++++++++++++++++++++++++++++++++++++
 4 files changed, 412 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_hmark.h
 create mode 100644 net/netfilter/xt_hmark.c

diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..366ecce
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+/*
+ * Flags must not start at 0, since it's used as none.
+ */
+enum {
+	XT_HMARK_SADR_AND = 1,	/* SNAT & DNAT are used by the kernel module */
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_USE_SNAT,
+	XT_HMARK_USE_DNAT,
+	XT_F_HMARK_USE_SNAT = 1 << XT_HMARK_USE_SNAT,
+	XT_F_HMARK_USE_DNAT = 1 << XT_HMARK_USE_DNAT,
+	XT_F_HMARK_SADR_AND = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET = 1 << XT_HMARK_OFFSET,
+};
+
+union hports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	smask;		/* Source address mask */
+	union nf_inet_addr	dmask;		/* Dest address mask */
+	union hports		pmask;
+	union hports		pset;
+	__u32			spimask;
+	__u32			spiset;
+	__u16			flags;		/* Print out only */
+	__u16			prmask;		/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmod;		/* Modulus */
+	__u32			hoffs;		/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index d5597b7..6e85d7d 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -470,6 +470,23 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which alter the netfilter mark (nfmark) field within a given range.
+	First a 32 bit hash value is generated then modulus by <limit> and
+	finally an offset is added before it's written to nfmark.
+
+	Prior to routing, the nfmark can influence the routing method (see
+	"Use netfilter MARK value as routing key") and can also be used by
+	other subsystems to change their behavior.
+
+	The mark match can also be used to match nfmark produced by this module.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1a02853..359eeb6 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_hmark.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
diff --git a/net/netfilter/xt_hmark.c b/net/netfilter/xt_hmark.c
new file mode 100644
index 0000000..1a854a9
--- /dev/null
+++ b/net/netfilter/xt_hmark.c
@@ -0,0 +1,332 @@
+/*
+ * xt_hmark - Netfilter module to set mark as hash value
+ *
+ * (C) 2011 Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *
+ *Description:
+ *	This module calculates a hash value that can be modified by modulus
+ *	and an offset, i.e. it is possible to produce a skb->mark within a range
+ *	The hash value is based on a direction independent five tuple:
+ *	src & dst addr src & dst ports and protocol.
+ *	However src & dst port can be masked and are not used for fragmented
+ *	packets, ESP and AH don't have ports so SPI will be used instead.
+ *	AH will not use ports even if it might be possible.
+ *	Tunnels - only the outer saddr and daddr will be used.
+ *
+ *	For ICMP error messages the hash mark values will be calculated on
+ *	the source packet, i.e. the packet that caused the error (if a
+ *	sufficient amount of data exists).
+ *
+ *Note:	None of the fragments will include ports/spi in the calculation of
+ *	the hash value (i.e. all frags must get the same hash value).
+ *
+ *	This program is free software; you can redistribute it and/or modify
+ *	it under the terms of the GNU General Public License version 2 as
+ *	published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/xt_hmark.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat.h>
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#	define WITH_IPV6 1
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet range mark operations by hash value");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+/*
+ * ICMP, get inner header so calc can be made on the source message
+ *
+ * iphsz: ip header size in bytes
+ * nhoff: network header offset
+ * return; updated nhoff if an icmp error
+ */
+static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL)
+		return nhoff;
+
+	if (icmph->type > NR_ICMP_TYPES)
+		return nhoff;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return nhoff;
+
+	return nhoff + iphsz + sizeof(_ih);
+}
+
+#ifdef WITH_IPV6
+/* Dummy header used for size calculation of an error header */
+struct _icmpv6_errh {
+	__u8		icmp6_type;
+	__u8		icmp6_code;
+	__u16		icmp6_cksum;
+	__u32		icmp6_nu;
+};
+/*
+ * Get ipv6 header offset if icmp type < 128 i.e. an error.
+ * @param: offset  input: where ICMPv6 header starts
+ *                output: where ipv6 header starts / unchanged.
+ *
+ * Returns true if it's an icmp error
+ *              and updates *offset to where ipv6 header starts
+ */
+static int get_inner6_hdr(struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset +=  sizeof(struct _icmpv6_errh);
+		return 1;
+	}
+	return 0;
+}
+/*
+ * Calculate hash based fw-mark, on the five tuple if possible.
+ * special cases :
+ *  - Fragments do not use ports, not even on the first fragment;
+ *    nf_defrag_ipv6.ko doesn't defragment for us like it does in IPv4.
+ *    This might be changed in the future.
+ *  - On ICMP errors the inner header will be used.
+ *  - Tunnels no ports
+ *  - ESP & AH uses SPI
+ * @returns XT_CONTINUE
+ */
+__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
+	struct ipv6hdr *ip6, _ip6;
+	int poff, flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	u32 addr1, addr2, hash, nhoffs;
+	int nexthdr;	/* signed: __ipv6_find_hdr() can return a negative errno */
+	union hports uports = { .v32 = 0 };
+	unsigned short fragoff = 0;
+
+	if (!info->hmod)
+		return XT_CONTINUE;
+
+	nhoffs = skb_network_offset(skb);
+	ip6 = (struct ipv6hdr *) (skb->data + nhoffs);
+
+	/* Try to get transport header */
+	nexthdr = __ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return XT_CONTINUE;
+	/* dont check for icmp on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* ICMP: if an error then move ptr to inner header */
+	if (get_inner6_hdr(skb, &nhoffs)) {
+		ip6 = skb_header_pointer(skb, nhoffs, sizeof(_ip6), &_ip6);
+		if (!ip6)
+			return XT_CONTINUE;
+		nhoffs += sizeof(_ip6);
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = __ipv6_find_hdr(skb, &nhoffs, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return XT_CONTINUE;
+	}
+noicmp:
+	/* Mask of the address and xor it into a u32 */
+	addr1 = (__force u32)
+		(ip6->saddr.s6_addr32[0] & info->smask.in6.s6_addr32[0]) ^
+		(ip6->saddr.s6_addr32[1] & info->smask.in6.s6_addr32[1]) ^
+		(ip6->saddr.s6_addr32[2] & info->smask.in6.s6_addr32[2]) ^
+		(ip6->saddr.s6_addr32[3] & info->smask.in6.s6_addr32[3]);
+	addr2 = (__force u32)
+		(ip6->daddr.s6_addr32[0] & info->dmask.in6.s6_addr32[0]) ^
+		(ip6->daddr.s6_addr32[1] & info->dmask.in6.s6_addr32[1]) ^
+		(ip6->daddr.s6_addr32[2] & info->dmask.in6.s6_addr32[2]) ^
+		(ip6->daddr.s6_addr32[3] & info->dmask.in6.s6_addr32[3]);
+
+	/* Is next header valid for port or SPI calculation ? */
+	poff = proto_ports_offset(nexthdr);
+	if ((flag & IP6T_FH_F_FRAG) || poff < 0)
+		goto no6ports;
+	nhoffs += poff;
+	/* Since uports is modified, skb_header_pointer() can't be used */
+	if (!pskb_may_pull(skb, nhoffs + 4))
+		goto no6ports;
+	uports.v32 = * (__force u32 *) (skb->data + nhoffs);
+
+	if ((nexthdr == IPPROTO_ESP) || (nexthdr == IPPROTO_AH)) {
+		uports.v32 = (uports.v32 & info->spimask) | info->spiset;
+	} else {
+		uports.v32 = (uports.v32 & info->pmask.v32) | info->pset.v32;
+		/* get a consistent hash (same value on both flow directions) */
+		if (uports.p16.dst < uports.p16.src)
+			swap(uports.p16.dst, uports.p16.src);
+	}
+
+no6ports:
+	nexthdr &= info->prmask;
+	/* get a consistent hash (same value on both flow directions) */
+	if (addr2 < addr1)
+		swap(addr1, addr2);
+
+	hash = jhash_3words(addr1, addr2, uports.v32, info->hashrnd) ^ nexthdr;
+	skb->mark = (hash % info->hmod) + info->hoffs;
+	return XT_CONTINUE;
+}
+#endif
+/*
+ * Calculate hash based fw-mark, on the five tuple if possible.
+ * special cases :
+ *  - Fragments do not use ports not even on the first fragment,
+ *    unless nf_defrag_xx.ko is used.
+ *  - On ICMP errors the inner header will be used.
+ *  - Tunnels no ports
+ *  - ESP & AH uses SPI
+ * @returns XT_CONTINUE
+ */
+unsigned int hmark_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
+	int nhoff, poff, frag = 0;
+	struct iphdr *ip, _ip;
+	u8 ip_proto;
+	u32 addr1, addr2, hash;
+	u16 snatport = 0, dnatport = 0;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	union hports uports;
+
+	if (!info->hmod)
+		return XT_CONTINUE;
+
+	nhoff = skb_network_offset(skb);
+	uports.v32 = 0;
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* calc hash on inner header if an icmp error */
+		nhoff = get_inner_hdr(skb, ip->ihl * 4, nhoff);
+		ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+		if (!ip)
+			return XT_CONTINUE;
+	}
+
+	ip_proto = ip->protocol;
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		frag = 1;
+
+	addr1 = (__force u32) ip->saddr & info->smask.ip;
+	addr2 = (__force u32) ip->daddr & info->dmask.ip;
+
+	if (ct && test_bit(IP_CT_IS_REPLY, &ct->status)) {
+		struct nf_conntrack_tuple *otuple;
+
+		otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+		/*
+		 * On the "return flow", to get the original address
+		 */
+		if ((ct->status & IPS_DST_NAT) &&
+			(info->flags & XT_F_HMARK_USE_DNAT)) {
+			addr1 = (__force u32) otuple->dst.u3.in.s_addr;
+			dnatport = otuple->dst.u.udp.port;
+		}
+		if ((ct->status & IPS_SRC_NAT) &&
+			(info->flags & XT_F_HMARK_USE_SNAT)) {
+			addr2 = (__force u32) otuple->src.u3.in.s_addr;
+			snatport = otuple->src.u.udp.port;
+		}
+	}
+	/* Check if ports can be used in hash calculation. */
+	poff = proto_ports_offset(ip_proto);
+	if (frag || poff < 0)
+		goto noports;
+
+	nhoff += (ip->ihl * 4) + poff;
+	if (!pskb_may_pull(skb, nhoff + 4))
+		goto noports;
+
+	uports.v32 = * (__force u32 *) (skb->data + nhoff);
+	if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH) {
+		uports.v32 = (uports.v32 & info->spimask) | info->spiset;
+	} else {
+		if (snatport)	/* Replace nat'ed port(s) */
+			uports.p16.dst = snatport;
+		if (dnatport)
+			uports.p16.src = dnatport;
+		uports.v32 = (uports.v32 & info->pmask.v32) |
+				info->pset.v32;
+		/* get a consistent hash (same value on both flow directions) */
+		if (uports.p16.dst < uports.p16.src)
+			swap(uports.p16.src, uports.p16.dst);
+	}
+
+noports:
+	ip_proto &= info->prmask;
+	/* get a consistent hash (same value on both flow directions) */
+	if (addr2 < addr1)
+		swap(addr1, addr2);
+
+	hash = jhash_3words(addr1, addr2, uports.v32, info->hashrnd) ^ ip_proto;
+	skb->mark = (hash % info->hmod) + info->hoffs;
+	return XT_CONTINUE;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV4,
+		.target         = hmark_v4,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.me             = THIS_MODULE,
+	},
+#ifdef WITH_IPV6
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV6,
+		.target         = hmark_v6,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.me             = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_mt_init(void)
+{
+	int ret;
+
+	ret = xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static void __exit hmark_mt_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_mt_init);
+module_exit(hmark_mt_exit);
-- 
1.7.2.3



* [v5 PATCH 3/3] NETFILTER userspace part for target HMARK
  2012-01-02 15:06 [v5 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
  2012-01-02 15:06 ` [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr() Hans Schillstrom
  2012-01-02 15:06 ` [v5 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2012-01-02 15:06 ` Hans Schillstrom
  2 siblings, 0 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-01-02 15:06 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32-bit hash value is generated, then reduced modulo <limit>, and
    finally an offset is added before it is written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.
    Ver 5
      smask and dmask changed to length

    Ver 4
      xtoptions used for parsing.

    Ver 3
       -

    Ver 2
      IPv4 NAT added
      iptables ver 1.4.12.1 adaptions.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  335 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   65 +++++++
 include/linux/netfilter/xt_hmark.h |   62 +++++++
 3 files changed, 462 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_hmark.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..ec74a19
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,335 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <xtables.h>
+#include <linux/netfilter/xt_hmark.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-smask length               Source address mask length\n"
+"  --hmark-dmask length               Dest address mask length\n"
+"  --hmark-sp-mask value              Mask src port with value\n"
+"  --hmark-dp-mask value              Mask dst port with value\n"
+"  --hmark-spi-mask value             For esp and ah AND spi with value\n"
+"  --hmark-sp-set value               OR src port with value\n"
+"  --hmark-dp-set value               OR dst port with value\n"
+"  --hmark-spi-set value              For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value           Mask Protocol with value\n"
+"  --hmark-rnd value                  Initial (seed) value for the hash calc.\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                  nfmark modulus value\n"
+"  --hmark-offs value                 Last action add value to nfmark\n"
+" For NAT in IPv4 the original address can be used in the return path.\n"
+" Make sure to qualify the rule properly when using the NAT flags.\n"
+"  --hmark-dnat                       Replace src addr with original dst addr\n"
+"  --hmark-snat                       Replace dst addr with original src addr\n"
+" The hmark- prefix can usually be omitted, e.g. --smask for --hmark-smask\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name = "hmark-smask",      .type = XTTYPE_PLENMASK, .id = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, smask)
+	},
+	{ .name = "hmark-dmask",      .type = XTTYPE_PLENMASK, .id = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, dmask)
+	},
+	{ .name = "hmark-sp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pmask.p16.src)
+	},
+	{ .name = "hmark-dp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pmask.p16.dst)
+	},
+	{ .name = "hmark-spi-mask",   .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, spimask)
+	},
+	{ .name = "hmark-sp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pset.p16.src)
+	},
+	{ .name = "hmark-dp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pset.p16.dst)
+	},
+	{ .name = "hmark-spi-set",    .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, spiset)
+	},
+	{ .name = "hmark-proto-mask", .type = XTTYPE_UINT16, .id = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, prmask)
+	},
+	{ .name = "hmark-rnd",        .type = XTTYPE_UINT32, .id = XT_HMARK_RND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",        .type = XTTYPE_UINT32, .id = XT_HMARK_MODULUS,
+	  .flags = XTOPT_PUT | XTOPT_MAND, XTOPT_POINTER(hi, hmod)
+	},
+	{ .name = "hmark-offs",       .type = XTTYPE_UINT32, .id = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hoffs)
+	},
+	{ .name = "hmark-dnat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_DNAT },
+	{ .name = "hmark-snat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_SNAT },
+
+	{ .name = "smask",      .type = XTTYPE_PLENMASK, .id = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, smask)
+	},
+	{ .name = "dmask",      .type = XTTYPE_PLENMASK, .id = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, dmask)
+	},
+	{ .name = "sp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pmask.p16.src)
+	},
+	{ .name = "dp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pmask.p16.dst)
+	},
+	{ .name = "spi-mask",   .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, spimask)
+	},
+	{ .name = "sp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pset.p16.src)
+	},
+	{ .name = "dp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pset.p16.dst)
+	},
+	{ .name = "spi-set",    .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, spiset)
+	},
+	{ .name = "proto-mask", .type = XTTYPE_UINT16, .id = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, prmask)
+	},
+	{ .name = "rnd",        .type = XTTYPE_UINT32, .id = XT_HMARK_RND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "mod",        .type = XTTYPE_UINT32, .id = XT_HMARK_MODULUS,
+	  .flags = XTOPT_PUT | XTOPT_MAND, XTOPT_POINTER(hi, hmod)
+	},
+	{ .name = "offs",       .type = XTTYPE_UINT32, .id = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hoffs)
+	},
+	{ .name = "dnat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_DNAT },
+	{ .name = "snat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_SNAT },
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->pset.v32 = 0;
+		info->flags = 0;
+		info->spiset = 0;
+		info->hoffs = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SPI_AND:
+		info->spimask = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPI_OR:
+		info->spiset = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPORT_AND:
+		info->pmask.p16.src = htons(cb->val.u16);
+		break;
+	case XT_HMARK_DPORT_AND:
+		info->pmask.p16.dst = htons(cb->val.u16);
+		break;
+	case XT_HMARK_SPORT_OR:
+		info->pset.p16.src = htons(cb->val.u16);
+		break;
+	case XT_HMARK_DPORT_OR:
+		info->pset.p16.dst = htons(cb->val.u16);
+		break;
+	case XT_HMARK_MODULUS:
+		/* xtables_error() does not return, so no fallback is needed */
+		if (info->hmod == 0)
+			xtables_error(PARAMETER_PROBLEM,
+				      "HMARK: --hmark-mod must be "
+				      "greater than 0 (modulus 0 "
+				      "would divide by zero)");
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_F_HMARK_MODULUS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-mod "
+			   "is mandatory; without it the nfmark would span"
+			   " the full range 0 - 0xffffffff");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & (1 << XT_HMARK_SPORT_AND))
+		printf("sp-mask 0x%x ", ntohs(info->pmask.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_AND))
+		printf("dp-mask 0x%x ", ntohs(info->pmask.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_AND))
+		printf("spi-mask 0x%x ", ntohl(info->spimask));
+	if (info->flags & (1 << XT_HMARK_SPORT_OR))
+		printf("sp-set 0x%x ", ntohs(info->pset.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_OR))
+		printf("dp-set 0x%x ", ntohs(info->pset.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_OR))
+		printf("spi-set 0x%x ", ntohl(info->spiset));
+	if (info->flags & (1 << XT_HMARK_PROTO_AND))
+		printf("proto-mask 0x%x ", info->prmask);
+	if (info->flags & (1 << XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+}
+
+static void HMARK_ip6_print(const void *ip, const struct xt_entry_target *target,
+			int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffs);
+	if (info->flags & (1 << XT_HMARK_USE_SNAT))
+		printf("snat, ");
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf("smask %s ", xtables_ip6mask_to_numeric(&info->smask.in6) + 1);
+	if (info->flags & (1 << XT_HMARK_USE_DNAT))
+		printf("dnat, ");
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf("dmask %s ", xtables_ip6mask_to_numeric(&info->dmask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip, const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info = (const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffs);
+	if (info->flags & (1 << XT_HMARK_USE_SNAT))
+		printf("snat, ");
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf("smask %s ", xtables_ipmask_to_numeric(&info->smask.in) + 1);
+	if (info->flags & (1 << XT_HMARK_USE_DNAT))
+		printf("dnat, ");
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf("dmask %s ", xtables_ipmask_to_numeric(&info->dmask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & (1 << XT_HMARK_SPORT_AND))
+		printf(" --hmark-sp-mask 0x%x", ntohs(info->pmask.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_AND))
+		printf(" --hmark-dp-mask 0x%x", ntohs(info->pmask.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_AND))
+		printf(" --hmark-spi-mask 0x%x", ntohl(info->spimask));
+	if (info->flags & (1 << XT_HMARK_SPORT_OR))
+		printf(" --hmark-sp-set 0x%x", ntohs(info->pset.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_OR))
+		printf(" --hmark-dp-set 0x%x", ntohs(info->pset.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_OR))
+		printf(" --hmark-spi-set 0x%x", ntohl(info->spiset));
+	if (info->flags & (1 << XT_HMARK_PROTO_AND))
+		printf(" --hmark-proto-mask 0x%x", info->prmask);
+	if (info->flags & (1 << XT_HMARK_RND))
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf(" --hmark-mod 0x%x", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf(" --hmark-offs 0x%x", info->hoffs);
+	if (info->flags & (1 << XT_HMARK_USE_DNAT))
+		printf(" --hmark-dnat");
+	if (info->flags & (1 << XT_HMARK_USE_SNAT))
+		printf(" --hmark-snat");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf(" --hmark-smask %s", xtables_ip6mask_to_numeric(&info->smask.in6) + 1);
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf(" --hmark-dmask %s", xtables_ip6mask_to_numeric(&info->dmask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf(" --hmark-smask %s", xtables_ipmask_to_numeric(&info->smask.in) + 1);
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf(" --hmark-dmask %s", xtables_ipmask_to_numeric(&info->dmask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
+
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..74beb64
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,65 @@
+This module does the same as MARK, i.e. it sets an fwmark, but the mark is based on a hash value.
+The hash is based on saddr, daddr, sport, dport and proto. The same mark is produced independent of direction if no masks are set or if the same masks are used for src and dest.
+The hash mark can be reduced by a modulus, and finally an offset can be added, i.e. the final mark will be within a range.
+ICMP errors use the original message for the hash calculation, not the ICMP message itself.
+Fragmented packets do not use ports/SPI, not even the first fragment.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented IPv6 packets will reach hmark.
+      None of the parameters affect the packet itself, only the calculated hash value.
+.PP
+Parameters:
+For all masks the default is all ones; to disable a field use mask 0.
+For IPv6 only the last 32 bits are included in the hash.
+.TP
+\fB\-\-hmark\-smask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & mask).
+.TP
+\fB\-\-hmark\-dmask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & mask).
+.TP
+\fB\-\-hmark\-sp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sp\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dp\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+A 32 bit value to AND the SPI field with (spi & value); valid for proto ESP or AH.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+A 32 bit value to OR the SPI field with (spi | value); valid for proto ESP or AH.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue\fP (must be > 0)
+The hash is reduced modulo the value:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offs\fP \fIvalue\fP
+The value is then added to the result:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark\-offs 10000 \-\-hmark\-mod 10
+.PP
+Handle SCTP, hash on the dest port only, and produce an nfmark in the range 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING \-p SCTP \-j HMARK \-\-smask 0 \-\-dmask 0
+ \-\-sp\-mask 0 \-\-offs 100 \-\-mod 20
+.PP
+
diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..366ecce
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+/*
+ * Option ids must not start at 0, since 0 is used as "none".
+ */
+enum {
+	XT_HMARK_SADR_AND = 1,	/* SNAT & DNAT are used by the kernel module */
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_USE_SNAT,
+	XT_HMARK_USE_DNAT,
+	XT_F_HMARK_USE_SNAT = 1 << XT_HMARK_USE_SNAT,
+	XT_F_HMARK_USE_DNAT = 1 << XT_HMARK_USE_DNAT,
+	XT_F_HMARK_SADR_AND = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET = 1 << XT_HMARK_OFFSET,
+};
+
+union hports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	smask;		/* Source address mask */
+	union nf_inet_addr	dmask;		/* Dest address mask */
+	union hports		pmask;
+	union hports		pset;
+	__u32			spimask;
+	__u32			spiset;
+	__u16			flags;		/* Print out only */
+	__u16			prmask;		/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmod;		/* Modulus */
+	__u32			hoffs;		/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.2.3
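
The mark calculation described in the source comment and the man page
(direction-independent hash, then modulus, then offset) can be sketched
roughly as below. This is only an illustration under stated assumptions:
mix32(), struct tuple and hmark_sketch() are hypothetical names, the mixing
function is a simple stand-in and not the kernel's jhash, and sorting the
addresses and ports is just one possible way to get direction independence.
It is not the code from the kernel module.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's jhash; NOT the real function. */
static uint32_t mix32(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed;

	h = (h ^ a) * 0x9e3779b1u;
	h = (h ^ b) * 0x85ebca6bu;
	h = (h ^ c) * 0xc2b2ae35u;
	return h ^ (h >> 16);
}

/* Illustrative five-tuple (IPv4 only), all fields in host byte order. */
struct tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

/* mark = hash(tuple) % mod + offs, identical for both flow directions. */
static uint32_t hmark_sketch(const struct tuple *t, uint32_t rnd,
			     uint32_t mod, uint32_t offs)
{
	/* Sort addresses and ports so that swapping src and dst changes nothing. */
	uint32_t a_lo = t->saddr < t->daddr ? t->saddr : t->daddr;
	uint32_t a_hi = t->saddr < t->daddr ? t->daddr : t->saddr;
	uint16_t p_lo = t->sport < t->dport ? t->sport : t->dport;
	uint16_t p_hi = t->sport < t->dport ? t->dport : t->sport;
	uint32_t ports = ((uint32_t)p_hi << 16) | p_lo;

	assert(mod != 0);	/* a modulus of 0 would divide by zero */
	return mix32(a_lo, a_hi, ports ^ t->proto, rnd) % mod + offs;
}
```

With mod 20 and offs 100 the result always falls in 100..119, which matches
the range given for the SCTP example in the man page, and a request and its
reply get the same mark.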


* Re: [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr()
  2012-01-02 15:06 ` [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr() Hans Schillstrom
@ 2012-01-04 17:37   ` Pablo Neira Ayuso
  2012-01-04 20:48     ` Hans Schillstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2012-01-04 17:37 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans

On Mon, Jan 02, 2012 at 04:06:39PM +0100, Hans Schillstrom wrote:
> Two new flags to __ipv6_find_hdr,
> One that tells us that this is a fragment.
> One that stops at AH if any i.e. treat it like a transport header.
> i.e. make handling of ESP and AH the same.
> 
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> ---
>  include/linux/netfilter_ipv6/ip6_tables.h |   16 ++++++++++++++--
>  net/ipv6/netfilter/ip6_tables.c           |   19 ++++++++++++++-----
>  2 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
> index f549adc..ee0c68e 100644
> --- a/include/linux/netfilter_ipv6/ip6_tables.h
> +++ b/include/linux/netfilter_ipv6/ip6_tables.h
> @@ -288,9 +288,21 @@ extern unsigned int ip6t_do_table(struct sk_buff *skb,
>  
>  /* Check for an extension */
>  extern int ip6t_ext_hdr(u8 nexthdr);
> +enum {
> +	IP6T_FH_FRAG,
> +	IP6T_FH_AUTH,
> +	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
> +	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
> +};
>  /* find specified header and get offset to it */
> -extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> -			 int target, unsigned short *fragoff);
> +extern int __ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> +			 int target, unsigned short *fragoff, int *fragflg);

Please, don't do this.

The convention in the kernel is to use __function for non-locked
versions of a function.

The number of clients for this function seems small. I'll be very
happy if you send me a patch that changes this interface and that
propagates the changes to other clients of it.
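
A minimal sketch of what such an interface change could look like, for
illustration only: the function keeps its name and gains one flags argument,
and callers that do not need it pass NULL. struct skb_stub, PROTO_TCP and
PROTO_AH are hypothetical stand-ins so the example compiles outside the
kernel, the body is a stub rather than a header walk, and the in/out
semantics of the flags argument are an assumption based on the patch
description.

```c
#include <assert.h>
#include <stddef.h>

/* Flag numbering as in the quoted patch. */
enum {
	IP6T_FH_FRAG,
	IP6T_FH_AUTH,
	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
};

/* Hypothetical stand-in for struct sk_buff: just the two facts we need. */
struct skb_stub {
	int is_fragment;
	int has_ah;
};

#define PROTO_TCP 6	/* same value as IPPROTO_TCP */
#define PROTO_AH  51	/* same value as IPPROTO_AH */

/*
 * Same name as before, one extra in/out argument. Callers that do not
 * care about the flags pass NULL and get the old behaviour.
 */
static int ipv6_find_hdr(const struct skb_stub *skb, unsigned int *offset,
			 int target, unsigned short *fragoff, int *flags)
{
	(void)offset; (void)target; (void)fragoff;	/* unused in this stub */

	if (flags && skb->is_fragment)
		*flags |= IP6T_FH_F_FRAG;	/* report: this is a fragment */
	if (flags && (*flags & IP6T_FH_F_AUTH) && skb->has_ah)
		return PROTO_AH;		/* stop at AH on request */
	return PROTO_TCP;			/* stub: pretend TCP follows */
}
```

An existing caller would then only need a trailing NULL added, while a new
caller sets IP6T_FH_F_AUTH before the call and tests IP6T_FH_F_FRAG after it.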


* Re: [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr()
  2012-01-04 17:37   ` Pablo Neira Ayuso
@ 2012-01-04 20:48     ` Hans Schillstrom
  0 siblings, 0 replies; 6+ messages in thread
From: Hans Schillstrom @ 2012-01-04 20:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev


On Wednesday, January 04, 2012 18:37:41 Pablo Neira Ayuso wrote:
> On Mon, Jan 02, 2012 at 04:06:39PM +0100, Hans Schillstrom wrote:
> > Two new flags to __ipv6_find_hdr,
> > One that tells us that this is a fragment.
> > One that stops at AH if any i.e. treat it like a transport header.
> > i.e. make handling of ESP and AH the same.
> > 
> > Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> > ---
> >  include/linux/netfilter_ipv6/ip6_tables.h |   16 ++++++++++++++--
> >  net/ipv6/netfilter/ip6_tables.c           |   19 ++++++++++++++-----
> >  2 files changed, 28 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
> > index f549adc..ee0c68e 100644
> > --- a/include/linux/netfilter_ipv6/ip6_tables.h
> > +++ b/include/linux/netfilter_ipv6/ip6_tables.h
> > @@ -288,9 +288,21 @@ extern unsigned int ip6t_do_table(struct sk_buff *skb,
> >  
> >  /* Check for an extension */
> >  extern int ip6t_ext_hdr(u8 nexthdr);
> > +enum {
> > +	IP6T_FH_FRAG,
> > +	IP6T_FH_AUTH,
> > +	IP6T_FH_F_FRAG = 1 << IP6T_FH_FRAG,
> > +	IP6T_FH_F_AUTH = 1 << IP6T_FH_AUTH,
> > +};
> >  /* find specified header and get offset to it */
> > -extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> > -			 int target, unsigned short *fragoff);
> > +extern int __ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> > +			 int target, unsigned short *fragoff, int *fragflg);
> 
> Please, don't do this.
> 
> The convention in the kernel is to use __function for non-locked
> versions of a function.
> 
> The number of clients for this function seems small. I'll be very
> happy if you send me a patch that changes this interface and that
> propagates the changes to other clients of it.
> 
No problem, I'll fix this.

I also have some minor compilation warnings to fix in the other patch when building without NAT.

Thanks
Hans


end of thread, other threads:[~2012-01-04 20:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-02 15:06 [v5 PATCH 0/3] NETFILTER new target module, HMARK Hans Schillstrom
2012-01-02 15:06 ` [v5 PATCH 1/3] NETFILTER added flags to __ipv6_find_hdr() Hans Schillstrom
2012-01-04 17:37   ` Pablo Neira Ayuso
2012-01-04 20:48     ` Hans Schillstrom
2012-01-02 15:06 ` [v5 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
2012-01-02 15:06 ` [v5 PATCH 3/3] NETFILTER userspace part for target HMARK Hans Schillstrom
