netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [v4 PATCH 0/2] NETFILTER new target module, HMARK
@ 2011-11-25  9:36 Hans Schillstrom
  2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
  2011-11-25  9:36 ` [v4 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
  0 siblings, 2 replies; 19+ messages in thread
From: Hans Schillstrom @ 2011-11-25  9:36 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans.schillstrom

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.

REVISION
Version 4
	Split of IPv6 and IPv4, use IP_CT_IS_REPLY, as Pablo suggested.
	removed one pskb_may_pull()
	xtoption parse used in the user space part.

Version 3
        Handling of SCTP for IPv6 added.

Version 2
	NAT Added for IPv4
	IPv6 ICMP handling enhanced.
	Usage example added

Version 1
	Initial RFC


We (Ericsson) use hmark in-front of ipvs as a pre-loadbalancer and
handles up to 70 ipvs running in parallel in clusters.
However hmark is not restricted to run in front of IPVS it can also be used as
"poor mans" load balancer.
With this version is also NAT supported as an option, with very high flows
you might not want to use conntrack.

The idea is to generate a direction independent fw mark range to use as input to
the routing (i.e. ip rule add fwmark ...).
Pretty straight forward and simple.


Example:
                                      App Server (Real Server)

                                           +---------+
                                        -->| Service |
     Gateway A                             +---------+
                          /
            +----------+ /     +----+      +---------+
--- if -A---| selector |---->  |ipvs|  --->| Service |
            +----------+ \     +----+      +---------+
                          \
                               +----+      +---------+
                               |ipvs|   -->| Service |
                               +----+      +---------+
      Gateway C
            +----------+ /     +----+
--- if-B ---| selector | --->  |ipvs|
            +----------+ \     +----+      +---------+
                                           | Service |
                                           +---------+
                          /
            +----------+ /     +----+     ..
--- if-B ---| selector | --->  |ipvs|      +---------+
            +----------+ \     +----+      | Service |
                          \                +---------+
#
# Example with four ipvs loadbalancers
#
iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100

ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip rule add fwmark 102 table 102
ip rule add fwmark 103 table 103

ip ro ad table 100 default via x.y.z.1 dev bond1
ip ro ad table 101 default via x.y.z.2 dev bond1
ip ro ad table 102 default via x.y.z.3 dev bond1
ip ro ad table 103 default via x.y.z.4 dev bond1


If conntrack doesn't handle the return path,
do the oposite with HMARK and send it back right to ipvs.

Another exmaple of usage could be if you have cluster originated connections
and want to spread the connections over a number of interfaces
(NAT will complpicate things for you in this case)



                     \  Blade 1
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /
   +------+
-- | Gw-A |          \  Blade 2
   +------+           \ +----------+      +---------+
   +------+         <-- | selector | <--- | Service |
-- | Gw-B |           / +----------+      +---------+
   +------+          /
   +------+
-- | Gw-C |          \
   +------+           \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

                     \  Blande -n
                      \ +----------+      +---------+
                    <-- | selector | <--- | Service |
                      / +----------+      +---------+
                     /

Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25  9:36 [v4 PATCH 0/2] NETFILTER new target module, HMARK Hans Schillstrom
@ 2011-11-25  9:36 ` Hans Schillstrom
  2011-11-25 14:19   ` David Laight
                     ` (3 more replies)
  2011-11-25  9:36 ` [v4 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
  1 sibling, 4 replies; 19+ messages in thread
From: Hans Schillstrom @ 2011-11-25  9:36 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans.schillstrom

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.

man page
   HMARK
       This  module  does  the same as MARK, i.e. set an fwmark,
       but the mark is based on a hash value.  The hash is based on
       saddr, daddr, sport, dport and proto. The same mark will be produced
       independet of direction if no masks is set or the same masks is used for
       src and dest. The hash mark could be adjusted by modulus and finaly an
       offset could be added, i.e the final mark will be within a range.
       ICMP errors will have hash calc based on the original message.
       Note: None of the parameters effect the packet it self
       only the calculated hash value.

       Parameters: For all masks default is all "1:s", to disable a field
                   use mask 0. For IPv6 it's just the last 32 bits that
                   is included in the hash.

       --hmark-smask value
              The value to AND the source address with (saddr & value).

       --hmark-dmask value
              The value to AND the dest. address with (daddr & value).

       --hmark-sp-mask value
              A 16 bit value to AND the src port with (sport & value).

       --hmark-dp-mask value
              A 16 bit value to AND the dest port with (dport & value).

       --hmark-sp-set value
              A 16 bit value to OR the src port with (sport | value).

       --hmark-dp-set value
              A 16 bit value to OR the dest port with (dport | value).

       --hmark-spi-mask value
              Value to AND the spi field with (spi & value) valid for proto esp or ah.

       --hmark-spi-set value
              Value to OR the spi field with (spi | value) valid for proto esp or ah.

       --hmark-proto-mask value
              A 16 bit value to AND the L4 proto field with (proto & value).

       --hmark-rnd value
              A 32 bit intitial value for hash calc, default is 0xc175a3b8.

       --hmark-dnat
              Replace src addr/port with original dst addr/port before calc, hash

       --hmark-snat
              Replace dst addr/port with original src addr/port before calc, hash

       Final processing of the mark in order of execution.

       --hmark-mod value (must be > 0)
              The easiest way to describe this is:  hash = hash mod <value>

       --hmark-offs alue (must be > 0)
              The easiest way to describe this is:  hash = hash + <value>

       Examples:

       Default rule handles all TCP, UDP, SCTP, ESP & AH
Rev 4
      different targets for IPv4 and IPv6
      Changes based on review by Pablo.

Rev 3
      Support added to SCTP for IPv6
Rev 2
      IPv6 header scan changed to follow RFC 2640
      IPv4 icmp echo fragmented does now use proto as ipv6
      IPv6 pskb_may_pull() check is done in every time in header loop.
      IPv4 nat support added.
      default added in IPv6 loop and null check of hp

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_hmark.h |   48 ++++++
 include/net/ipv6.h                 |    1 +
 net/netfilter/Kconfig              |   17 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_hmark.c           |  327 ++++++++++++++++++++++++++++++++++++
 5 files changed, 394 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_hmark.h
 create mode 100644 net/netfilter/xt_hmark.c

diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..6c1436a
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+/*
+ * Flags must not start at 0, since it's used as none.
+ */
+enum {
+	XT_HMARK_SADR_AND = 1,	/* SNAT & DNAT are used by the kernel module */
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_USE_SNAT,
+	XT_HMARK_USE_DNAT,
+};
+
+union ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	__u32		smask;		/* Source address mask */
+	__u32		dmask;		/* Dest address mask */
+	union ports	pmask;
+	union ports	pset;
+	__u32		spimask;
+	__u32		spiset;
+	__u16		flags;		/* Print out only */
+	__u16		prmask;		/* L4 Proto mask */
+	__u32		hashrnd;
+	__u32		hmod;		/* Modulus */
+	__u32		hoffs;		/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3f0258d..9e4d4f9 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -39,6 +39,7 @@
 #define NEXTHDR_ICMP		58	/* ICMP for IPv6. */
 #define NEXTHDR_NONE		59	/* No next header */
 #define NEXTHDR_DEST		60	/* Destination options header. */
+#define NEXTHDR_SCTP		132	/* Stream Control Transport Protocol */
 #define NEXTHDR_MOBILITY	135	/* Mobility header. */
 
 #define NEXTHDR_MAX		255
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8260b13..41bee43 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -471,6 +471,23 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which alter the netfilter mark (nfmark) field within a given range.
+	First a 32 bit hash value is generated then modulus by <limit> and
+	finally an offset is added before it's written to nfmark.
+
+	Prior to routing, the nfmark can influence the routing method (see
+	"Use netfilter MARK value as routing key") and can also be used by
+	other subsystems to change their behavior.
+
+	The mark match can also be used to match nfmark produced by this module.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1a02853..359eeb6 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_hmark.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
diff --git a/net/netfilter/xt_hmark.c b/net/netfilter/xt_hmark.c
new file mode 100644
index 0000000..ae33293
--- /dev/null
+++ b/net/netfilter/xt_hmark.c
@@ -0,0 +1,327 @@
+/*
+ *	xt_hmark - Netfilter module to set mark as hash value
+ *
+ *	(C) 2011 Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *
+ *	Description:
+ *	This module calculates a hash value that can be modified by modulus
+ *	and an offset. The hash value is based on a direction independent
+ *	five tuple: src & dst addr src & dst ports and protocol.
+ *	However src & dst port can be masked and are not used for fragmented
+ *	packets, ESP and AH don't have ports so SPI will be used instead.
+ *	For ICMP error messages the hash mark values will be calculated on
+ *	the source packet i.e. the packet caused the error (If sufficient
+ *	amount of data exists).
+ *
+ *	This program is free software; you can redistribute it and/or modify
+ *	it under the terms of the GNU General Public License version 2 as
+ *	published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/xt_hmark.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat.h>
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#	define WITH_IPV6 1
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet range mark operations by hash value");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+/*
+ * ICMP, get inner header so calc can be made on the source message
+ *       not the icmp header, i.e. same hash mark must be produced
+ *       on an icmp error message.
+ */
+static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+	struct iphdr *iph = NULL;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL)
+		return nhoff;
+
+	if (icmph->type > NR_ICMP_TYPES)
+		return nhoff;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return nhoff;
+	/* Checkin full IP header plus 8 bytes of protocol to
+	 * avoid additional coding at protocol handlers.
+	 */
+	if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
+		return nhoff;
+
+	iph = (struct iphdr *)(skb->data + nhoff + iphsz + sizeof(_ih));
+	return nhoff + iphsz + sizeof(_ih);
+}
+/*
+ * ICMPv6
+ * Input nhoff Offset into network header
+ *       offset where ICMPv6 header starts
+ * Returns true if it's a icmp error and updates nhoff
+ */
+#ifdef WITH_IPV6
+static int get_inner6_hdr(struct sk_buff *skb, int *offset, int hdrlen)
+{
+	struct icmp6hdr *icmp6h;
+	struct icmp6hdr _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset + hdrlen, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += hdrlen + sizeof(_ih6);
+		return 1;
+	}
+	return 0;
+}
+/*
+ * Calc hash value, special casre is taken on icmp and fragmented messages
+ * i.e. fragmented messages don't use ports.
+ */
+__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
+	int nhoff, poff, hdrlen;
+	u32 addr1, addr2, hash;
+	struct ipv6hdr *ip6;
+	u8 nexthdr;
+	int frag = 0, ip6hdrlvl = 0;	/* Header level */
+	struct ipv6_opt_hdr _hdr, *hp;
+	union {
+		u32 v32;
+		u16 v16[2];
+	} ports;
+
+	ports.v32 = 0;
+	nhoff = skb_network_offset(skb);
+
+hdr_new:
+	/* Get header info */
+	ip6 = (struct ipv6hdr *) (skb->data + nhoff);
+	nexthdr = ip6->nexthdr;
+	hdrlen = sizeof(struct ipv6hdr);
+	hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr), &_hdr);
+
+	while (nexthdr) {
+		switch (nexthdr) {
+		case IPPROTO_ICMPV6:
+			/* ICMP Error then move ptr to inner header */
+			if (get_inner6_hdr(skb, &nhoff, hdrlen)) {
+				ip6hdrlvl++;
+				if (!pskb_may_pull(skb, sizeof(_hdr) + nhoff))
+					return XT_CONTINUE;
+				goto hdr_new;
+			}
+			nhoff += hdrlen;
+			goto hdr_rdy;
+
+		case NEXTHDR_FRAGMENT:
+			if (!ip6hdrlvl) /* Do not use ports if fragmented */
+				frag = 1;
+			break;
+
+		/* End of hdr traversing cont. with ports and hash calc. */
+		case NEXTHDR_IPV6:	/* Do not process tunnels */
+		case NEXTHDR_TCP:
+		case NEXTHDR_UDP:
+		case NEXTHDR_ESP:
+		case NEXTHDR_AUTH:
+		case NEXTHDR_SCTP:
+		case NEXTHDR_NONE:	/* Last hdr of something unknown */
+			nhoff += hdrlen;
+			goto hdr_rdy;
+		default:
+			return XT_CONTINUE;
+		}
+		if (!hp)
+			return XT_CONTINUE;
+		nhoff += hdrlen;	/* eat current header */
+		nexthdr =  hp->nexthdr;	/* Next header */
+		hdrlen = ipv6_optlen(hp);
+		hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),
+					&_hdr);
+
+		if (!pskb_may_pull(skb, nhoff))
+			return XT_CONTINUE;
+	}
+hdr_rdy:
+
+	addr1 = (__force u32) ip6->saddr.s6_addr32[3];
+	addr2 = (__force u32) ip6->daddr.s6_addr32[3];
+	poff = proto_ports_offset(nexthdr);
+	nhoff += poff;
+	if (frag || poff < 0 || !pskb_may_pull(skb, nhoff + 4))
+		goto no6ports;
+
+	ports.v32 = * (__force u32 *) (skb->data + nhoff);
+	if (nexthdr == IPPROTO_ESP || nexthdr == IPPROTO_AH) {
+		ports.v32 = (ports.v32 & info->spimask) | info->spiset;
+	} else {
+		ports.v32 = (ports.v32 & info->pmask.v32) |
+				info->pset.v32;
+		/* get a consistent hash (same value on both flow directions) */
+		if (ports.v16[1] < ports.v16[0])
+			swap(ports.v16[0], ports.v16[1]);
+	}
+
+no6ports:
+	nexthdr &= info->prmask;
+	/* get a consistent hash (same value on both flow directions) */
+	if (addr2 < addr1)
+		swap(addr1, addr2);
+	hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ nexthdr;
+	if (info->hmod)
+		skb->mark = (hash % info->hmod) + info->hoffs;
+
+	return XT_CONTINUE;
+}
+#endif
+
+/*
+ * Calc hash value, special case is taken on icmp and fragmented messages
+ * i.e. fragmented messages don't use ports.
+ */
+unsigned int hmark_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
+	int nhoff, poff, frag = 0;
+	struct iphdr *ip;
+	u8 ip_proto;
+	u32 addr1, addr2, hash;
+	u16 snatport = 0, dnatport = 0;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = ct = nf_ct_get(skb, &ctinfo);
+	union {
+		u32 v32;
+		u16 v16[2];
+	} ports;
+
+	nhoff = skb_network_offset(skb);
+	ports.v32 = 0;
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* calc hash on inner header if right type */
+		nhoff = get_inner_hdr(skb, ip->ihl * 4, nhoff);
+		ip = (struct iphdr *) (skb->data + nhoff);
+	}
+
+	ip_proto = ip->protocol;
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		frag = 1;
+
+	addr1 = (__force u32) ip->saddr & info->smask;
+	addr2 = (__force u32) ip->daddr & info->dmask;
+
+	if (ct && test_bit(IP_CT_IS_REPLY, &ct->status)) {
+		struct nf_conntrack_tuple *otuple;
+
+		otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+		/* On the "return flow", to get the original address
+		 * i,e, replace the source address. */
+		if ((ct->status & IPS_DST_NAT) &&
+			(info->flags & XT_HMARK_USE_DNAT)) {
+			addr1 = (__force u32) otuple->dst.u3.in.s_addr;
+			dnatport = otuple->dst.u.udp.port;
+		}
+		/* On the "return flow", to get the original address
+		 * i,e, replace the destination address. */
+		if ((ct->status & IPS_SRC_NAT) &&
+			(info->flags & XT_HMARK_USE_SNAT)) {
+			addr2 = (__force u32) otuple->src.u3.in.s_addr;
+			snatport = otuple->src.u.udp.port;
+		}
+	}
+
+	poff = proto_ports_offset(ip_proto);
+	nhoff += ip->ihl * 4 + poff;
+	if (frag || poff < 0 || !pskb_may_pull(skb, nhoff + 4))
+		goto noports;
+
+	ports.v32 = * (__force u32 *) (skb->data + nhoff);
+	if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH) {
+		ports.v32 = (ports.v32 & info->spimask) | info->spiset;
+	} else {
+		if (snatport)	/* Replace nat'ed port(s) */
+			ports.v16[1] = snatport;
+		if (dnatport)
+			ports.v16[0] = dnatport;
+		ports.v32 = (ports.v32 & info->pmask.v32) |
+				info->pset.v32;
+		if (ports.v16[1] < ports.v16[0])
+			swap(ports.v16[0], ports.v16[1]);
+	}
+
+noports:
+	ip_proto &= info->prmask;
+	/* get a consistent hash (same value on both flow directions) */
+	if (addr2 < addr1)
+		swap(addr1, addr2);
+
+	hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ ip_proto;
+	if (info->hmod)
+		skb->mark = (hash % info->hmod) + info->hoffs;
+	return XT_CONTINUE;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV4,
+		.target         = hmark_v4,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.me             = THIS_MODULE,
+	},
+#ifdef WITH_IPV6
+	{
+		.name           = "HMARK",
+		.revision       = 0,
+		.family         = NFPROTO_IPV6,
+		.target         = hmark_v6,
+		.targetsize     = sizeof(struct xt_hmark_info),
+		.me             = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_mt_init(void)
+{
+	int ret;
+
+	ret = xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static void __exit hmark_mt_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_mt_init);
+module_exit(hmark_mt_exit);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [v4 PATCH 2/2] NETFILTER userspace part for target HMARK
  2011-11-25  9:36 [v4 PATCH 0/2] NETFILTER new target module, HMARK Hans Schillstrom
  2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2011-11-25  9:36 ` Hans Schillstrom
  2011-11-25 13:20   ` Jan Engelhardt
  1 sibling, 1 reply; 19+ messages in thread
From: Hans Schillstrom @ 2011-11-25  9:36 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans.schillstrom

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32 bit hash value is generated then modulus by <limit> and
finally an offset is added before it's written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behaviour.

The mark match can also be used to match nfmark produced by this module.

Ver 4
  xtoptions used for parsing.

Ver 3
   -

Ver 2
  IPv4 NAT added
  iptables ver 1.4.12.1 adaptions.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  301 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   60 +++++++
 include/linux/netfilter/xt_hmark.h |   62 ++++++++
 3 files changed, 423 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_hmark.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..5027cc1
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,301 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <xtables.h>
+#include <linux/netfilter/xt_hmark.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-smask value                Mask source address with value\n"
+"  --hmark-dmask value                Mask Dest. address with value\n"
+"  --hmark-sp-mask value              Mask src port with value\n"
+"  --hmark-dp-mask value              Mask dst port with value\n"
+"  --hmark-spi-mask value             For esp and ah AND spi with value\n"
+"  --hmark-sp-set value               OR src port with value\n"
+"  --hmark-dp-set value               OR dst port with value\n"
+"  --hmark-spi-set value              For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value           Mask Protocol with value\n"
+"  --hmark-rnd                        Random value to hash cacl.\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                  nfmark modulus value\n"
+"  --hmark-offs value                 Last action add value to nfmark\n"
+" For NAT in IPv4 the original address can be used in the return path.\n"
+" Make sure to qualify the statement in a proper way when using nat flags\n"
+"  --hmark-dnat                       Replace src addr with original dst addr\n"
+"  --hmark-snat                       Replace dst addr with original src addr\n"
+" In many cases hmark can be omitted i.e. --smask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name = "hmark-smask",      .type = XTTYPE_UINT32, .id = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, smask)
+	},
+	{ .name = "hmark-dmask",      .type = XTTYPE_UINT32, .id = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, dmask)
+	},
+	{ .name = "hmark-sp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pmask.p16.src)
+	},
+	{ .name = "hmark-dp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pmask.p16.dst)
+	},
+	{ .name = "hmark-spi-mask",   .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, spimask)
+	},
+	{ .name = "hmark-sp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pset.p16.src)
+	},
+	{ .name = "hmark-dp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, pset.p16.dst)
+	},
+	{ .name = "hmark-spi-set",    .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, spiset)
+	},
+	{ .name = "hmark-proto-mask", .type = XTTYPE_UINT16, .id = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, prmask)
+	},
+	{ .name = "hmark-rnd",        .type = XTTYPE_UINT32, .id = XT_HMARK_RND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",        .type = XTTYPE_UINT32, .id = XT_HMARK_MODULUS,
+	  .flags = XTOPT_PUT | XTOPT_MAND, XTOPT_POINTER(hi, hmod)
+	},
+	{ .name = "hmark-offs",       .type = XTTYPE_UINT32, .id = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hoffs)
+	},
+	{ .name = "hmark-dnat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_DNAT },
+	{ .name = "hmark-snat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_SNAT },
+
+	{ .name = "smask",      .type = XTTYPE_UINT32, .id = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, smask)
+	},
+	{ .name = "dmask",      .type = XTTYPE_UINT32, .id = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, dmask)
+	},
+	{ .name = "sp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pmask.p16.src)
+	},
+	{ .name = "dp-mask",    .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pmask.p16.dst)
+	},
+	{ .name = "spi-mask",   .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, spimask)
+	},
+	{ .name = "sp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pset.p16.src)
+	},
+	{ .name = "dp-set",     .type = XTTYPE_UINT16, .id = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,  XTOPT_POINTER(hi, pset.p16.dst)
+	},
+	{ .name = "spi-set",    .type = XTTYPE_UINT32, .id = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, spiset)
+	},
+	{ .name = "proto-mask", .type = XTTYPE_UINT16, .id = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, prmask)
+	},
+	{ .name = "rnd",        .type = XTTYPE_UINT32, .id = XT_HMARK_RND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "mod",        .type = XTTYPE_UINT32, .id = XT_HMARK_MODULUS,
+	  .flags = XTOPT_PUT | XTOPT_MAND, XTOPT_POINTER(hi, hmod)
+	},
+	{ .name = "offs",       .type = XTTYPE_UINT32, .id = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, hoffs)
+	},
+	{ .name = "dnat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_DNAT },
+	{ .name = "snat",       .type = XTTYPE_NONE,   .id = XT_HMARK_USE_SNAT },
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->pset.v32 = 0;
+		info->flags = 0;
+		info->spiset = 0;
+		info->hoffs = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SADR_AND:
+		info->smask = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_DADR_AND:
+		info->dmask = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPI_AND:
+		info->spimask = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPI_OR:
+		info->spiset = htonl(cb->val.u32);
+		break;
+	case XT_HMARK_SPORT_AND:
+		info->pmask.p16.src = htons(cb->val.u16);
+		break;
+	case XT_HMARK_DPORT_AND:
+		info->pmask.p16.dst = htons(cb->val.u16);
+		break;
+	case XT_HMARK_SPORT_OR:
+		info->pset.p16.src = htons(cb->val.u16);
+		break;
+	case XT_HMARK_DPORT_OR:
+		info->pset.p16.dst = htons(cb->val.u16);
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmod == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "xxx modulus 0 ? "
+				      "thats a div by 0");
+			info->hmod = 0xffffffff;
+		}
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_F_HMARK_MODULUS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+			   "is not set, that means the nfmark will be in range"
+			   " 0 - 0xffffffff");
+}
+
+static void HMARK_print(const void *ip, const struct xt_entry_target *target,
+			int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & (1 << XT_HMARK_USE_SNAT))
+		printf("snat, ");
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf("smask 0x%x ", htonl(info->smask));
+
+	if (info->flags & (1 << XT_HMARK_USE_DNAT))
+		printf("dnat, ");
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf("dmask 0x%x ", htonl(info->dmask));
+
+	if (info->flags & (1 << XT_HMARK_SPORT_AND))
+		printf("sp-mask 0x%x ", htons(info->pmask.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_AND))
+		printf("dp-mask 0x%x ", htons(info->pmask.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_AND))
+		printf("spi-mask 0x%x ", htonl(info->spimask));
+	if (info->flags & (1 << XT_HMARK_SPORT_OR))
+		printf("sp-set 0x%x ", htons(info->pset.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_OR))
+		printf("dp-set 0x%x ", htons(info->pset.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_OR))
+		printf("spi-set 0x%x ", htonl(info->spiset));
+	if (info->flags & (1 << XT_HMARK_PROTO_AND))
+		printf("proto-mask 0x%x ", info->prmask);
+	if (info->flags & (1 << XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf("mark=hv %% 0x%x ", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffs);
+}
+
+static void HMARK_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & (1 << XT_HMARK_SADR_AND))
+		printf(" --hmark-smask 0x%x", htonl(info->smask));
+	if (info->flags & (1 << XT_HMARK_DADR_AND))
+		printf(" --hmark-dmask 0x%x", htonl(info->dmask));
+	if (info->flags & (1 << XT_HMARK_SPORT_AND))
+		printf(" --hmark-sp-mask 0x%x", htons(info->pmask.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_AND))
+		printf(" --hmark-dp-mask 0x%x", htons(info->pmask.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_AND))
+		printf(" --hmark-spi-mask 0x%x", htonl(info->spimask));
+	if (info->flags & (1 << XT_HMARK_SPORT_OR))
+		printf(" --hmark-sp-set 0x%x", htons(info->pset.p16.src));
+	if (info->flags & (1 << XT_HMARK_DPORT_OR))
+		printf(" --hmark-dp-set 0x%x", htons(info->pset.p16.dst));
+	if (info->flags & (1 << XT_HMARK_SPI_OR))
+		printf(" --hmark-spi-set 0x%x", htonl(info->spiset));
+	if (info->flags & (1 << XT_HMARK_PROTO_AND))
+		printf(" --hmark-proto-mask 0x%x", info->prmask);
+	if (info->flags & (1 << XT_HMARK_RND))
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & (1 << XT_HMARK_MODULUS))
+		printf(" --hmark-mod 0x%x", info->hmod);
+	if (info->flags & (1 << XT_HMARK_OFFSET))
+		printf(" --hmark-offs 0x%x", info->hoffs);
+	if (info->flags & (1 << XT_HMARK_USE_DNAT))
+		printf(" --hmark-dnat");
+	if (info->flags & (1 << XT_HMARK_USE_SNAT))
+		printf(" --hmark-snat");
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_print,
+		.save          = HMARK_save,
+		.x6_parse      = HMARK_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_print,
+		.save          = HMARK_save,
+		.x6_parse      = HMARK_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
+
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..f24ac9b
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,60 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on saddr, daddr, sport, dport and proto. The same mark will be produced independet of direction if no masks is set or the same masks is used for src and dest.
+The hash mark could be adjusted by modulus and finally an offset could be added, i.e the final mark will be within a range. If state RELATED is used icmp will be handled also, i.e. hash will be calculated on the original message not the icmp it self.
+Note: None of the parameters effect the packet it self only the calculated hash value.
+.PP
+Parameters:
+For all masks default is all "1:s", to disable a field use mask 0
+For IPv6 it's just the last 32 bits that is included in the hash
+.TP
+\fB\-\-hmark\-smask\fP \fIvalue\fP
+The value to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dmask\fP \fIvalue\fP
+The value to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sp\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dp\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offs\fP \fvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark-offs 10000 \-\-hmark-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING -p SCTP \-j HMARK \-\-smask 0 \-\-dmask 0
+ \-\-sp\-mask 0 \-\-offs 100 \-\-mod 20
+.PP
+
diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..1760015
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,62 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+/*
+ * Flags must not start at 0, since it's used as none.
+ */
+enum {
+	XT_HMARK_USE_SNAT = 1,	/* SNAT & DNAT are used by the kernel module */
+	XT_HMARK_USE_DNAT,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_F_HMARK_USE_SNAT = 1 << XT_HMARK_USE_SNAT,
+	XT_F_HMARK_USE_DNAT = 1 << XT_HMARK_USE_DNAT,
+	XT_F_HMARK_SADR_AND = 1 << XT_HMARK_SADR_AND,
+	XT_F_HMARK_DADR_AND = 1 << XT_HMARK_DADR_AND,
+	XT_F_HMARK_SPI_AND = 1 << XT_HMARK_SPI_AND,
+	XT_F_HMARK_SPI_OR = 1 << XT_HMARK_SPI_OR,
+	XT_F_HMARK_SPORT_AND = 1 << XT_HMARK_SPORT_AND,
+	XT_F_HMARK_DPORT_AND = 1 << XT_HMARK_DPORT_AND,
+	XT_F_HMARK_SPORT_OR = 1 << XT_HMARK_SPORT_OR,
+	XT_F_HMARK_DPORT_OR = 1 << XT_HMARK_DPORT_OR,
+	XT_F_HMARK_PROTO_AND = 1 << XT_HMARK_PROTO_AND,
+	XT_F_HMARK_RND = 1 << XT_HMARK_RND,
+	XT_F_HMARK_MODULUS = 1 << XT_HMARK_MODULUS,
+	XT_F_HMARK_OFFSET = 1 << XT_HMARK_OFFSET,
+};
+
+union ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	__u32		smask;		/* Source address mask */
+	__u32		dmask;		/* Dest address mask */
+	union ports	pmask;
+	union ports	pset;
+	__u32		spimask;
+	__u32		spiset;
+	__u16		flags;		/* Print out only */
+	__u16		prmask;		/* L4 Proto mask */
+	__u32		hashrnd;
+	__u32		hmod;		/* Modulus */
+	__u32		hoffs;		/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 2/2] NETFILTER userspace part for target HMARK
  2011-11-25  9:36 ` [v4 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
@ 2011-11-25 13:20   ` Jan Engelhardt
  2011-11-25 14:04     ` Hans Schillström
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Engelhardt @ 2011-11-25 13:20 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: kaber, pablo, netfilter-devel, netdev, hans.schillstrom


On Friday 2011-11-25 10:36, Hans Schillstrom wrote:

>+Parameters:
>+For all masks default is all "1:s", to disable a field use mask 0
>+For IPv6 it's just the last 32 bits that is included in the hash

Why limit IPv6 to 32?

>diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
>new file mode 100644
>index 0000000..1760015
>--- /dev/null
>+++ b/include/linux/netfilter/xt_hmark.h
>@@ -0,0 +1,62 @@
>+#ifndef XT_HMARK_H_
>+#define XT_HMARK_H_
>+
>+#include <linux/types.h>
>+
>+/*
>+ * Flags must not start at 0, since it's used as none.
>+ */
>+enum {
>+	XT_HMARK_USE_SNAT = 1,	/* SNAT & DNAT are used by the kernel module */
>+	XT_HMARK_USE_DNAT,
>+	XT_HMARK_SADR_AND,
>+	XT_HMARK_DADR_AND,
>+	XT_HMARK_SPI_AND,
>+	XT_HMARK_SPI_OR,
>+	XT_HMARK_SPORT_AND,
>+	XT_HMARK_DPORT_AND,
>+	XT_HMARK_SPORT_OR,
>+	XT_HMARK_DPORT_OR,
>+	XT_HMARK_PROTO_AND,
>+	XT_HMARK_RND,
>+	XT_HMARK_MODULUS,
>+	XT_HMARK_OFFSET,
>+	XT_F_HMARK_USE_SNAT = 1 << XT_HMARK_USE_SNAT,

This file does not match the kernel-side xt_hmark.h.
Definitions only used within the userspace side should go into libxt_hmark.c
anyhow.

>+union ports {
>+	struct {
>+		__u16	src;
>+		__u16	dst;
>+	} p16;
>+	__u32	v32;
>+};

Bad name "ports", big clash potential.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [v4 PATCH 2/2] NETFILTER userspace part for target HMARK
  2011-11-25 13:20   ` Jan Engelhardt
@ 2011-11-25 14:04     ` Hans Schillström
  2011-11-25 15:44       ` Jan Engelhardt
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Schillström @ 2011-11-25 14:04 UTC (permalink / raw)
  To: Jan Engelhardt, Hans Schillstrom; +Cc: kaber, pablo, netfilter-devel, netdev


>On Friday 2011-11-25 10:36, Hans Schillstrom wrote:
>
>>+Parameters:
>>+For all masks default is all "1:s", to disable a field use mask 0
>>+For IPv6 it's just the last 32 bits that is included in the hash
>
>Why limit IPv6 to 32?

Performance, and the gain of adding another 192 bits to jhash ain't much.
However there is some cases when it hurts, i.e. when you can't mask of an subnet
I'm not sure it it's a problem or not... 

>>diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
[snip]
>
>This file does not match the kernel-side xt_hmark.h.
>Definitions only used within the userspace side should go into libxt_hmark.c
>anyhow.

Oop,s just forgot to copy the file :-)

>>+union ports {
>>+      struct {
>>+              __u16   src;
>>+              __u16   dst;
>>+      } p16;
>>+      __u32   v32;
>>+};
>
>Bad name "ports", big clash potential.

Yes, I'll change that..

Thanks
Hans

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2011-11-25 14:19   ` David Laight
  2011-11-25 14:36     ` Eric Dumazet
  2011-11-25 14:43   ` Eric Dumazet
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: David Laight @ 2011-11-25 14:19 UTC (permalink / raw)
  To: Hans Schillstrom, kaber, pablo, jengelh, netfilter-devel, netdev
  Cc: hans.schillstrom

 
> +	addr1 = (__force u32) ip6->saddr.s6_addr32[3];
> +	addr2 = (__force u32) ip6->daddr.s6_addr32[3];
...
> +	ports.v32 = * (__force u32 *) (skb->data + nhoff);

Is this code even vaguely portable??
I suspect the 'ports' bit has serious endianness problems.
I'm also not sure whether linux guarantees the alignment
of skb->data here.

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25 14:19   ` David Laight
@ 2011-11-25 14:36     ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2011-11-25 14:36 UTC (permalink / raw)
  To: David Laight
  Cc: Hans Schillstrom, kaber, pablo, jengelh, netfilter-devel, netdev,
	hans.schillstrom

Le vendredi 25 novembre 2011 à 14:19 +0000, David Laight a écrit :
> > +	addr1 = (__force u32) ip6->saddr.s6_addr32[3];
> > +	addr2 = (__force u32) ip6->daddr.s6_addr32[3];
> ...
> > +	ports.v32 = * (__force u32 *) (skb->data + nhoff);
> 
> Is this code even vaguely portable??

Yes it is.

> I suspect the 'ports' bit has serious endianness problems.

We dont care of endianness here, and we document it with the (__force
u32) cast.



> I'm also not sure whether linux guarantees the alignment
> of skb->data here.


It is guaranteed in whole linux stack.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
  2011-11-25 14:19   ` David Laight
@ 2011-11-25 14:43   ` Eric Dumazet
  2011-11-25 17:36   ` Pablo Neira Ayuso
  2011-11-30 15:51   ` Patrick McHardy
  3 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2011-11-25 14:43 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber, pablo, jengelh, netfilter-devel, netdev, hans.schillstrom

Le vendredi 25 novembre 2011 à 10:36 +0100, Hans Schillstrom a écrit :
> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> 
> The target allows you to create rules in the "raw" and "mangle" tables
> which alter the netfilter mark (nfmark) field within a given range.
> First a 32 bit hash value is generated then modulus by <limit> and
> finally an offset is added before it's written to nfmark.
> Prior to routing, the nfmark can influence the routing method (see
> "Use netfilter MARK value as routing key") and can also be used by
> other subsystems to change their behavior.
> 


Oh well, yet another duplicated flow dissector ...

> +/*
> + * Calc hash value, special casre is taken on icmp and fragmented messages
> + * i.e. fragmented messages don't use ports.
> + */
> +__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;



> +no6ports:
> +	nexthdr &= info->prmask;
> +	/* get a consistent hash (same value on both flow directions) */
> +	if (addr2 < addr1)
> +		swap(addr1, addr2);
> +	hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ nexthdr;

whats the point computing hash, if info->hmod is null, since we dont set
skb->mark ?

> +	if (info->hmod)
> +		skb->mark = (hash % info->hmod) + info->hoffs;
> +
> +	return XT_CONTINUE;
> +}
> +#endif
> +


Same problem/question on hmark_v4()


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [v4 PATCH 2/2] NETFILTER userspace part for target HMARK
  2011-11-25 14:04     ` Hans Schillström
@ 2011-11-25 15:44       ` Jan Engelhardt
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Engelhardt @ 2011-11-25 15:44 UTC (permalink / raw)
  To: Hans Schillström
  Cc: Hans Schillstrom, kaber, pablo, netfilter-devel, netdev

On Friday 2011-11-25 15:04, Hans Schillström wrote:

>
>>On Friday 2011-11-25 10:36, Hans Schillstrom wrote:
>>
>>>+Parameters:
>>>+For all masks default is all "1:s", to disable a field use mask 0
>>>+For IPv6 it's just the last 32 bits that is included in the hash
>>
>>Why limit IPv6 to 32?
>
>Performance, and the gain of adding another 192 bits to jhash ain't much.
>However there is some cases when it hurts, i.e. when you can't mask of an subnet
>I'm not sure it it's a problem or not... 

I was thinking about the case where two particular hosts have the same 
trailing 32 bits in their source address. For example, assuming IPv6 
starts to take a stronghold in the real world and home customers start 
assigning <myprefix>::1 to the little home server (i.e. the PPP 
endpoint) of theirs for remote login.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
  2011-11-25 14:19   ` David Laight
  2011-11-25 14:43   ` Eric Dumazet
@ 2011-11-25 17:36   ` Pablo Neira Ayuso
  2011-11-25 18:31     ` Jan Engelhardt
  2011-11-30 15:51   ` Patrick McHardy
  3 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-25 17:36 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber, jengelh, netfilter-devel, netdev, hans.schillstrom

On Fri, Nov 25, 2011 at 10:36:26AM +0100, Hans Schillstrom wrote:
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 3f0258d..9e4d4f9 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -39,6 +39,7 @@
>  #define NEXTHDR_ICMP		58	/* ICMP for IPv6. */
>  #define NEXTHDR_NONE		59	/* No next header */
>  #define NEXTHDR_DEST		60	/* Destination options header. */
> +#define NEXTHDR_SCTP		132	/* Stream Control Transport Protocol */
>  #define NEXTHDR_MOBILITY	135	/* Mobility header. */
>  
>  #define NEXTHDR_MAX		255

This has to go in a separated patch. Please, send it to netdev. I
think davem can pick that for 3.2-rc

> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index 8260b13..41bee43 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -471,6 +471,23 @@ config NETFILTER_XT_TARGET_HL
>  	since you can easily create immortal packets that loop
>  	forever on the network.
>  
> +config NETFILTER_XT_TARGET_HMARK
> +	tristate '"HMARK" target support'
> +	depends on NETFILTER_ADVANCED
> +	---help---
> +	This option adds the "HMARK" target.
> +
> +	The target allows you to create rules in the "raw" and "mangle" tables
> +	which alter the netfilter mark (nfmark) field within a given range.
> +	First a 32 bit hash value is generated then modulus by <limit> and
> +	finally an offset is added before it's written to nfmark.
> +
> +	Prior to routing, the nfmark can influence the routing method (see
> +	"Use netfilter MARK value as routing key") and can also be used by
> +	other subsystems to change their behavior.
> +
> +	The mark match can also be used to match nfmark produced by this module.
> +
>  config NETFILTER_XT_TARGET_IDLETIMER
>  	tristate  "IDLETIMER target support"
>  	depends on NETFILTER_ADVANCED
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 1a02853..359eeb6 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -56,6 +56,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
> +obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_hmark.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
>  obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
> diff --git a/net/netfilter/xt_hmark.c b/net/netfilter/xt_hmark.c
> new file mode 100644
> index 0000000..ae33293
> --- /dev/null
> +++ b/net/netfilter/xt_hmark.c
> @@ -0,0 +1,327 @@
> +/*
> + *	xt_hmark - Netfilter module to set mark as hash value
> + *
> + *	(C) 2011 Hans Schillstrom <hans.schillstrom@ericsson.com>
> + *
> + *	Description:
> + *	This module calculates a hash value that can be modified by modulus
> + *	and an offset. The hash value is based on a direction independent
> + *	five tuple: src & dst addr src & dst ports and protocol.
> + *	However src & dst port can be masked and are not used for fragmented
> + *	packets, ESP and AH don't have ports so SPI will be used instead.
> + *	For ICMP error messages the hash mark values will be calculated on
> + *	the source packet i.e. the packet caused the error (If sufficient
> + *	amount of data exists).
> + *
> + *	This program is free software; you can redistribute it and/or modify
> + *	it under the terms of the GNU General Public License version 2 as
> + *	published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <net/ip.h>
> +#include <linux/icmp.h>
> +
> +#include <linux/netfilter/xt_hmark.h>
> +#include <linux/netfilter/x_tables.h>
> +#include <net/netfilter/nf_nat.h>
> +
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> +#	define WITH_IPV6 1
> +#include <net/ipv6.h>
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +#endif
> +
> +

Comestic: unnecessary extra line.

> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
> +MODULE_DESCRIPTION("Xtables: packet range mark operations by hash value");
> +MODULE_ALIAS("ipt_HMARK");
> +MODULE_ALIAS("ip6t_HMARK");
> +
> +/*
> + * ICMP, get inner header so calc can be made on the source message
> + *       not the icmp header, i.e. same hash mark must be produced
> + *       on an icmp error message.
> + */
> +static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
> +{
> +	const struct icmphdr *icmph;
> +	struct icmphdr _ih;
> +	struct iphdr *iph = NULL;
> +
> +	/* Not enough header? */
> +	icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
> +	if (icmph == NULL)
> +		return nhoff;
> +
> +	if (icmph->type > NR_ICMP_TYPES)
> +		return nhoff;
> +
> +	/* Error message? */
> +	if (icmph->type != ICMP_DEST_UNREACH &&
> +	    icmph->type != ICMP_SOURCE_QUENCH &&
> +	    icmph->type != ICMP_TIME_EXCEEDED &&
> +	    icmph->type != ICMP_PARAMETERPROB &&
> +	    icmph->type != ICMP_REDIRECT)
> +		return nhoff;
> +	/* Checkin full IP header plus 8 bytes of protocol to
> +	 * avoid additional coding at protocol handlers.
> +	 */
> +	if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
> +		return nhoff;

skb_header_pointer again here, if conntrack is enabled, we can benefit
from handling fragments.

> +	iph = (struct iphdr *)(skb->data + nhoff + iphsz + sizeof(_ih));
> +	return nhoff + iphsz + sizeof(_ih);
> +}
> +/*
> + * ICMPv6
> + * Input nhoff Offset into network header
> + *       offset where ICMPv6 header starts
> + * Returns true if it's a icmp error and updates nhoff
> + */
> +#ifdef WITH_IPV6
> +static int get_inner6_hdr(struct sk_buff *skb, int *offset, int hdrlen)
> +{
> +	struct icmp6hdr *icmp6h;
> +	struct icmp6hdr _ih6;
> +
> +	icmp6h = skb_header_pointer(skb, *offset + hdrlen, sizeof(_ih6), &_ih6);
> +	if (icmp6h == NULL)
> +		return 0;
> +
> +	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
> +		*offset += hdrlen + sizeof(_ih6);
> +		return 1;
> +	}
> +	return 0;
> +}
> +/*
> + * Calc hash value, special casre is taken on icmp and fragmented messages
> + * i.e. fragmented messages don't use ports.
> + */
> +__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
> +	int nhoff, poff, hdrlen;
> +	u32 addr1, addr2, hash;
> +	struct ipv6hdr *ip6;
> +	u8 nexthdr;
> +	int frag = 0, ip6hdrlvl = 0;	/* Header level */
> +	struct ipv6_opt_hdr _hdr, *hp;
> +	union {
> +		u32 v32;
> +		u16 v16[2];
> +	} ports;
> +
> +	ports.v32 = 0;
> +	nhoff = skb_network_offset(skb);
> +
> +hdr_new:
> +	/* Get header info */
> +	ip6 = (struct ipv6hdr *) (skb->data + nhoff);
> +	nexthdr = ip6->nexthdr;
> +	hdrlen = sizeof(struct ipv6hdr);
> +	hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr), &_hdr);

you have to check return value of skb_header_pointer here.

> +	while (nexthdr) {
> +		switch (nexthdr) {
> +		case IPPROTO_ICMPV6:
> +			/* ICMP Error then move ptr to inner header */
> +			if (get_inner6_hdr(skb, &nhoff, hdrlen)) {
> +				ip6hdrlvl++;
> +				if (!pskb_may_pull(skb, sizeof(_hdr) + nhoff))
> +					return XT_CONTINUE;
> +				goto hdr_new;
> +			}
> +			nhoff += hdrlen;
> +			goto hdr_rdy;
> +
> +		case NEXTHDR_FRAGMENT:
> +			if (!ip6hdrlvl) /* Do not use ports if fragmented */
> +				frag = 1;
> +			break;
> +
> +		/* End of hdr traversing cont. with ports and hash calc. */
> +		case NEXTHDR_IPV6:	/* Do not process tunnels */
> +		case NEXTHDR_TCP:
> +		case NEXTHDR_UDP:
> +		case NEXTHDR_ESP:
> +		case NEXTHDR_AUTH:
> +		case NEXTHDR_SCTP:
> +		case NEXTHDR_NONE:	/* Last hdr of something unknown */
> +			nhoff += hdrlen;
> +			goto hdr_rdy;
> +		default:
> +			return XT_CONTINUE;
> +		}
> +		if (!hp)
> +			return XT_CONTINUE;
> +		nhoff += hdrlen;	/* eat current header */
> +		nexthdr =  hp->nexthdr;	/* Next header */
> +		hdrlen = ipv6_optlen(hp);
> +		hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),
> +					&_hdr);

same here.

> +		if (!pskb_may_pull(skb, nhoff))

why this after skb_header_pointer?

[... trimmed off ...]
>       poff = proto_ports_offset(ip_proto);
>       nhoff += ip->ihl * 4 + poff;
>       if (frag || poff < 0 || !pskb_may_pull(skb, nhoff + 4))
>               goto noports;
>
>       ports.v32 = * (__force u32 *) (skb->data + nhoff);
>       if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH) {
>               ports.v32 = (ports.v32 & info->spimask) | nfo->spiset;
>       } else {
>               if (snatport)   /* Replace nat'ed port(s) */
>                       ports.v16[1] = snatport;
>               if (dnatport)
>                       ports.v16[0] = dnatport;
>               ports.v32 = (ports.v32 & info->pmask.v32) |
>                               info->pset.v32;
>               if (ports.v16[1] < ports.v16[0])
>                       swap(ports.v16[0], ports.v16[1]);
>       }
>
>noports:
>       ip_proto &= info->prmask;
>       /* get a consistent hash (same value on both flow directions)/
>       if (addr2 < addr1)
>               swap(addr1, addr2);
>
>       hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ p_proto;
>       if (info->hmod)
>               skb->mark = (hash % info->hmod) + info->hoffs;
>       return XT_CONTINUE;
> }

Hm, I think the fragmentation handling is broken.

Say that the first fragment contains the transport header
header, then the mark is calculated based on the address and ports.
Then, later on fragments will receive the mark based on the network
header only. They may have different marks.

If you don't want to use conntrack in your setup and you want to handle
fragments, then you have to configure HMARK to calculate the hashing
based on the network addresses. If you want to fully support fragments,
then enable conntrack and you can configure HMARK to calculate the
hashing based on network address + transport bits.

Fix this by removing the fragmentation handling, then assume that
people can select between two hashing configuration for HMARK. One
based for network address which is fragment-safe, one that uses the
transport layer information, that requires conntrack. Otherwise, I
don't see a sane way to handle this situation.

I think this has to be documented in the iptables manpage for HMARK.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25 17:36   ` Pablo Neira Ayuso
@ 2011-11-25 18:31     ` Jan Engelhardt
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Engelhardt @ 2011-11-25 18:31 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Hans Schillstrom, kaber, netfilter-devel, netdev, hans.schillstrom

On Friday 2011-11-25 18:36, Pablo Neira Ayuso wrote:

>On Fri, Nov 25, 2011 at 10:36:26AM +0100, Hans Schillstrom wrote:
>> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
>> index 3f0258d..9e4d4f9 100644
>> --- a/include/net/ipv6.h
>> +++ b/include/net/ipv6.h
>> @@ -39,6 +39,7 @@
>>  #define NEXTHDR_ICMP		58	/* ICMP for IPv6. */
>>  #define NEXTHDR_NONE		59	/* No next header */
>>  #define NEXTHDR_DEST		60	/* Destination options header. */
>> +#define NEXTHDR_SCTP		132	/* Stream Control Transport Protocol */
>>  #define NEXTHDR_MOBILITY	135	/* Mobility header. */
>>  
>>  #define NEXTHDR_MAX		255
>
>This has to go in a separated patch. Please, send it to netdev. I
>think davem can pick that for 3.2-rc

I do have to wonder a little why we need the l4proto values twice 
(IPPROTO_SCTP plus NEXTHDR_SCTP). Has nobody ever thought of
doing one foobar_<PROTOCOL>?

>> +	    icmph->type != ICMP_REDIRECT)
>> +		return nhoff;
>> +	/* Checkin full IP header plus 8 bytes of protocol to
>> +	 * avoid additional coding at protocol handlers.
>> +	 */
>> +	if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
>> +		return nhoff;

NB:I point out that the preferred long comment style begins with /*\n 
(to match the trailing \n*/, naturally) like in

>> +/*
>> + * ICMPv6
>> + * Input nhoff Offset into network header
>> + *       offset where ICMPv6 header starts
>> + * Returns true if it's a icmp error and updates nhoff
>> + */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
                     ` (2 preceding siblings ...)
  2011-11-25 17:36   ` Pablo Neira Ayuso
@ 2011-11-30 15:51   ` Patrick McHardy
  2011-12-01  0:25     ` Hans Schillstrom
  3 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2011-11-30 15:51 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: pablo, jengelh, netfilter-devel, netdev, hans.schillstrom

On 11/25/2011 10:36 AM, Hans Schillstrom wrote:
> +__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
> +	int nhoff, poff, hdrlen;
> +	u32 addr1, addr2, hash;
> +	struct ipv6hdr *ip6;
> +	u8 nexthdr;
> +	int frag = 0, ip6hdrlvl = 0;	/* Header level */
> +	struct ipv6_opt_hdr _hdr, *hp;
> +	union {
> +		u32 v32;
> +		u16 v16[2];
> +	} ports;
> +
> +	ports.v32 = 0;
> +	nhoff = skb_network_offset(skb);
> +
> +hdr_new:
> +	/* Get header info */
> +	ip6 = (struct ipv6hdr *) (skb->data + nhoff);
> +	nexthdr = ip6->nexthdr;
> +	hdrlen = sizeof(struct ipv6hdr);
> +	hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),&_hdr);
> +
> +	while (nexthdr) {
> +		switch (nexthdr) {
> +		case IPPROTO_ICMPV6:
> +			/* ICMP Error then move ptr to inner header */
> +			if (get_inner6_hdr(skb,&nhoff, hdrlen)) {

This doesn't look right. You assume the ICMPv6 header is following
the IPv6 header with any other headers in between. If there are
other headers, hdrlen will contain the length of the last header.

> +				ip6hdrlvl++;
> +				if (!pskb_may_pull(skb, sizeof(_hdr) + nhoff))
> +					return XT_CONTINUE;
> +				goto hdr_new;
> +			}
> +			nhoff += hdrlen;
> +			goto hdr_rdy;
> +
> +		case NEXTHDR_FRAGMENT:
> +			if (!ip6hdrlvl) /* Do not use ports if fragmented */
> +				frag = 1;

Shouldn't you also check for fragment offset == 0 here?
The fragment header also doesn't include the length, so
using ipv6_optlen() below is incorrect.
> +			break;
> +
> +		/* End of hdr traversing cont. with ports and hash calc. */
> +		case NEXTHDR_IPV6:	/* Do not process tunnels */

That comment looks misleading, you do seem to process them?

> +		case NEXTHDR_TCP:
> +		case NEXTHDR_UDP:
> +		case NEXTHDR_ESP:
> +		case NEXTHDR_AUTH:

Don't you want to use the port numbers if only authentication
without encryption is used?

> +		case NEXTHDR_SCTP:
> +		case NEXTHDR_NONE:	/* Last hdr of something unknown */
> +			nhoff += hdrlen;
> +			goto hdr_rdy;
> +		default:
> +			return XT_CONTINUE;
> +		}
> +		if (!hp)
> +			return XT_CONTINUE;
> +		nhoff += hdrlen;	/* eat current header */
> +		nexthdr =  hp->nexthdr;	/* Next header */
> +		hdrlen = ipv6_optlen(hp);
> +		hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),
> +					&_hdr);
> +
> +		if (!pskb_may_pull(skb, nhoff))
> +			return XT_CONTINUE;
> +	}

And final question, why not simply use ipv6_skip_exthdr()?

> +hdr_rdy:
> +...



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-30 15:51   ` Patrick McHardy
@ 2011-12-01  0:25     ` Hans Schillstrom
  2011-12-01 10:05       ` Patrick McHardy
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Schillstrom @ 2011-12-01  0:25 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, jengelh, netfilter-devel, netdev, hans.schillstrom

On Wednesday, November 30, 2011 16:51:35 Patrick McHardy wrote:
> On 11/25/2011 10:36 AM, Hans Schillstrom wrote:
> > +__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> > +{
> > +	struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
> > +	int nhoff, poff, hdrlen;
> > +	u32 addr1, addr2, hash;
> > +	struct ipv6hdr *ip6;
> > +	u8 nexthdr;
> > +	int frag = 0, ip6hdrlvl = 0;	/* Header level */
> > +	struct ipv6_opt_hdr _hdr, *hp;
> > +	union {
> > +		u32 v32;
> > +		u16 v16[2];
> > +	} ports;
> > +
> > +	ports.v32 = 0;
> > +	nhoff = skb_network_offset(skb);
> > +
> > +hdr_new:
> > +	/* Get header info */
> > +	ip6 = (struct ipv6hdr *) (skb->data + nhoff);
> > +	nexthdr = ip6->nexthdr;
> > +	hdrlen = sizeof(struct ipv6hdr);
> > +	hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),&_hdr);
> > +
> > +	while (nexthdr) {
> > +		switch (nexthdr) {
> > +		case IPPROTO_ICMPV6:
> > +			/* ICMP Error then move ptr to inner header */
> > +			if (get_inner6_hdr(skb,&nhoff, hdrlen)) {
> 
> This doesn't look right. You assume the ICMPv6 header is following
> the IPv6 header with any other headers in between. If there are
> other headers, hdrlen will contain the length of the last header.
 
RFC-4443  "Every ICMPv6 message is preceded by an IPv6 header and zero or more IPv6 extension headers."
hdrlen is actually previous header length in bytes, to be correct.
nhoff is the sum of processed headers.
So in case of an icmp the nhoff will be updated, and hdrlen preset to ipv6hdr size

> > +				ip6hdrlvl++;
> > +				if (!pskb_may_pull(skb, sizeof(_hdr) + nhoff))
> > +					return XT_CONTINUE;
> > +				goto hdr_new;
> > +			}
> > +			nhoff += hdrlen;
> > +			goto hdr_rdy;
> > +
> > +		case NEXTHDR_FRAGMENT:
> > +			if (!ip6hdrlvl) /* Do not use ports if fragmented */
> > +				frag = 1;
> 
> Shouldn't you also check for fragment offset == 0 here?

According to the RFC "Initialized to zero for transmission; ignored on reception"

> The fragment header also doesn't include the length, so
> using ipv6_optlen() below is incorrect.

True, it has a fixed size, of 8 octets
I'll fix that. 
(as long as it is zero it will work :-)

> > +			break;
> > +
> > +		/* End of hdr traversing cont. with ports and hash calc. */
> > +		case NEXTHDR_IPV6:	/* Do not process tunnels */
> 
> That comment looks misleading, you do seem to process them?

Ooops a "return XT_CONTINUE;" seems to be missing here.

> 
> > +		case NEXTHDR_TCP:
> > +		case NEXTHDR_UDP:
> > +		case NEXTHDR_ESP:
> > +		case NEXTHDR_AUTH:
> 
> Don't you want to use the port numbers if only authentication
> without encryption is used?

with esp or ah the SPI will be used instead of ports.
Useful or not I don't know since they are asymmetric in terms of a flow.

> 
> > +		case NEXTHDR_SCTP:
> > +		case NEXTHDR_NONE:	/* Last hdr of something unknown */
> > +			nhoff += hdrlen;
> > +			goto hdr_rdy;
> > +		default:
> > +			return XT_CONTINUE;
> > +		}
> > +		if (!hp)
> > +			return XT_CONTINUE;
> > +		nhoff += hdrlen;	/* eat current header */
> > +		nexthdr =  hp->nexthdr;	/* Next header */
> > +		hdrlen = ipv6_optlen(hp);
> > +		hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),
> > +					&_hdr);
> > +
> > +		if (!pskb_may_pull(skb, nhoff))
> > +			return XT_CONTINUE;
> > +	}
> 
> And final question, why not simply use ipv6_skip_exthdr()?

problems with fragments...
But when looking into ipv6_skip_exthdr() again I realize that handling of 
NEXTHDR_HOP NEXTHDR_ROUTING, and NEXTHDR_DEST is wrong.

It think I need to rewrite this part a bit

just skip this headers;
	case NEXTHDR_HOP:
	case NEXTHDR_ROUTING:
	case NEXTHDR_DEST:
		break;

and exit on this:
	case NEXTHDR_IPV6:
	case	NEXTHDR_NONE:
	default:
		return XT_CONTINUE;	
> 
> > +hdr_rdy:
> > +...
> 
> 

Thanks
Hans

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-12-01  0:25     ` Hans Schillstrom
@ 2011-12-01 10:05       ` Patrick McHardy
  0 siblings, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2011-12-01 10:05 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: pablo, jengelh, netfilter-devel, netdev, hans.schillstrom

On 12/01/2011 01:25 AM, Hans Schillstrom wrote:
> On Wednesday, November 30, 2011 16:51:35 Patrick McHardy wrote:
>> On 11/25/2011 10:36 AM, Hans Schillstrom wrote:
>>> +
>>> +hdr_new:
>>> +	/* Get header info */
>>> +	ip6 = (struct ipv6hdr *) (skb->data + nhoff);
>>> +	nexthdr = ip6->nexthdr;
>>> +	hdrlen = sizeof(struct ipv6hdr);
>>> +	hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),&_hdr);
>>> +
>>> +	while (nexthdr) {
>>> +		switch (nexthdr) {
>>> +		case IPPROTO_ICMPV6:
>>> +			/* ICMP Error then move ptr to inner header */
>>> +			if (get_inner6_hdr(skb,&nhoff, hdrlen)) {
>> This doesn't look right. You assume the ICMPv6 header is following
>> the IPv6 header with any other headers in between. If there are
>> other headers, hdrlen will contain the length of the last header.
>
> RFC-4443  "Every ICMPv6 message is preceded by an IPv6 header and zero or more IPv6 extension headers."
> hdrlen is actually previous header length in bytes, to be correct.
> nhoff is the sum of processed headers.
> So in case of an icmp the nhoff will be updated, and hdrlen preset to ipv6hdr size

Right, I missed that you're using nhoff + hdrlen in
get_inner6_hdr().

>>> +				ip6hdrlvl++;
>>> +				if (!pskb_may_pull(skb, sizeof(_hdr) + nhoff))
>>> +					return XT_CONTINUE;
>>> +				goto hdr_new;
>>> +			}
>>> +			nhoff += hdrlen;
>>> +			goto hdr_rdy;
>>> +
>>> +		case NEXTHDR_FRAGMENT:
>>> +			if (!ip6hdrlvl) /* Do not use ports if fragmented */
>>> +				frag = 1;
>> Shouldn't you also check for fragment offset == 0 here?
> According to the RFC "Initialized to zero for transmission; ignored on reception"

No, what I meant is that for the first fragment, you do
have the upper layer header available. But as we already
discussed for a stable identifier you want to ignore it
anyways.

>>> +		case NEXTHDR_TCP:
>>> +		case NEXTHDR_UDP:
>>> +		case NEXTHDR_ESP:
>>> +		case NEXTHDR_AUTH:
>> Don't you want to use the port numbers if only authentication
>> without encryption is used?
> with esp or ah the SPI will be used instead of ports.
> Useful or not I don't know since they are asymmetric in terms of a flow.

Yes, but with AH you could either use the ESP SPI or if no ESP
is used the port numbers of the upper layer protocol.

>> And final question, why not simply use ipv6_skip_exthdr()?
> problems with fragments...

So the probem is that it will return the transport layer protocol
header for fragments with frag_off == 0? We also have ipv6_find_hdr()
which we could modify to indicate this in the frag_off pointer.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target  for HASH based fwmark
  2011-12-01 11:39 Re[2]: " Hans Schillstrom
@ 2011-12-01 11:46 ` Patrick McHardy
  0 siblings, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2011-12-01 11:46 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: pablo, jengelh, netfilter-devel, netdev, hans.schillstrom

On 12/01/2011 12:39 PM, Hans Schillstrom wrote:
> t: Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
>> On 12/01/2011 12:05 PM, Hans Schillstrom wrote:
>>>>>> And final question, why not simply use ipv6_skip_exthdr()?
>>>>> problems with fragments...
>>>> So the probem is that it will return the transport layer protocol
>>>> header for fragments with frag_off == 0? We also have ipv6_find_hdr()
>>>> which we could modify to indicate this in the frag_off pointer.
>>> ipv6_find_hdr() will do the trick with a light modification
>>> What about a wrapper like:
>>>
>>> int __ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>>> 		  int target, unsigned short *fragoff,  int *fragflg)
>>> {
>>> ...
>>> 		if (nexthdr == NEXTHDR_FRAGMENT) {
>>> 			unsigned short _frag_off;
>>> 			__be16 *fp;
>>>
>>> 			if (fragflg)
>>> 			        fragflg = 1;
>>> 			fp = skb_header_pointer(skb,
>>> 						start+offsetof(struct frag_hdr,
>>> 							       frag_off),
>>> 						sizeof(_frag_off),
>>> 						&_frag_off);
>>>
>>> ...
>>> }
>>>
>>> int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>>> 		  int target, unsigned short *fragoff)
>>> {
>>>           return __ipv6_find_hdr(skb, offset, terget, fragoff, NULL);
>>> }
>> Hmm that would require to change all current callers.
> Nope, ipv6_find_hdr()  looks the same,
> __ipv6_find_hdr() have an extra param.

Ah, right, apparently need more coffee :)

>> I was more thinking of unconditionally setting *frag_off in case of
>> fragments, then you can initialize it to some impossible value
>> like 0xffff and determine the presence of a fragment header
>> based on its value after calling ipv6_find_hdr().
> That's another way  :-)
>
> Which one do you prefer ?

You way seems cleaner to me, lets do that.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target  for HASH based fwmark
  2011-12-01 11:05 Re[2]: " Hans Schillstrom
@ 2011-12-01 11:24 ` Patrick McHardy
  0 siblings, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2011-12-01 11:24 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: pablo, jengelh, netfilter-devel, netdev, hans.schillstrom

On 12/01/2011 12:05 PM, Hans Schillstrom wrote:
>>>> And final question, why not simply use ipv6_skip_exthdr()?
>>> problems with fragments...
>> So the probem is that it will return the transport layer protocol
>> header for fragments with frag_off == 0? We also have ipv6_find_hdr()
>> which we could modify to indicate this in the frag_off pointer.
> ipv6_find_hdr() will do the trick with a light modification
> What about a wrapper like:
>
> int __ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> 		  int target, unsigned short *fragoff,  int *fragflg)
> {
> ...
> 		if (nexthdr == NEXTHDR_FRAGMENT) {
> 			unsigned short _frag_off;
> 			__be16 *fp;
>
> 			if (fragflg)
> 			        fragflg = 1;
> 			fp = skb_header_pointer(skb,
> 						start+offsetof(struct frag_hdr,
> 							       frag_off),
> 						sizeof(_frag_off),
> 						&_frag_off);
>
> ...
> }
>
> int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> 		  int target, unsigned short *fragoff)
> {
>          return __ipv6_find_hdr(skb, offset, terget, fragoff, NULL);
> }

Hmm that would require to change all current callers. I was more
thinking of unconditionally setting *frag_off in case of
fragments, then you can initialize it to some impossible value
like 0xffff and determine the presence of a fragment header
based on its value after calling ipv6_find_hdr().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-30 18:28   ` Pablo Neira Ayuso
@ 2011-12-01  0:52     ` Hans Schillstrom
  0 siblings, 0 replies; 19+ messages in thread
From: Hans Schillstrom @ 2011-12-01  0:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Patrick McHardy, jengelh, netfilter-devel, netdev, hans.schillstrom


On Wednesday, November 30, 2011 19:28:15 Pablo Neira Ayuso wrote:
> On Wed, Nov 30, 2011 at 04:27:26PM +0100, Patrick McHardy wrote:
> > On 11/28/2011 10:36 AM, Hans Schillstrom wrote:
> > >>If you don't want to use conntrack in your setup and you want to handle
> > >>fragments, then you have to configure HMARK to calculate the hashing
> > >>based on the network addresses. If you want to fully support fragments,
> > >>then enable conntrack and you can configure HMARK to calculate the
> > >>hashing based on network address + transport bits.
> > >>
> > >>Fix this by removing the fragmentation handling, then assume that
> > >>people can select between two hashing configuration for HMARK. One
> > >>based for network address which is fragment-safe, one that uses the
> > >>transport layer information, that requires conntrack. Otherwise, I
> > >>don't see a sane way to handle this situation.
> > >Correct me if I'm wrong here,
> > >If conntrack is enabled hmark don't see the packet until it is reassembled and
> > >in that case the fragmentation header is removed.
> > >
> > >So, with conntrack HMARK will operate on full packets not fragments
> > >without conntrack ports will not be used on any fragment
> > 
> > Correct.
> 
> To complete what Patrick said. They are collected but not linearized.
> That's why you have to use skb_header_pointer.
OK, thanks
I'll will do that.

> 
> > You don't necessarily need conntrack for defragmentation though,
> > we've moved defragmentation to a seperate module for TPROXY. You
> > can depend on that and get defragmentation without full
> > connection tracking.
> 
> Indeed, I missed this. That way you can skip conntrack but solving the
> broken fragments handling.

In a cluster with inputs on many different blades (with a Virtual IP address) you can
receive the fragments on different blades and in that case there should not be any
defrag in that node. HMARK will just produce the same fw-mark for all fragments.

In our case defrag will happens in next hop i.e. before going into IPVS.

As mention earlier, this needs to be documented better.
I will add this to the man page.

Thanks
Hans

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
  2011-11-30 15:27 ` Patrick McHardy
@ 2011-11-30 18:28   ` Pablo Neira Ayuso
  2011-12-01  0:52     ` Hans Schillstrom
  0 siblings, 1 reply; 19+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-30 18:28 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Hans Schillstrom, jengelh, netfilter-devel, netdev, hans.schillstrom

On Wed, Nov 30, 2011 at 04:27:26PM +0100, Patrick McHardy wrote:
> On 11/28/2011 10:36 AM, Hans Schillstrom wrote:
> >>If you don't want to use conntrack in your setup and you want to handle
> >>fragments, then you have to configure HMARK to calculate the hashing
> >>based on the network addresses. If you want to fully support fragments,
> >>then enable conntrack and you can configure HMARK to calculate the
> >>hashing based on network address + transport bits.
> >>
> >>Fix this by removing the fragmentation handling, then assume that
> >>people can select between two hashing configuration for HMARK. One
> >>based for network address which is fragment-safe, one that uses the
> >>transport layer information, that requires conntrack. Otherwise, I
> >>don't see a sane way to handle this situation.
> >Correct me if I'm wrong here,
> >If conntrack is enabled hmark don't see the packet until it is reassembled and
> >in that case the fragmentation header is removed.
> >
> >So, with conntrack HMARK will operate on full packets not fragments
> >without conntrack ports will not be used on any fragment
> 
> Correct.

To complete what Patrick said. They are collected but not linearized.
That's why you have to use skb_header_pointer.

> You don't necessarily need conntrack for defragmentation though,
> we've moved defragmentation to a seperate module for TPROXY. You
> can depend on that and get defragmentation without full
> connection tracking.

Indeed, I missed this. That way you can skip conntrack but solving the
broken fragments handling.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target  for HASH based fwmark
  2011-11-28  9:36 Re[2]: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
@ 2011-11-30 15:27 ` Patrick McHardy
  2011-11-30 18:28   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2011-11-30 15:27 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Pablo Neira Ayuso, jengelh, netfilter-devel, netdev, hans.schillstrom

On 11/28/2011 10:36 AM, Hans Schillstrom wrote:
>> If you don't want to use conntrack in your setup and you want to handle
>> fragments, then you have to configure HMARK to calculate the hashing
>> based on the network addresses. If you want to fully support fragments,
>> then enable conntrack and you can configure HMARK to calculate the
>> hashing based on network address + transport bits.
>>
>> Fix this by removing the fragmentation handling, then assume that
>> people can select between two hashing configuration for HMARK. One
>> based for network address which is fragment-safe, one that uses the
>> transport layer information, that requires conntrack. Otherwise, I
>> don't see a sane way to handle this situation.
> Correct me if I'm wrong here,
> If conntrack is enabled hmark don't see the packet until it is reassembled and
> in that case the fragmentation header is removed.
>
> So, with conntrack HMARK will operate on full packets not fragments
> without conntrack ports will not be used on any fragment

Correct.

You don't necessarily need conntrack for defragmentation though,
we've moved defragmentation to a seperate module for TPROXY. You
can depend on that and get defragmentation without full
connection tracking.



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-12-01 11:46 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-25  9:36 [v4 PATCH 0/2] NETFILTER new target module, HMARK Hans Schillstrom
2011-11-25  9:36 ` [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
2011-11-25 14:19   ` David Laight
2011-11-25 14:36     ` Eric Dumazet
2011-11-25 14:43   ` Eric Dumazet
2011-11-25 17:36   ` Pablo Neira Ayuso
2011-11-25 18:31     ` Jan Engelhardt
2011-11-30 15:51   ` Patrick McHardy
2011-12-01  0:25     ` Hans Schillstrom
2011-12-01 10:05       ` Patrick McHardy
2011-11-25  9:36 ` [v4 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
2011-11-25 13:20   ` Jan Engelhardt
2011-11-25 14:04     ` Hans Schillström
2011-11-25 15:44       ` Jan Engelhardt
2011-11-28  9:36 Re[2]: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark Hans Schillstrom
2011-11-30 15:27 ` Patrick McHardy
2011-11-30 18:28   ` Pablo Neira Ayuso
2011-12-01  0:52     ` Hans Schillstrom
2011-12-01 11:05 Re[2]: " Hans Schillstrom
2011-12-01 11:24 ` Patrick McHardy
2011-12-01 11:39 Re[2]: " Hans Schillstrom
2011-12-01 11:46 ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).