* [v2 PATCH 0/2] NETFILTER new target module, HMARK
@ 2011-10-03 17:46 Hans Schillstrom
2011-10-03 17:46 ` [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw Hans Schillstrom
2011-10-03 17:46 ` [v2 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
0 siblings, 2 replies; 14+ messages in thread
From: Hans Schillstrom @ 2011-10-03 17:46 UTC
To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom
The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32-bit hash value is generated, then it is reduced modulo <limit>,
and finally an offset is added before the result is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behaviour.
The mark match can also be used to match nfmark produced by this module.
See the kernel module for more info.
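The final step described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the module's code: the helper name hmark_final is hypothetical, and the hash is taken as an already computed 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): map a 32-bit hash
 * into the range [offs, offs + mod - 1].
 */
static uint32_t hmark_final(uint32_t hash, uint32_t mod, uint32_t offs)
{
    assert(mod > 0);            /* a modulus of 0 would divide by zero */
    return (hash % mod) + offs; /* reduce first, then add the offset */
}
```

With a modulus of 4 and an offset of 100, every hash ends up as one of the marks 100, 101, 102 or 103, which is the range used by the routing example further down.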
REVISION
Version 2
NAT Added for IPv4
IPv6 ICMP handling enhanced.
Usage example added
Version 1
Initial RFC
We (Ericsson) use hmark in front of ipvs as a pre-load-balancer; it
handles up to 70 ipvs instances running in parallel in clusters.
However, hmark is not restricted to running in front of IPVS; it can also be
used as a "poor man's" load balancer.
This version also supports NAT as an option; with very high flow volumes
you might not want to use conntrack.
The idea is to generate a direction-independent fwmark range to use as input to
the routing (i.e. ip rule add fwmark ...).
Pretty straightforward and simple.
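A minimal sketch of what "direction independent" means here, assuming a simplified flow tuple: the two addresses and the two ports are each put in a fixed order before hashing, so a packet and its reply produce the same value. The struct, the function names and the FNV-1a style mixer are all assumptions made for this illustration; the kernel module itself uses jhash_3words().

```c
#include <stdint.h>

/* Hypothetical, simplified flow tuple for illustration only. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Stand-in mixer (FNV-1a over the four bytes of w), NOT jhash. */
static uint32_t mix(uint32_t h, uint32_t w)
{
    for (int i = 0; i < 4; i++) {
        h ^= (w >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Order addresses and ports so both flow directions hash alike. */
static uint32_t flow_hash(const struct flow *f, uint32_t seed)
{
    uint32_t a1 = f->saddr, a2 = f->daddr;
    uint16_t p1 = f->sport, p2 = f->dport;

    if (a2 < a1) { uint32_t t = a1; a1 = a2; a2 = t; }
    if (p2 < p1) { uint16_t t = p1; p1 = p2; p2 = t; }

    uint32_t h = seed ^ 2166136261u;
    h = mix(h, a1);
    h = mix(h, a2);
    h = mix(h, ((uint32_t)p1 << 16) | p2);
    return h ^ f->proto;
}
```

Calling flow_hash() on a tuple and on the same tuple with source and destination swapped gives the same result, which is what lets the forward and return paths select the same fwmark.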
Example:
App Server (Real Server)
+---------+
-->| Service |
Gateway A +---------+
/
+----------+ / +----+ +---------+
--- if -A---| selector |----> |ipvs| --->| Service |
+----------+ \ +----+ +---------+
\
+----+ +---------+
|ipvs| -->| Service |
+----+ +---------+
Gateway C
+----------+ / +----+
--- if-B ---| selector | ---> |ipvs|
+----------+ \ +----+ +---------+
| Service |
+---------+
/
+----------+ / +----+ ..
--- if-B ---| selector | ---> |ipvs| +---------+
+----------+ \ +----+ | Service |
\ +---------+
#
# Example with four ipvs loadbalancers
#
iptables -t mangle -I PREROUTING -d $IPADDR -j HMARK --hmark-mod 4 --hmark-offs 100
ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip rule add fwmark 102 table 102
ip rule add fwmark 103 table 103
ip ro ad table 100 default via x.y.z.1 dev bond1
ip ro ad table 101 default via x.y.z.2 dev bond1
ip ro ad table 102 default via x.y.z.3 dev bond1
ip ro ad table 103 default via x.y.z.4 dev bond1
If conntrack doesn't handle the return path,
do the opposite with HMARK and send it straight back to ipvs.
Another example of usage could be if you have cluster-originated connections
and want to spread the connections over a number of interfaces
(NAT will complicate things for you in this case).
\ Blade 1
\ +----------+ +---------+
<-- | selector | <--- | Service |
/ +----------+ +---------+
/
+------+
-- | Gw-A | \ Blade 2
+------+ \ +----------+ +---------+
+------+ <-- | selector | <--- | Service |
-- | Gw-B | / +----------+ +---------+
+------+ /
+------+
-- | Gw-C | \
+------+ \ +----------+ +---------+
<-- | selector | <--- | Service |
/ +----------+ +---------+
/
\ Blade n
\ +----------+ +---------+
<-- | selector | <--- | Service |
/ +----------+ +---------+
/
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>
* [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
From: Hans Schillstrom @ 2011-10-03 17:46 UTC
To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom
The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32-bit hash value is generated, then it is reduced modulo <limit>,
and finally an offset is added before the result is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behavior.
man page
HMARK
This module does the same as MARK, i.e. sets an fwmark,
but the mark is based on a hash value. The hash is based on
saddr, daddr, sport, dport and proto. The same mark will be produced
independent of direction if no masks are set or the same masks are used for
src and dest. The hash mark can be adjusted by a modulus and finally an
offset can be added, i.e. the final mark will be within a range.
ICMP errors will have the hash calculated on the original message.
Note: None of the parameters affect the packet itself,
only the calculated hash value.
Parameters: For all masks the default is all ones; to disable a field
use mask 0. For IPv6 only the last 32 bits of each address are
included in the hash.
--hmark-smask value
The value to AND the source address with (saddr & value).
--hmark-dmask value
The value to AND the dest. address with (daddr & value).
--hmark-sp-mask value
A 16 bit value to AND the src port with (sport & value).
--hmark-dp-mask value
A 16 bit value to AND the dest port with (dport & value).
--hmark-sp-set value
A 16 bit value to OR the src port with (sport | value).
--hmark-dp-set value
A 16 bit value to OR the dest port with (dport | value).
--hmark-spi-mask value
Value to AND the spi field with (spi & value) valid for proto esp or ah.
--hmark-spi-set value
Value to OR the spi field with (spi | value) valid for proto esp or ah.
--hmark-proto-mask value
An 8 bit value to AND the L4 proto field with (proto & value).
--hmark-rnd value
A 32 bit initial value for hash calc, default is 0xc175a3b8.
--hmark-dnat
Replace src addr/port with original dst addr/port before hash calc.
--hmark-snat
Replace dst addr/port with original src addr/port before hash calc.
Final processing of the mark in order of execution.
--hmark-mod value (must be > 0)
The easiest way to describe this is: hash = hash mod <value>
--hmark-offs value (must be > 0)
The easiest way to describe this is: hash = hash + <value>
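The mask and set options above all follow the same pattern: the field is ANDed with the mask and then ORed with the set value before it enters the hash. A small sketch of that pattern for a port, with a hypothetical helper name (port_for_hash is not part of the patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper: the value of a port as seen by the hash,
 * i.e. (port & mask) | set. The packet itself is never modified.
 */
static uint16_t port_for_hash(uint16_t port, uint16_t mask, uint16_t set)
{
    return (uint16_t)((port & mask) | set);
}
```

With the defaults (mask 0xffff, set 0) the port is used unchanged, and with mask 0 the port contributes nothing, which is how a field is disabled.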
Examples:
Default rule handles all TCP, UDP, SCTP, ESP & AH
Rev 2
IPv6 header scan changed to follow RFC 2460
IPv4 fragmented icmp echo now uses proto, as IPv6 does
IPv6 pskb_may_pull() check is now done on every iteration of the header loop.
IPv4 NAT support added.
default case added in IPv6 loop, and NULL check of hp
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
include/linux/netfilter/xt_hmark.h | 48 ++++++
net/netfilter/Kconfig | 17 ++
net/netfilter/Makefile | 1 +
net/netfilter/xt_hmark.c | 320 ++++++++++++++++++++++++++++++++++++
4 files changed, 386 insertions(+), 0 deletions(-)
create mode 100644 include/linux/netfilter/xt_hmark.h
create mode 100644 net/netfilter/xt_hmark.c
diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..6c1436a
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+/*
+ * Flags must not start at 0, since it's used as none.
+ */
+enum {
+ XT_HMARK_SADR_AND = 1, /* SNAT & DNAT are used by the kernel module */
+ XT_HMARK_DADR_AND,
+ XT_HMARK_SPI_AND,
+ XT_HMARK_SPI_OR,
+ XT_HMARK_SPORT_AND,
+ XT_HMARK_DPORT_AND,
+ XT_HMARK_SPORT_OR,
+ XT_HMARK_DPORT_OR,
+ XT_HMARK_PROTO_AND,
+ XT_HMARK_RND,
+ XT_HMARK_MODULUS,
+ XT_HMARK_OFFSET,
+ XT_HMARK_USE_SNAT,
+ XT_HMARK_USE_DNAT,
+};
+
+union ports {
+ struct {
+ __u16 src;
+ __u16 dst;
+ } p16;
+ __u32 v32;
+};
+
+struct xt_hmark_info {
+ __u32 smask; /* Source address mask */
+ __u32 dmask; /* Dest address mask */
+ union ports pmask;
+ union ports pset;
+ __u32 spimask;
+ __u32 spiset;
+ __u16 flags; /* Print out only */
+ __u16 prmask; /* L4 Proto mask */
+ __u32 hashrnd;
+ __u32 hmod; /* Modulus */
+ __u32 hoffs; /* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 32bff6d..3abd3a4 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -483,6 +483,23 @@ config NETFILTER_XT_TARGET_IDLETIMER
To compile it as a module, choose M here. If unsure, say N.
+config NETFILTER_XT_TARGET_HMARK
+ tristate '"HMARK" target support'
+ depends on NETFILTER_ADVANCED
+ ---help---
+ This option adds the "HMARK" target.
+
+ The target allows you to create rules in the "raw" and "mangle" tables
+ which alter the netfilter mark (nfmark) field within a given range.
+ First a 32 bit hash value is generated then modulus by <limit> and
+ finally an offset is added before it's written to nfmark.
+
+ Prior to routing, the nfmark can influence the routing method (see
+ "Use netfilter MARK value as routing key") and can also be used by
+ other subsystems to change their behavior.
+
+ The mark match can also be used to match nfmark produced by this module.
+
config NETFILTER_XT_TARGET_LED
tristate '"LED" target support'
depends on LEDS_CLASS && LEDS_TRIGGERS
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1a02853..359eeb6 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_hmark.o
obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
diff --git a/net/netfilter/xt_hmark.c b/net/netfilter/xt_hmark.c
new file mode 100644
index 0000000..2f0aa93
--- /dev/null
+++ b/net/netfilter/xt_hmark.c
@@ -0,0 +1,320 @@
+/*
+ * xt_hmark - Netfilter module to set mark as hash value
+ *
+ * (C) 2010 Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *
+ * Description:
+ * This module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/xt_hmark.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_nat.h>
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+# define WITH_IPV6 1
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet range mark operations by hash value");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+/*
+ * ICMP, get inner header so calc can be made on the source message
+ * not the icmp header, i.e. same hash mark must be produced
+ * on an icmp error message.
+ */
+static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
+{
+ const struct icmphdr *icmph;
+ struct icmphdr _ih;
+ struct iphdr *iph = NULL;
+
+ /* Not enough header? */
+ icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
+ if (icmph == NULL)
+ goto out;
+
+ if (icmph->type > NR_ICMP_TYPES)
+ goto out;
+
+
+ /* Error message? */
+ if (icmph->type != ICMP_DEST_UNREACH &&
+ icmph->type != ICMP_SOURCE_QUENCH &&
+ icmph->type != ICMP_TIME_EXCEEDED &&
+ icmph->type != ICMP_PARAMETERPROB &&
+ icmph->type != ICMP_REDIRECT)
+ goto out;
+ /* Check for the full inner IP header plus 8 bytes of protocol
+ * data to avoid additional coding at protocol handlers.
+ */
+ if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
+ goto out;
+
+ iph = (struct iphdr *)(skb->data + nhoff + iphsz + sizeof(_ih));
+ return nhoff + iphsz + sizeof(_ih);
+out:
+ return nhoff;
+}
+/*
+ * ICMPv6
+ * Input nhoff Offset into network header
+ * offset where ICMPv6 header starts
+ * Returns true if it's a icmp error and updates nhoff
+ */
+#ifdef WITH_IPV6
+static int get_inner6_hdr(struct sk_buff *skb, int *offset, int hdrlen)
+{
+ struct icmp6hdr *icmp6h;
+ struct icmp6hdr _ih6;
+
+ icmp6h = skb_header_pointer(skb, *offset + hdrlen, sizeof(_ih6), &_ih6);
+ if (icmp6h == NULL)
+ goto out;
+
+ if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+ *offset += hdrlen + sizeof(_ih6);
+ return 1;
+ }
+out:
+ return 0;
+}
+#endif
+
+/*
+ * Calc hash value, special care is taken with icmp and fragmented messages
+ * i.e. fragmented messages don't use ports.
+ */
+static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
+{
+ int nhoff, hash = 0, poff, proto, frag = 0;
+ struct iphdr *ip;
+ u8 ip_proto;
+ u32 addr1, addr2, ihl;
+ u16 snatport = 0, dnatport = 0;
+ union {
+ u32 v32;
+ u16 v16[2];
+ } ports;
+
+ nhoff = skb_network_offset(skb);
+ proto = skb->protocol;
+
+ if (!proto && skb->sk) {
+ if (skb->sk->sk_family == AF_INET)
+ proto = __constant_htons(ETH_P_IP);
+ else if (skb->sk->sk_family == AF_INET6)
+ proto = __constant_htons(ETH_P_IPV6);
+ }
+
+ switch (proto) {
+ case __constant_htons(ETH_P_IP):
+ {
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+ struct nf_conntrack_tuple *otuple, *rtuple;
+
+ if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
+ goto done;
+
+ ip = (struct iphdr *) (skb->data + nhoff);
+ if (ip->protocol == IPPROTO_ICMP) {
+ /* Switch hash calc to inner header ? */
+ nhoff = get_inner_hdr(skb, ip->ihl * 4, nhoff);
+ ip = (struct iphdr *) (skb->data + nhoff);
+ }
+
+ if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+ frag = 1;
+
+ ip_proto = ip->protocol;
+ ihl = ip->ihl;
+ addr1 = (__force u32) ip->saddr & info->smask;
+ addr2 = (__force u32) ip->daddr & info->dmask;
+
+ if (!ct || !nf_ct_is_confirmed(ct))
+ break;
+ otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+ /* On the "return flow", to get the original address
+ * i.e. replace the source address.
+ */
+ if (ct->status & IPS_DST_NAT &&
+ info->flags & (1 << XT_HMARK_USE_DNAT)) {
+ rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+ addr1 = (__force u32) otuple->dst.u3.in.s_addr;
+ dnatport = otuple->dst.u.udp.port;
+ }
+ /* On the "return flow", to get the original address
+ * i.e. replace the destination address.
+ */
+ if (ct->status & IPS_SRC_NAT &&
+ info->flags & (1 << XT_HMARK_USE_SNAT)) {
+ rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+ addr2 = (__force u32) otuple->src.u3.in.s_addr;
+ snatport = otuple->src.u.udp.port;
+ }
+ break;
+ }
+#ifdef WITH_IPV6
+ case __constant_htons(ETH_P_IPV6):
+ {
+ struct ipv6hdr *ip6; /* ip hdr */
+ int hdrlen = 0; /* In ip header */
+ u8 nexthdr;
+ int ip6hdrlvl = 0; /* Header level */
+ struct ipv6_opt_hdr _hdr, *hp;
+
+hdr_new:
+ if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
+ goto done;
+
+ /* ip header */
+ ip6 = (struct ipv6hdr *) (skb->data + nhoff);
+ nexthdr = ip6->nexthdr;
+ /* nhoff += sizeof(struct ipv6hdr); Where hdr starts */
+ hdrlen = sizeof(struct ipv6hdr);
+ hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),
+ &_hdr);
+ while (nexthdr) {
+ switch (nexthdr) {
+ case IPPROTO_ICMPV6:
+ /* ICMP Error then move ptr to inner header */
+ if (get_inner6_hdr(skb, &nhoff, hdrlen)) {
+ ip6hdrlvl++;
+ goto hdr_new;
+ }
+ nhoff += hdrlen;
+ goto hdr_rdy;
+
+ case NEXTHDR_FRAGMENT:
+ if (!ip6hdrlvl)
+ frag = 1;
+ break;
+ /* End of hdr traversing */
+ case NEXTHDR_IPV6: /* Do not process tunnels */
+ case NEXTHDR_TCP:
+ case NEXTHDR_UDP:
+ case NEXTHDR_ESP:
+ case NEXTHDR_AUTH:
+ case NEXTHDR_NONE:
+ nhoff += hdrlen;
+ goto hdr_rdy;
+ default:
+ goto done;
+ }
+ if (!hp)
+ goto done;
+ nhoff += hdrlen; /* eat current header */
+ nexthdr = hp->nexthdr; /* Next header */
+ hdrlen = ipv6_optlen(hp);
+ hp = skb_header_pointer(skb, nhoff + hdrlen,
+ sizeof(_hdr), &_hdr);
+
+ if (!pskb_may_pull(skb, nhoff))
+ goto done;
+ }
+hdr_rdy:
+ ip_proto = nexthdr;
+
+ addr1 = (__force u32) ip6->saddr.s6_addr32[3];
+ addr2 = (__force u32) ip6->daddr.s6_addr32[3];
+ ihl = 0; /* (40 >> 2); */
+ break;
+ }
+#endif
+ default:
+ goto done;
+ }
+
+ ports.v32 = 0;
+ poff = proto_ports_offset(ip_proto);
+ nhoff += ihl * 4 + poff;
+ if (!frag && poff >= 0 && pskb_may_pull(skb, nhoff + 4)) {
+ ports.v32 = * (__force u32 *) (skb->data + nhoff);
+ if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH) {
+ ports.v32 = (ports.v32 & info->spimask) | info->spiset;
+ } else { /* Handle endian */
+ if (snatport) /* Replace snated dst port (ret flow) */
+ ports.v16[1] = snatport;
+ if (dnatport)
+ ports.v16[0] = dnatport;
+ ports.v32 = (ports.v32 & info->pmask.v32) |
+ info->pset.v32;
+ if (ports.v16[1] < ports.v16[0])
+ swap(ports.v16[0], ports.v16[1]);
+ }
+ }
+ ip_proto &= info->prmask;
+ /* get a consistent hash (same value on both flow directions) */
+ if (addr2 < addr1)
+ swap(addr1, addr2);
+
+ hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ ip_proto;
+ if (!hash)
+ hash = 1;
+
+ return hash;
+
+done:
+ return 0;
+}
+
+static unsigned int
+hmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
+{
+ struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
+ __u32 hash = get_hash(skb, info);
+
+ if (info->hmod && hash)
+ skb->mark = (hash % info->hmod) + info->hoffs;
+ return XT_CONTINUE;
+}
+
+static struct xt_target hmark_tg_reg __read_mostly = {
+ .name = "HMARK",
+ .revision = 0,
+ .family = NFPROTO_UNSPEC,
+ .target = hmark_tg,
+ .targetsize = sizeof(struct xt_hmark_info),
+ .me = THIS_MODULE,
+};
+
+static int __init hmark_mt_init(void)
+{
+ int ret;
+
+ ret = xt_register_target(&hmark_tg_reg);
+ if (ret < 0)
+ return ret;
+ return 0;
+}
+
+static void __exit hmark_mt_exit(void)
+{
+ xt_unregister_target(&hmark_tg_reg);
+}
+
+module_init(hmark_mt_init);
+module_exit(hmark_mt_exit);
--
1.7.4.4
* [v2 PATCH 2/2] NETFILTER userspace part for target HMARK
From: Hans Schillstrom @ 2011-10-03 17:46 UTC
To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom
The target allows you to create rules in the "raw" and "mangle" tables
which alter the netfilter mark (nfmark) field within a given range.
First a 32-bit hash value is generated, then it is reduced modulo <limit>,
and finally an offset is added before the result is written to nfmark.
Prior to routing, the nfmark can influence the routing method (see
"Use netfilter MARK value as routing key") and can also be used by
other subsystems to change their behaviour.
The mark match can also be used to match nfmark produced by this module.
Ver 2
IPv4 NAT added
iptables ver 1.4.12.1 adaptions.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
extensions/libxt_HMARK.c | 381 ++++++++++++++++++++++++++++++++++++
extensions/libxt_HMARK.man | 66 ++++++
include/linux/netfilter/xt_hmark.h | 48 +++++
3 files changed, 495 insertions(+), 0 deletions(-)
create mode 100644 extensions/libxt_HMARK.c
create mode 100644 extensions/libxt_HMARK.man
create mode 100644 include/linux/netfilter/xt_hmark.h
diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..0def034
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,381 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <getopt.h>
+
+#include <xtables.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_hmark.h>
+
+
+#define DEF_HRAND 0xc175a3b8 /* Default "random" value to jhash */
+
+static void HMARK_help(void)
+{
+ printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+" --hmark-smask value Mask source address with value\n"
+" --hmark-dmask value Mask Dest. address with value\n"
+" --hmark-sp-mask value Mask src port with value\n"
+" --hmark-dp-mask value Mask dst port with value\n"
+" --hmark-spi-mask value For esp and ah AND spi with value\n"
+" --hmark-sp-set value OR src port with value\n"
+" --hmark-dp-set value OR dst port with value\n"
+" --hmark-spi-set value For esp and ah OR spi with value\n"
+" --hmark-proto-mask value Mask Protocol with value\n"
+" --hmark-rnd value Initial value for hash calc.\n"
+" Limit/modify the calculated hash mark by:\n"
+" --hmark-mod value nfmark modulus value\n"
+" --hmark-offs value Last action add value to nfmark\n"
+" For NAT in IPv4 the original address can be used in the return path.\n"
+" Make sure to qualify the statement in a proper way when using nat flags\n"
+" --hmark-dnat Replace src addr/port with original dst addr/port\n"
+" --hmark-snat Replace dst addr/port with original src addr/port\n"
+" In many cases hmark can be omitted i.e. --smask can be used\n");
+}
+
+static const struct option HMARK_opts[] = {
+ { "hmark-smask", 1, NULL, XT_HMARK_SADR_AND },
+ { "hmark-dmask", 1, NULL, XT_HMARK_DADR_AND },
+ { "hmark-sp-mask", 1, NULL, XT_HMARK_SPORT_AND },
+ { "hmark-dp-mask", 1, NULL, XT_HMARK_DPORT_AND },
+ { "hmark-spi-mask", 1, NULL, XT_HMARK_SPI_AND },
+ { "hmark-sp-set", 1, NULL, XT_HMARK_SPORT_OR },
+ { "hmark-dp-set", 1, NULL, XT_HMARK_DPORT_OR },
+ { "hmark-spi-set", 1, NULL, XT_HMARK_SPI_OR },
+ { "hmark-proto-mask", 1, NULL, XT_HMARK_PROTO_AND },
+ { "hmark-rnd", 1, NULL, XT_HMARK_RND },
+ { "hmark-mod", 1, NULL, XT_HMARK_MODULUS },
+ { "hmark-offs", 1, NULL, XT_HMARK_OFFSET },
+ { "hmark-dnat", 1, NULL, XT_HMARK_USE_DNAT },
+ { "hmark-snat", 1, NULL, XT_HMARK_USE_SNAT },
+ { "smask", 1, NULL, XT_HMARK_SADR_AND },
+ { "dmask", 1, NULL, XT_HMARK_DADR_AND },
+ { "sp-mask", 1, NULL, XT_HMARK_SPORT_AND },
+ { "dp-mask", 1, NULL, XT_HMARK_DPORT_AND },
+ { "spi-mask", 1, NULL, XT_HMARK_SPI_AND },
+ { "sp-set", 1, NULL, XT_HMARK_SPORT_OR },
+ { "dp-set", 1, NULL, XT_HMARK_DPORT_OR },
+ { "spi-set", 1, NULL, XT_HMARK_SPI_OR },
+ { "proto-mask", 1, NULL, XT_HMARK_PROTO_AND },
+ { "rnd", 1, NULL, XT_HMARK_RND },
+ { "mod", 1, NULL, XT_HMARK_MODULUS },
+ { "offs", 1, NULL, XT_HMARK_OFFSET },
+ { "dnat", 1, NULL, XT_HMARK_USE_DNAT },
+ { "snat", 1, NULL, XT_HMARK_USE_SNAT },
+ { .name = NULL }
+};
+
+static int
+HMARK_parse(int c, char **argv, int invert, unsigned int *flags,
+ const void *entry, struct xt_entry_target **target)
+{
+ struct xt_hmark_info *hmarkinfo
+ = (struct xt_hmark_info *)(*target)->data;
+ unsigned int value = 0xffffffff;
+ unsigned int maxint = UINT32_MAX;
+
+ if ((c < XT_HMARK_SADR_AND) || (c > XT_HMARK_USE_DNAT)) {
+ xtables_error(PARAMETER_PROBLEM, "Bad HMARK option \"%s\"",
+ optarg);
+ return 0;
+ }
+
+ if (c >= XT_HMARK_SPORT_AND && c <= XT_HMARK_DPORT_OR)
+ maxint = UINT16_MAX;
+ else if (c == XT_HMARK_PROTO_AND)
+ maxint = UINT8_MAX;
+
+ if (!xtables_strtoui(optarg, NULL, &value, 0, maxint))
+ xtables_error(PARAMETER_PROBLEM, "Bad HMARK value \"%s\"",
+ optarg);
+
+ if (*flags == 0) {
+ memset(hmarkinfo, 0xff, sizeof(struct xt_hmark_info));
+ hmarkinfo->pset.v32 = 0;
+ hmarkinfo->flags = 0;
+ hmarkinfo->spiset = 0;
+ hmarkinfo->hoffs = 0;
+ hmarkinfo->hashrnd = DEF_HRAND;
+ }
+ switch (c) {
+ case XT_HMARK_SADR_AND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-smask' once");
+ }
+ hmarkinfo->smask = htonl(value);
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_DADR_AND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-dmask' once");
+ }
+ hmarkinfo->dmask = htonl(value);
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_MODULUS:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-mod' once");
+ }
+ if (value == 0) {
+ xtables_error(PARAMETER_PROBLEM,
+ "xxx modulus 0 ? "
+ "thats a div by 0");
+ value = 0xffffffff;
+ }
+ hmarkinfo->hmod = value;
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_OFFSET:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-offs' once");
+ }
+ hmarkinfo->hoffs = value;
+ if (value == 0)
+ c = 0;
+ break;
+
+ case XT_HMARK_SPORT_AND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-sp-mask' once");
+ }
+ hmarkinfo->pmask.p16.src = htons(value);
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_DPORT_AND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-dp-mask' once");
+ }
+ hmarkinfo->pmask.p16.dst = htons(value);
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_SPI_AND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-spi-mask' once");
+ }
+ hmarkinfo->spimask = htonl(value);
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_SPORT_OR:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-sp-set' once");
+ }
+ hmarkinfo->pset.p16.src = htons(value);
+ if (!value)
+ c = 0;
+ break;
+
+ case XT_HMARK_DPORT_OR:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-dp-set' once");
+ }
+ hmarkinfo->pset.p16.dst = htons(value);
+ if (!value)
+ c = 0;
+ break;
+
+ case XT_HMARK_SPI_OR:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-spi-set' once");
+ }
+ hmarkinfo->spiset = htonl(value);
+ if (!value)
+ c = 0;
+ break;
+
+ case XT_HMARK_PROTO_AND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify "
+ "`--hmark-proto-mask' once");
+ }
+ hmarkinfo->prmask = value;
+ if (value == maxint)
+ c = 0;
+ break;
+
+ case XT_HMARK_RND:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify `--hmark-rnd' once");
+ }
+ hmarkinfo->hashrnd = value;
+ if (value == DEF_HRAND)
+ c = 0;
+ break;
+
+ case XT_HMARK_USE_DNAT:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify `--hmark-dnat' once");
+ }
+ break;
+
+ case XT_HMARK_USE_SNAT:
+ if (*flags & (1 << c)) {
+ xtables_error(PARAMETER_PROBLEM,
+ "Can only specify `--hmark-snat' once");
+ }
+ break;
+
+ default:
+ return 0;
+ }
+ *flags |= 1 << c;
+ hmarkinfo->flags = *flags;
+
+ return 1;
+}
+
+static void HMARK_check(unsigned int flags)
+{
+ if (!(flags & (1 << XT_HMARK_MODULUS)))
+ xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+ "is not set, that means the nfmark will be in range"
+ " 0 - 0xffffffff");
+}
+
+static void HMARK_print(const void *ip, const struct xt_entry_target *target,
+ int numeric)
+{
+ const struct xt_hmark_info *info =
+ (const struct xt_hmark_info *)target->data;
+
+ printf(" HMARK ");
+ if (info->flags & (1 << XT_HMARK_USE_SNAT))
+ printf("snat, ");
+ if (info->flags & (1 << XT_HMARK_SADR_AND))
+ printf("smask 0x%x ", htonl(info->smask));
+
+ if (info->flags & (1 << XT_HMARK_USE_DNAT))
+ printf("dnat, ");
+ if (info->flags & (1 << XT_HMARK_DADR_AND))
+ printf("dmask 0x%x ", htonl(info->dmask));
+
+ if (info->flags & (1 << XT_HMARK_SPORT_AND))
+ printf("sp-mask 0x%x ", htons(info->pmask.p16.src));
+ if (info->flags & (1 << XT_HMARK_DPORT_AND))
+ printf("dp-mask 0x%x ", htons(info->pmask.p16.dst));
+ if (info->flags & (1 << XT_HMARK_SPI_AND))
+ printf("spi-mask 0x%x ", htonl(info->spimask));
+ if (info->flags & (1 << XT_HMARK_SPORT_OR))
+ printf("sp-set 0x%x ", htons(info->pset.p16.src));
+ if (info->flags & (1 << XT_HMARK_DPORT_OR))
+ printf("dp-set 0x%x ", htons(info->pset.p16.dst));
+ if (info->flags & (1 << XT_HMARK_SPI_OR))
+ printf("spi-set 0x%x ", htonl(info->spiset));
+ if (info->flags & (1 << XT_HMARK_PROTO_AND))
+ printf("proto-mask 0x%x ", info->prmask);
+ if (info->flags & (1 << XT_HMARK_RND))
+ printf("rnd 0x%x ", info->hashrnd);
+ if (info->flags & (1 << XT_HMARK_MODULUS))
+ printf("mark=hv %% 0x%x ", info->hmod);
+ if (info->flags & (1 << XT_HMARK_OFFSET))
+ printf("+ 0x%x ", info->hoffs);
+}
+
+static void HMARK_save(const void *ip, const struct xt_entry_target *target)
+{
+ const struct xt_hmark_info *info =
+ (const struct xt_hmark_info *)target->data;
+
+ if (info->flags & (1 << XT_HMARK_SADR_AND))
+ printf("--hmark-smask 0x%x ", htonl(info->smask));
+ if (info->flags & (1 << XT_HMARK_DADR_AND))
+ printf("--hmark-dmask 0x%x ", htonl(info->dmask));
+ if (info->flags & (1 << XT_HMARK_SPORT_AND))
+ printf("--hmark-sp-mask 0x%x ", htons(info->pmask.p16.src));
+ if (info->flags & (1 << XT_HMARK_DPORT_AND))
+ printf("--hmark-dp-mask 0x%x ", htons(info->pmask.p16.dst));
+ if (info->flags & (1 << XT_HMARK_SPI_AND))
+ printf("--hmark-spi-mask 0x%x ", htonl(info->spimask));
+ if (info->flags & (1 << XT_HMARK_SPORT_OR))
+ printf("--hmark-sp-set 0x%x ", htons(info->pset.p16.src));
+ if (info->flags & (1 << XT_HMARK_DPORT_OR))
+ printf("--hmark-dp-set 0x%x ", htons(info->pset.p16.dst));
+ if (info->flags & (1 << XT_HMARK_SPI_OR))
+ printf("--hmark-spi-set 0x%x ", htonl(info->spiset));
+ if (info->flags & (1 << XT_HMARK_PROTO_AND))
+ printf("--hmark-proto-mask 0x%x ", info->prmask);
+ if (info->flags & (1 << XT_HMARK_RND))
+ printf("--hmark-rnd 0x%x ", info->hashrnd);
+ if (info->flags & (1 << XT_HMARK_MODULUS))
+ printf("--hmark-mod 0x%x ", info->hmod);
+ if (info->flags & (1 << XT_HMARK_OFFSET))
+ printf("--hmark-offs 0x%x ", info->hoffs);
+ if (info->flags & (1 << XT_HMARK_USE_DNAT))
+ printf("--hmark-dnat ");
+ if (info->flags & (1 << XT_HMARK_USE_SNAT))
+ printf("--hmark-snat ");
+}
+
+static struct xtables_target mark_tg_reg[] = {
+ {
+ .family = NFPROTO_UNSPEC,
+ .name = "HMARK",
+ .version = XTABLES_VERSION,
+ .revision = 0,
+ .size = XT_ALIGN(sizeof(struct xt_hmark_info)),
+ .userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+ .help = HMARK_help,
+ .parse = HMARK_parse,
+ .final_check = HMARK_check,
+ .print = HMARK_print,
+ .save = HMARK_save,
+ .extra_opts = HMARK_opts,
+ },
+};
+
+void _init(void)
+{
+ xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
+
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..8f44676
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,66 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on saddr, daddr, sport, dport and proto. The same mark will be produced independent of direction if no masks are set or the same masks are used for src and dest.
+The hash mark can be adjusted by a modulus and finally an offset can be added, i.e. the final mark will be within a range. If state RELATED is used, icmp will also be handled, i.e. the hash will be calculated on the original message, not the icmp message itself.
+Note: None of the parameters affect the packet itself, only the calculated hash value.
+.PP
+Parameters:
+For all masks the default is all ones; to disable a field use mask 0.
+For IPv6 only the last 32 bits of each address are included in the hash.
+.TP
+\fB\-\-hmark\-smask\fP \fIvalue\fP
+The value to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dmask\fP \fIvalue\fP
+The value to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dp\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sp\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dp\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.TP
+\fB\-\-hmark\-dnat\fP
+Replace src addr/port with original dst addr/port before calculating the hash.
+.TP
+\fB\-\-hmark\-snat\fP
+Replace dst addr/port with original src addr/port before calculating the hash.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue\fP (must be > 0)
+The easiest way to describe this is: hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offs\fP \fIvalue\fP
+The easiest way to describe this is: hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark\-offs 10000 \-\-hmark\-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING \-p SCTP \-j HMARK \-\-smask 0 \-\-dmask 0
+ \-\-sp\-mask 0 \-\-offs 100 \-\-mod 20
+.PP
+
diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
new file mode 100644
index 0000000..7b3ee5d
--- /dev/null
+++ b/include/linux/netfilter/xt_hmark.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+/*
+ * Flags must not start at 0, since it's used as none.
+ */
+enum {
+ XT_HMARK_USE_SNAT = 1, /* SNAT & DNAT are used by the kernel module */
+ XT_HMARK_USE_DNAT,
+ XT_HMARK_SADR_AND,
+ XT_HMARK_DADR_AND,
+ XT_HMARK_SPI_AND,
+ XT_HMARK_SPI_OR,
+ XT_HMARK_SPORT_AND,
+ XT_HMARK_DPORT_AND,
+ XT_HMARK_SPORT_OR,
+ XT_HMARK_DPORT_OR,
+ XT_HMARK_PROTO_AND,
+ XT_HMARK_RND,
+ XT_HMARK_MODULUS,
+ XT_HMARK_OFFSET,
+};
+
+union ports {
+ struct {
+ __u16 src;
+ __u16 dst;
+ } p16;
+ __u32 v32;
+};
+
+struct xt_hmark_info {
+ __u32 smask; /* Source address mask */
+ __u32 dmask; /* Dest address mask */
+ union ports pmask;
+ union ports pset;
+ __u32 spimask;
+ __u32 spiset;
+ __u16 flags; /* Print out only */
+ __u16 prmask; /* L4 Proto mask */
+ __u32 hashrnd;
+ __u32 hmod; /* Modulus */
+ __u32 hoffs; /* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
--
1.7.4.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-10-03 17:46 ` [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw Hans Schillstrom
@ 2011-11-07 0:52 ` Pablo Neira Ayuso
2011-11-07 3:36 ` Jan Engelhardt
0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-07 0:52 UTC (permalink / raw)
To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans
Hi Hans,
On Mon, Oct 03, 2011 at 07:46:42PM +0200, Hans Schillstrom wrote:
> diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
> new file mode 100644
> index 0000000..6c1436a
> --- /dev/null
> +++ b/include/linux/netfilter/xt_hmark.h
> @@ -0,0 +1,48 @@
> +#ifndef XT_HMARK_H_
> +#define XT_HMARK_H_
> +
> +#include <linux/types.h>
> +
> +/*
> + * Flags must not start at 0, since it's used as none.
> + */
> +enum {
> + XT_HMARK_SADR_AND = 1, /* SNAT & DNAT are used by the kernel module */
> + XT_HMARK_DADR_AND,
> + XT_HMARK_SPI_AND,
> + XT_HMARK_SPI_OR,
> + XT_HMARK_SPORT_AND,
> + XT_HMARK_DPORT_AND,
> + XT_HMARK_SPORT_OR,
> + XT_HMARK_DPORT_OR,
> + XT_HMARK_PROTO_AND,
> + XT_HMARK_RND,
> + XT_HMARK_MODULUS,
> + XT_HMARK_OFFSET,
> + XT_HMARK_USE_SNAT,
> + XT_HMARK_USE_DNAT,
> +};
> +
> +union ports {
> + struct {
> + __u16 src;
> + __u16 dst;
> + } p16;
> + __u32 v32;
> +};
> +
> +struct xt_hmark_info {
> + __u32 smask; /* Source address mask */
> + __u32 dmask; /* Dest address mask */
> + union ports pmask;
> + union ports pset;
> + __u32 spimask;
> + __u32 spiset;
> + __u16 flags; /* Print out only */
> + __u16 prmask; /* L4 Proto mask */
> + __u32 hashrnd;
> + __u32 hmod; /* Modulus */
> + __u32 hoffs; /* Offset */
> +};
> +
> +#endif /* XT_HMARK_H_ */
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index 32bff6d..3abd3a4 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -483,6 +483,23 @@ config NETFILTER_XT_TARGET_IDLETIMER
>
> To compile it as a module, choose M here. If unsure, say N.
>
> +config NETFILTER_XT_TARGET_HMARK
New config option has to go in alphabetic order (this one should go
after NETFILTER_XT_TARGET_HL).
> + tristate '"HMARK" target support'
> + depends on NETFILTER_ADVANCED
> + ---help---
> + This option adds the "HMARK" target.
> +
> + The target allows you to create rules in the "raw" and "mangle" tables
> + which alter the netfilter mark (nfmark) field within a given range.
> + First a 32 bit hash value is generated then modulus by <limit> and
> + finally an offset is added before it's written to nfmark.
> +
> + Prior to routing, the nfmark can influence the routing method (see
> + "Use netfilter MARK value as routing key") and can also be used by
> + other subsystems to change their behavior.
> +
> + The mark match can also be used to match nfmark produced by this module.
> +
> config NETFILTER_XT_TARGET_LED
> tristate '"LED" target support'
> depends on LEDS_CLASS && LEDS_TRIGGERS
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 1a02853..359eeb6 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -56,6 +56,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
> obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
> obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
> obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
> +obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_hmark.o
> obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
> obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
> obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
> diff --git a/net/netfilter/xt_hmark.c b/net/netfilter/xt_hmark.c
> new file mode 100644
> index 0000000..2f0aa93
> --- /dev/null
> +++ b/net/netfilter/xt_hmark.c
> @@ -0,0 +1,320 @@
> +/*
> + * xt_hmark - Netfilter module to set mark as hash value
> + *
> + * (C) 2010 Hans Schillstrom <hans.schillstrom@ericsson.com>
> + *
> + * Description:
> + * This module calculates a hash value that can be modified by modulus
> + * and an offset. The hash value is based on a direction independent
> + * five tuple: src & dst addr src & dst ports and protocol.
> + * However src & dst port can be masked and are not used for fragmented
> + * packets, ESP and AH don't have ports so SPI will be used instead.
> + * For ICMP error messages the hash mark values will be calculated on
> + * the source packet i.e. the packet caused the error (If sufficient
> + * amount of data exists).
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <net/ip.h>
> +#include <linux/icmp.h>
> +
> +#include <linux/netfilter/xt_hmark.h>
> +#include <linux/netfilter/x_tables.h>
> +#include <net/netfilter/nf_nat.h>
> +
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> +# define WITH_IPV6 1
> +#include <net/ipv6.h>
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +#endif
> +
> +
extra space not required.
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
> +MODULE_DESCRIPTION("Xtables: packet range mark operations by hash value");
> +MODULE_ALIAS("ipt_HMARK");
> +MODULE_ALIAS("ip6t_HMARK");
> +
> +/*
> + * ICMP, get inner header so calc can be made on the source message
> + * not the icmp header, i.e. same hash mark must be produced
> + * on an icmp error message.
> + */
> +static int get_inner_hdr(struct sk_buff *skb, int iphsz, int nhoff)
This looks very similar to icmp_error in nf_conntrack_proto_icmp.c.
Yours lacks checksum validation, btw.
I'm trying to find some place where we can put this function to make
it available for both nf_conntrack_ipv4 and your module (to avoid code
redundancy), but I haven't found one so far.
It would be nice to find some way to avoid duplicating code with
similar functionality.
> +{
> + const struct icmphdr *icmph;
> + struct icmphdr _ih;
> + struct iphdr *iph = NULL;
> +
> + /* Not enough header? */
> + icmph = skb_header_pointer(skb, nhoff + iphsz, sizeof(_ih), &_ih);
> + if (icmph == NULL)
> + goto out;
> +
> + if (icmph->type > NR_ICMP_TYPES)
> + goto out;
> +
> +
extra space not required.
> + /* Error message? */
> + if (icmph->type != ICMP_DEST_UNREACH &&
> + icmph->type != ICMP_SOURCE_QUENCH &&
> + icmph->type != ICMP_TIME_EXCEEDED &&
> + icmph->type != ICMP_PARAMETERPROB &&
> + icmph->type != ICMP_REDIRECT)
> + goto out;
> + /* Checking full IP header plus 8 bytes of protocol to
> + * avoid additional coding at protocol handlers.
> + */
> + if (!pskb_may_pull(skb, nhoff + iphsz + sizeof(_ih) + 8))
> + goto out;
We prefer skb_header_pointer instead. If conntrack is enabled, we can
benefit from defragmentation. Please replace all pskb_may_pull calls with
skb_header_pointer in this code.
We can assume that the IP header is linear (not fragmented).
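For readers unfamiliar with the difference, here is a minimal userspace sketch of the contract skb_header_pointer offers: it hands back a pointer into the linear area when the requested bytes are already there, and otherwise copies them into a caller-supplied buffer without touching the packet. The struct toy_pkt type and toy_header_pointer() are hypothetical stand-ins invented for this illustration; this is not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: a linear head plus one fragment. */
struct toy_pkt {
	const unsigned char *head;	/* linear area */
	size_t head_len;
	const unsigned char *frag;	/* non-linear data following the head */
	size_t frag_len;
};

/*
 * Models the skb_header_pointer() contract (assumption: simplified to a
 * single fragment). Returns a pointer to len bytes starting at offset:
 * straight into the linear area when possible, otherwise a copy in buf.
 * Returns NULL if the packet is too short. The packet is never modified.
 */
static const void *toy_header_pointer(const struct toy_pkt *p, size_t offset,
				      size_t len, void *buf)
{
	size_t total = p->head_len + p->frag_len;
	size_t from_head;

	if (offset > total || len > total - offset)
		return NULL;			/* not enough data */
	if (offset + len <= p->head_len)
		return p->head + offset;	/* fully linear, no copy */

	/* Spans (or lies in) the fragment: gather the bytes into buf. */
	from_head = offset < p->head_len ? p->head_len - offset : 0;
	if (from_head > 0)
		memcpy(buf, p->head + offset, from_head);
	memcpy((unsigned char *)buf + from_head,
	       p->frag + (offset + from_head - p->head_len),
	       len - from_head);
	return buf;
}
```

The caller only ever reads through the returned pointer, which is why this style works on both linear and fragmented packets without reallocating anything.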
> + iph = (struct iphdr *)(skb->data + nhoff + iphsz + sizeof(_ih));
> + return nhoff + iphsz + sizeof(_ih);
> +out:
> + return nhoff;
> +}
> +/*
> + * ICMPv6
> + * Input nhoff Offset into network header
> + * offset where ICMPv6 header starts
> + * Returns true if it's a icmp error and updates nhoff
> + */
> +#ifdef WITH_IPV6
> +static int get_inner6_hdr(struct sk_buff *skb, int *offset, int hdrlen)
> +{
> + struct icmp6hdr *icmp6h;
> + struct icmp6hdr _ih6;
> +
> + icmp6h = skb_header_pointer(skb, *offset + hdrlen, sizeof(_ih6), &_ih6);
> + if (icmp6h == NULL)
> + goto out;
> +
> + if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
> + *offset += hdrlen + sizeof(_ih6);
> + return 1;
> + }
> +out:
> + return 0;
> +}
> +#endif
> +
> +/*
> + * Calc hash value, special care is taken on icmp and fragmented messages
> + * i.e. fragmented messages don't use ports.
> + */
> +static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
This function seems too big to me. Please split it into smaller
chunks, like get_hash_ipv4, get_hash_ipv6 and get_hash_ports.
> +{
> + int nhoff, hash = 0, poff, proto, frag = 0;
> + struct iphdr *ip;
> + u8 ip_proto;
> + u32 addr1, addr2, ihl;
> + u16 snatport = 0, dnatport = 0;
> + union {
> + u32 v32;
> + u16 v16[2];
> + } ports;
> +
> + nhoff = skb_network_offset(skb);
> + proto = skb->protocol;
> +
> + if (!proto && skb->sk) {
> + if (skb->sk->sk_family == AF_INET)
> + proto = __constant_htons(ETH_P_IP);
> + else if (skb->sk->sk_family == AF_INET6)
> + proto = __constant_htons(ETH_P_IPV6);
You already have the layer3 protocol number in xt_action_param. No
need to use the socket information then.
> + }
> +
> + switch (proto) {
> + case __constant_htons(ETH_P_IP):
> + {
> + enum ip_conntrack_info ctinfo;
> + struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> + struct nf_conntrack_tuple *otuple, *rtuple;
> +
> + if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
> + goto done;
> +
> + ip = (struct iphdr *) (skb->data + nhoff);
> + if (ip->protocol == IPPROTO_ICMP) {
> + /* Switch hash calc to inner header ? */
> + nhoff = get_inner_hdr(skb, ip->ihl * 4, nhoff);
> + ip = (struct iphdr *) (skb->data + nhoff);
> + }
> +
> + if (ip->frag_off & htons(IP_MF | IP_OFFSET))
> + frag = 1;
> +
> + ip_proto = ip->protocol;
> + ihl = ip->ihl;
> + addr1 = (__force u32) ip->saddr & info->smask;
> + addr2 = (__force u32) ip->daddr & info->dmask;
> +
> + if (!ct || !nf_ct_is_confirmed(ct))
You seem to (ab)use nf_ct_is_confirmed to make sure you're not in the
original direction. Better use the direction that you get by means of
nf_ct_get.
> + break;
> + otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> + /* On the "return flow", to get the original address
> + * i,e, replace the source address.
> + */
> + if (ct->status & IPS_DST_NAT &&
> + info->flags & XT_HMARK_USE_DNAT) {
> + rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> + addr1 = (__force u32) otuple->dst.u3.in.s_addr;
> + dnatport = otuple->dst.u.udp.port;
> + }
> + /* On the "return flow", to get the original address
> + * i,e, replace the destination address.
> + */
> + if (ct->status & IPS_SRC_NAT &&
> + info->flags & XT_HMARK_USE_SNAT) {
> + rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> + addr2 = (__force u32) otuple->src.u3.in.s_addr;
> + snatport = otuple->src.u.udp.port;
> + }
> + break;
> + }
> +#ifdef WITH_IPV6
> + case __constant_htons(ETH_P_IPV6):
> + {
> + struct ipv6hdr *ip6; /* ip hdr */
> + int hdrlen = 0; /* In ip header */
> + u8 nexthdr;
> + int ip6hdrlvl = 0; /* Header level */
> + struct ipv6_opt_hdr _hdr, *hp;
> +
> +hdr_new:
> + if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
> + goto done;
> +
> + /* ip header */
> + ip6 = (struct ipv6hdr *) (skb->data + nhoff);
> + nexthdr = ip6->nexthdr;
> + /* nhoff += sizeof(struct ipv6hdr); Where hdr starts */
> + hdrlen = sizeof(struct ipv6hdr);
> + hp = skb_header_pointer(skb, nhoff + hdrlen, sizeof(_hdr),
> + &_hdr);
> + while (nexthdr) {
> + switch (nexthdr) {
> + case IPPROTO_ICMPV6:
> + /* ICMP Error then move ptr to inner header */
> + if (get_inner6_hdr(skb, &nhoff, hdrlen)) {
> + ip6hdrlvl++;
> + goto hdr_new;
> + }
> + nhoff += hdrlen;
> + goto hdr_rdy;
> +
> + case NEXTHDR_FRAGMENT:
> + if (!ip6hdrlvl)
> + frag = 1;
> + break;
> + /* End of hdr traversing */
> + case NEXTHDR_IPV6: /* Do not process tunnels */
> + case NEXTHDR_TCP:
> + case NEXTHDR_UDP:
> + case NEXTHDR_ESP:
> + case NEXTHDR_AUTH:
> + case NEXTHDR_NONE:
> + nhoff += hdrlen;
> + goto hdr_rdy;
> + default:
> + goto done;
This goto doesn't make too much sense to me, better return 0.
> + }
> + if (!hp)
> + goto done;
> + nhoff += hdrlen; /* eat current header */
> + nexthdr = hp->nexthdr; /* Next header */
> + hdrlen = ipv6_optlen(hp);
> + hp = skb_header_pointer(skb, nhoff + hdrlen,
> + sizeof(_hdr), &_hdr);
> +
> + if (!pskb_may_pull(skb, nhoff))
> + goto done;
> + }
> +hdr_rdy:
> + ip_proto = nexthdr;
> +
> + addr1 = (__force u32) ip6->saddr.s6_addr32[3];
> + addr2 = (__force u32) ip6->daddr.s6_addr32[3];
> + ihl = 0; /* (40 >> 2); */
> + break;
> + }
> +#endif
> + default:
> + goto done;
> + }
> +
> + ports.v32 = 0;
> + poff = proto_ports_offset(ip_proto);
> + nhoff += ihl * 4 + poff;
> + if (!frag && poff >= 0 && pskb_may_pull(skb, nhoff + 4)) {
> + ports.v32 = * (__force u32 *) (skb->data + nhoff);
> + if (ip_proto == IPPROTO_ESP || ip_proto == IPPROTO_AH) {
> + ports.v32 = (ports.v32 & info->spimask) | info->spiset;
> + } else { /* Handle endian */
> + if (snatport) /* Replace snated dst port (ret flow) */
> + ports.v16[1] = snatport;
> + if (dnatport)
> + ports.v16[0] = dnatport;
> + ports.v32 = (ports.v32 & info->pmask.v32) |
> + info->pset.v32;
> + if (ports.v16[1] < ports.v16[0])
> + swap(ports.v16[0], ports.v16[1]);
> + }
> + }
> + ip_proto &= info->prmask;
> + /* get a consistent hash (same value on both flow directions) */
> + if (addr2 < addr1)
> + swap(addr1, addr2);
> +
> + hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ ip_proto;
> + if (!hash)
> + hash = 1;
> +
> + return hash;
> +
> +done:
> + return 0;
> +}
I'll try to find more time to look into this. Specifically, I want to
review the IPv6 bits more carefully.
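As a reading aid for the get_hash logic quoted above, here is a minimal userspace sketch of the direction-independent idea: order the two addresses and the two ports before hashing, then reduce the result with the modulus and add the offset. toy_mix() is a made-up stand-in for the kernel's jhash_3words(), and toy_hmark() is a hypothetical name, so the marks it produces will not match the real module.

```c
#include <stdint.h>

/* Stand-in mixer (assumption); the patch uses the kernel's jhash_3words(). */
static uint32_t toy_mix(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed;

	h = (h ^ a) * 0x9e3779b1u;
	h = (h ^ b) * 0x85ebca6bu;
	h = (h ^ c) * 0xc2b2ae35u;
	return h ^ (h >> 16);
}

/*
 * Hypothetical helper: same mark for both directions of a flow.
 * mod must be > 0, as the man page requires for --hmark-mod.
 */
static uint32_t toy_hmark(uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport, uint8_t proto,
			  uint32_t seed, uint32_t mod, uint32_t offs)
{
	uint32_t ports, hash;

	/* Order addresses and ports so that swapping src/dst changes nothing. */
	if (daddr < saddr) {
		uint32_t t = saddr; saddr = daddr; daddr = t;
	}
	if (dport < sport) {
		uint16_t t = sport; sport = dport; dport = t;
	}
	ports = ((uint32_t)sport << 16) | dport;

	hash = toy_mix(saddr, daddr, ports, seed) ^ proto;
	if (!hash)
		hash = 1;		/* 0 is reserved for "no hash" */

	return hash % mod + offs;	/* final mark in [offs, offs + mod - 1] */
}
```

With mod 20 and offs 100 this yields marks in 100-119, the range given in the man page example. Note that addresses and ports are ordered independently, as in the patch, so some distinct flows between the same two hosts also collapse onto the same value.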
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v2 PATCH 2/2] NETFILTER userspace part for target HMARK
2011-10-03 17:46 ` [v2 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
@ 2011-11-07 0:55 ` Pablo Neira Ayuso
0 siblings, 0 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-07 0:55 UTC (permalink / raw)
To: Hans Schillstrom; +Cc: kaber, jengelh, netfilter-devel, netdev, hans
On Mon, Oct 03, 2011 at 07:46:43PM +0200, Hans Schillstrom wrote:
> The target allows you to create rules in the "raw" and "mangle" tables
> which alter the netfilter mark (nfmark) field within a given range.
> First a 32 bit hash value is generated then modulus by <limit> and
> finally an offset is added before it's written to nfmark.
> Prior to routing, the nfmark can influence the routing method (see
> "Use netfilter MARK value as routing key") and can also be used by
> other subsystems to change their behaviour.
>
> The mark match can also be used to match nfmark produced by this module.
>
> Ver 2
> IPv4 NAT added
> iptables ver 1.4.12.1 adaptions.
>
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> ---
> extensions/libxt_HMARK.c | 381 ++++++++++++++++++++++++++++++++++++
> extensions/libxt_HMARK.man | 66 ++++++
> include/linux/netfilter/xt_hmark.h | 48 +++++
> 3 files changed, 495 insertions(+), 0 deletions(-)
> create mode 100644 extensions/libxt_HMARK.c
> create mode 100644 extensions/libxt_HMARK.man
> create mode 100644 include/linux/netfilter/xt_hmark.h
>
> diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
> new file mode 100644
> index 0000000..0def034
> --- /dev/null
> +++ b/extensions/libxt_HMARK.c
> @@ -0,0 +1,381 @@
> +/*
> + * Shared library add-on to iptables to add HMARK target support.
> + *
> + * The kernel module calculates a hash value that can be modified by modulus
> + * and an offset. The hash value is based on a direction independent
> + * five tuple: src & dst addr src & dst ports and protocol.
> + * However src & dst port can be masked and are not used for fragmented
> + * packets, ESP and AH don't have ports so SPI will be used instead.
> + * For ICMP error messages the hash mark values will be calculated on
> + * the source packet i.e. the packet caused the error (If sufficient
> + * amount of data exists).
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <stdlib.h>
> +#include <getopt.h>
> +
> +#include <xtables.h>
> +#include <linux/netfilter/x_tables.h>
> +#include <linux/netfilter/xt_hmark.h>
> +
> +
> +#define DEF_HRAND 0xc175a3b8 /* Default "random" value to jhash */
> +
> +static void HMARK_help(void)
> +{
> + printf(
> +"HMARK target options, i.e. modify hash calculation by:\n"
> +" --hmark-smask value Mask source address with value\n"
> +" --hmark-dmask value Mask Dest. address with value\n"
> +" --hmark-sp-mask value Mask src port with value\n"
> +" --hmark-dp-mask value Mask dst port with value\n"
> +" --hmark-spi-mask value For esp and ah AND spi with value\n"
> +" --hmark-sp-set value OR src port with value\n"
> +" --hmark-dp-set value OR dst port with value\n"
> +" --hmark-spi-set value For esp and ah OR spi with value\n"
> +" --hmark-proto-mask value Mask Protocol with value\n"
> +" --hmark-rnd Random value to hash calc.\n"
> +" Limit/modify the calculated hash mark by:\n"
> +" --hmark-mod value nfmark modulus value\n"
> +" --hmark-offs value Last action add value to nfmark\n"
> +" For NAT in IPv4 the original address can be used in the return path.\n"
> +" Make sure to qualify the statement in a proper way when using nat flags\n"
> +" --hmark-dnat Replace src addr/port with original dst addr/port\n"
> +" --hmark-snat Replace dst addr/port with original src addr/port\n"
> +" In many cases hmark can be omitted i.e. --smask can be used\n");
> +}
> +
> +static const struct option HMARK_opts[] = {
> + { "hmark-smask", 1, NULL, XT_HMARK_SADR_AND },
> + { "hmark-dmask", 1, NULL, XT_HMARK_DADR_AND },
> + { "hmark-sp-mask", 1, NULL, XT_HMARK_SPORT_AND },
> + { "hmark-dp-mask", 1, NULL, XT_HMARK_DPORT_AND },
> + { "hmark-spi-mask", 1, NULL, XT_HMARK_SPI_AND },
> + { "hmark-sp-set", 1, NULL, XT_HMARK_SPORT_OR },
> + { "hmark-dp-set", 1, NULL, XT_HMARK_DPORT_OR },
> + { "hmark-spi-set", 1, NULL, XT_HMARK_SPI_OR },
> + { "hmark-proto-mask", 1, NULL, XT_HMARK_PROTO_AND },
> + { "hmark-rnd", 1, NULL, XT_HMARK_RND },
> + { "hmark-mod", 1, NULL, XT_HMARK_MODULUS },
> + { "hmark-offs", 1, NULL, XT_HMARK_OFFSET },
> + { "hmark-dnat", 1, NULL, XT_HMARK_USE_DNAT },
> + { "hmark-snat", 1, NULL, XT_HMARK_USE_SNAT },
> + { "smask", 1, NULL, XT_HMARK_SADR_AND },
> + { "dmask", 1, NULL, XT_HMARK_DADR_AND },
> + { "sp-mask", 1, NULL, XT_HMARK_SPORT_AND },
> + { "dp-mask", 1, NULL, XT_HMARK_DPORT_AND },
> + { "spi-mask", 1, NULL, XT_HMARK_SPI_AND },
> + { "sp-set", 1, NULL, XT_HMARK_SPORT_OR },
> + { "dp-set", 1, NULL, XT_HMARK_DPORT_OR },
> + { "spi-set", 1, NULL, XT_HMARK_SPI_OR },
> + { "proto-mask", 1, NULL, XT_HMARK_PROTO_AND },
> + { "rnd", 1, NULL, XT_HMARK_RND },
> + { "mod", 1, NULL, XT_HMARK_MODULUS },
> + { "offs", 1, NULL, XT_HMARK_OFFSET },
> + { "dnat", 1, NULL, XT_HMARK_USE_DNAT },
> + { "snat", 1, NULL, XT_HMARK_USE_SNAT },
> + { .name = NULL }
> +};
> +
> +static int
> +HMARK_parse(int c, char **argv, int invert, unsigned int *flags,
> + const void *entry, struct xt_entry_target **target)
> +{
> + struct xt_hmark_info *hmarkinfo
> + = (struct xt_hmark_info *)(*target)->data;
> + unsigned int value = 0xffffffff;
> + unsigned int maxint = UINT32_MAX;
> +
> + if ((c < XT_HMARK_SADR_AND) || (c > XT_HMARK_OFFSET)) {
> + xtables_error(PARAMETER_PROBLEM, "Bad HMARK option \"%s\"",
> + optarg);
> + return 0;
> + }
> +
> + if (c >= XT_HMARK_SPORT_AND && c <= XT_HMARK_DPORT_OR)
> + maxint = UINT16_MAX;
> + else if (c == XT_HMARK_PROTO_AND)
> + maxint = UINT8_MAX;
> +
> + if (!xtables_strtoui(optarg, NULL, &value, 0, maxint))
> + xtables_error(PARAMETER_PROBLEM, "Bad HMARK value \"%s\"",
> + optarg);
> +
> + if (*flags == 0) {
> + memset(hmarkinfo, 0xff, sizeof(struct xt_hmark_info));
> + hmarkinfo->pset.v32 = 0;
> + hmarkinfo->flags = 0;
> + hmarkinfo->spiset = 0;
> + hmarkinfo->hoffs = 0;
> + hmarkinfo->hashrnd = DEF_HRAND;
> + }
> + switch (c) {
> + case XT_HMARK_SADR_AND:
> + if (*flags & (1 << c)) {
> + xtables_error(PARAMETER_PROBLEM,
> + "Can only specify "
> + "`--hmark-smask' once");
> + }
> + hmarkinfo->smask = htonl(value);
> + if (value == maxint)
> + c = 0;
> + break;
Please check the current iptables git tree. Jan implemented a more advanced
method to handle options. For instance, have a look at libxt_cluster.c.
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-07 0:52 ` Pablo Neira Ayuso
@ 2011-11-07 3:36 ` Jan Engelhardt
0 siblings, 0 replies; 14+ messages in thread
From: Jan Engelhardt @ 2011-11-07 3:36 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Hans Schillstrom, kaber, netfilter-devel, netdev, hans
On Monday 2011-11-07 01:52, Pablo Neira Ayuso wrote:
>> +static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
>> +{
>> + int nhoff, hash = 0, poff, proto, frag = 0;
>> + struct iphdr *ip;
>> + u8 ip_proto;
>> + u32 addr1, addr2, ihl;
>> + u16 snatport = 0, dnatport = 0;
>> + union {
>> + u32 v32;
>> + u16 v16[2];
>> + } ports;
>> +
>> + nhoff = skb_network_offset(skb);
>> + proto = skb->protocol;
>> +
>> + if (!proto && skb->sk) {
>> + if (skb->sk->sk_family == AF_INET)
>> + proto = __constant_htons(ETH_P_IP);
>> + else if (skb->sk->sk_family == AF_INET6)
>> + proto = __constant_htons(ETH_P_IPV6);
>
>You already have the layer3 protocol number in xt_action_param. No
>need to use the socket information then.
xt_action_param.family (NFPROTO_) is not the same class as AF_ or ETH_.
Though, wouldn't proto = skb->protocol; just be simpler here?
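To illustrate the point about namespaces: if code did want to start from the xtables family and still switch on an ethertype, it would need an explicit translation, roughly as sketched below. The TOY_ constants mirror the values of NFPROTO_IPV4, NFPROTO_IPV6, ETH_P_IP and ETH_P_IPV6 from the kernel headers and are redefined only so the sketch stands alone; toy_family_to_ethertype() is a hypothetical helper, not an existing kernel function.

```c
#include <stdint.h>

/* Mirrors of kernel header values, redefined so the sketch is standalone. */
enum { TOY_NFPROTO_IPV4 = 2, TOY_NFPROTO_IPV6 = 10 };
enum { TOY_ETH_P_IP = 0x0800, TOY_ETH_P_IPV6 = 0x86DD };

/*
 * Hypothetical helper: translate an xtables family into a host-order
 * ethertype. Returns 0 for families this sketch does not handle.
 * skb->protocol is kept in network byte order, so a real caller would
 * still need htons() before comparing against it.
 */
static uint16_t toy_family_to_ethertype(uint8_t family)
{
	switch (family) {
	case TOY_NFPROTO_IPV4:
		return TOY_ETH_P_IP;
	case TOY_NFPROTO_IPV6:
		return TOY_ETH_P_IPV6;
	default:
		return 0;
	}
}
```

In practice it is probably simpler to switch on the family value directly, or on skb->protocol as suggested, than to convert between the two.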
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-16 9:28 ` Hans Schillstrom
@ 2011-11-16 10:50 ` Pablo Neira Ayuso
0 siblings, 0 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-16 10:50 UTC (permalink / raw)
To: Hans Schillstrom
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
On Wed, Nov 16, 2011 at 10:28:42AM +0100, Hans Schillstrom wrote:
> I have some problems with the generator...,
> so I did some simple iperf TCP tests with KVMs, i.e. a standard TCP setup
I have a simple HTTP traffic generator in case you want to use
it:
http://1984.lsi.us.es/git/?p=http-client-benchmark/.git;a=summary
You have to use it with Willy Tarreau's httpterm on the server side.
Iperf is fine, I just wanted to share with you what I use for
generating traffic.
It allows more interesting evaluations, like experiments with
very short flows and long ones. I think iperf doesn't allow that?
Well, I haven't looked at it in a long time, but I remember that it
didn't fit my needs. Evaluating only throughput was not enough for
me. Sessions per second is also an interesting value to take into
account in my case.
> iptables just one rule
> -A PREROUTING -d 10.0.0.10/32 -j HMARK --hmark-mod 0x2 --hmark-offs 0x64
>
> Some typical values shows ~8% degradation with conntrack loaded
>
>
> a) Without conntrack loaded
>
> [ 3] 0.0-10.0 sec 83.5 MBytes 70.0 Mbits/sec
>
>
> b) With conntrack loaded (no iptable rules in use --ctstate or -m conntrack)
>
> [ 3] 0.0-10.0 sec 78.0 MBytes 65.4 Mbits/sec
>
> c) With iptables rule in use
> iptables -t mangle -A PREROUTING -d 10.0.0.10 -m conntrack --ctstate NEW -j HMARK --mod 2 --offs 100
> iptables -t mangle -A PREROUTING -d 10.0.0.10 -m conntrack --ctstate ESTABLISHED,RELATED -j HMARK --mod 2 --offs 100
> iptables -t mangle -A PREROUTING -d 10.0.0.10 -m conntrack --ctstate INVALID -j DROP
You have to use connmark so we can skip the hashing in the
established case, otherwise conntrack is a clear loser :-). The point
of this experiment is to see if hashing every single packet performs
better than hashing only the first one and using the ctmark for
established connections (so we only hash once!).
Eric Leblond wrote nice documentation on connmark in his blog:
http://home.regit.org/netfilter-en/netfilter-connmark/
BTW, RELATED belongs in the NEW rule. RELATED is similar to NEW, but for
the first packet of a related connection.
> [ 3] 0.0-10.0 sec 77.4 MBytes 64.9 Mbits/sec
>
>
> A clean KVM with 3.2.0-rc1 kernel with virt-io
> Module Size Used by Not tainted
> nf_conntrack_ipv4 16731 1
> nf_defrag_ipv4 12436 1 nf_conntrack_ipv4
> xt_conntrack 12390 1
> xt_hmark 12390 1
> iptable_mangle 12390 1
> ip_tables 20755 1 iptable_mangle
> ipip 16515 0
> tunnel4 12484 1 ipip
Thanks for taking the time to evaluate this. If you can repeat the
experiments with my comments, we can draw interesting conclusions, like
dispelling the idea that conntrack is bad in terms of throughput,
provided you take appropriate advantage of what it provides ;-)
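The "hash once" idea behind the connmark suggestion can be modelled in a few lines of userspace C: the first packet of a connection computes the mark and stores it in the connection entry, and every later packet just reads it back. Everything here is a hypothetical toy (struct toy_conn, toy_expensive_hash(), toy_mark_packet()); in a real rule-set the store and read-back would be done with the CONNMARK target's --save-mark and --restore-mark, not with code like this.

```c
#include <stdint.h>

/* Toy conntrack entry; a ctmark of 0 means "not marked yet". */
struct toy_conn {
	uint32_t ctmark;
};

/* Counts how often the (pretend) expensive hash is actually computed. */
static unsigned int toy_hash_calls;

/* Stand-in for HMARK with --mod 2 --offs 100, as in the benchmark rules. */
static uint32_t toy_expensive_hash(uint32_t flow_id)
{
	toy_hash_calls++;
	return (flow_id * 0x9e3779b1u) % 2 + 100;
}

/*
 * Per-packet path: hash only when the connection has no mark yet,
 * otherwise reuse the mark saved on the connection.
 */
static uint32_t toy_mark_packet(struct toy_conn *ct, uint32_t flow_id)
{
	if (ct->ctmark == 0)
		ct->ctmark = toy_expensive_hash(flow_id);	/* first packet */
	return ct->ctmark;					/* later packets */
}
```

Whether this beats hashing every packet depends on how the cost of the conntrack lookup compares with the cost of the hash, which is what the proposed experiment would measure.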
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-09 14:39 ` Pablo Neira Ayuso
@ 2011-11-16 9:28 ` Hans Schillstrom
2011-11-16 10:50 ` Pablo Neira Ayuso
0 siblings, 1 reply; 14+ messages in thread
From: Hans Schillstrom @ 2011-11-16 9:28 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
Hello Pablo
On Wednesday, November 09, 2011 15:39:22 Pablo Neira Ayuso wrote:
> On Tue, Nov 08, 2011 at 04:12:27PM +0100, Hans Schillstrom wrote:
> > >BTW, do you have some number of this running with and without
> > >conntrack? It would be interesting to have.
> >
> > I didn't save them, but I can make a new benchmark later on.
>
> Thanks, I'm interested in them. It can be just xt_HMARK with and
> without conntrack enabled. Also make sure that you use stateful
> rule-set if conntrack is enabled (thus, resulting in hashing only
> once, not every packet). Otherwise, conntrack will not provide
> any improvement.
>
I have some problems with the generator...,
so I did some simple iperf TCP tests with KVMs, i.e. a standard TCP setup
iptables just one rule
-A PREROUTING -d 10.0.0.10/32 -j HMARK --hmark-mod 0x2 --hmark-offs 0x64
Some typical values shows ~8% degradation with conntrack loaded
a) Without conntrack loaded
[ 3] 0.0-10.0 sec 83.5 MBytes 70.0 Mbits/sec
b) With conntrack loaded (no iptable rules in use --ctstate or -m conntrack)
[ 3] 0.0-10.0 sec 78.0 MBytes 65.4 Mbits/sec
c) With iptables rule in use
iptables -t mangle -A PREROUTING -d 10.0.0.10 -m conntrack --ctstate NEW -j HMARK --mod 2 --offs 100
iptables -t mangle -A PREROUTING -d 10.0.0.10 -m conntrack --ctstate ESTABLISHED,RELATED -j HMARK --mod 2 --offs 100
iptables -t mangle -A PREROUTING -d 10.0.0.10 -m conntrack --ctstate INVALID -j DROP
[ 3] 0.0-10.0 sec 77.4 MBytes 64.9 Mbits/sec
A clean KVM with 3.2.0-rc1 kernel with virt-io
Module Size Used by Not tainted
nf_conntrack_ipv4 16731 1
nf_defrag_ipv4 12436 1 nf_conntrack_ipv4
xt_conntrack 12390 1
xt_hmark 12390 1
iptable_mangle 12390 1
ip_tables 20755 1 iptable_mangle
ipip 16515 0
tunnel4 12484 1 ipip
/Hans
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-14 11:38 ` Jan Engelhardt
@ 2011-11-15 10:01 ` Pablo Neira Ayuso
0 siblings, 0 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-15 10:01 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Hans Schillstrom, Hans Schillstrom, kaber, netfilter-devel, netdev
On Mon, Nov 14, 2011 at 12:38:28PM +0100, Jan Engelhardt wrote:
> On Monday 2011-11-14 10:19, Hans Schillstrom wrote:
>
> >
> >On Sunday, November 13, 2011 18:05:28 Pablo Neira Ayuso wrote:
> >> BTW, I think you should split xt_HMARK to ipt_HMARK and ip6t_HMARK
> >> (see recent Florian Westphal patches regarding reserve lookup for
> >> instance).
> >>
> >> The IPv4 and IPv6 parts for HMARK look so different that I don't think
> >> it makes sense to keep them into one single xt_HMARK thing with all
> >> those conditional ifdefs for IPV6.
> >>
> >Ok I'll do that, for some reason I thought it was better with one module.
>
> So do I. The module overhead is so much larger.
Yes, it will be if both modules are loaded.
It depends, I think: if you only load IPv4 support, the overhead will
be smaller than having everything in one module.
But I'm open to more discussion on this, of course.
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-14 9:19 ` Hans Schillstrom
@ 2011-11-14 11:38 ` Jan Engelhardt
2011-11-15 10:01 ` Pablo Neira Ayuso
0 siblings, 1 reply; 14+ messages in thread
From: Jan Engelhardt @ 2011-11-14 11:38 UTC (permalink / raw)
To: Hans Schillstrom
Cc: Pablo Neira Ayuso, Hans Schillstrom, kaber, netfilter-devel, netdev
On Monday 2011-11-14 10:19, Hans Schillstrom wrote:
>
>On Sunday, November 13, 2011 18:05:28 Pablo Neira Ayuso wrote:
>> BTW, I think you should split xt_HMARK to ipt_HMARK and ip6t_HMARK
>> (see recent Florian Westphal patches regarding reserve lookup for
>> instance).
>>
>> The IPv4 and IPv6 parts for HMARK look so different that I don't think
>> it makes sense to keep them into one single xt_HMARK thing with all
>> those conditional ifdefs for IPV6.
>>
>Ok I'll do that, for some reason I thought it was better with one module.
So do I. The module overhead is so much larger.
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-13 17:05 ` Pablo Neira Ayuso
@ 2011-11-14 9:19 ` Hans Schillstrom
2011-11-14 11:38 ` Jan Engelhardt
0 siblings, 1 reply; 14+ messages in thread
From: Hans Schillstrom @ 2011-11-14 9:19 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
On Sunday, November 13, 2011 18:05:28 Pablo Neira Ayuso wrote:
> BTW, I think you should split xt_HMARK to ipt_HMARK and ip6t_HMARK
> (see recent Florian Westphal patches regarding reserve lookup for
> instance).
>
> The IPv4 and IPv6 parts for HMARK look so different that I don't think
> it makes sense to keep them into one single xt_HMARK thing with all
> those conditional ifdefs for IPV6.
>
Ok I'll do that, for some reason I thought it was better with one module.
--
Hans
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-08 10:51 ` Pablo Neira Ayuso
@ 2011-11-13 17:05 ` Pablo Neira Ayuso
2011-11-14 9:19 ` Hans Schillstrom
0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-13 17:05 UTC (permalink / raw)
To: Hans Schillstrom
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
BTW, I think you should split xt_HMARK to ipt_HMARK and ip6t_HMARK
(see recent Florian Westphal patches regarding reserve lookup for
instance).
The IPv4 and IPv6 parts for HMARK look so different that I don't think
it makes sense to keep them into one single xt_HMARK thing with all
those conditional ifdefs for IPV6.
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-08 15:12 Re[2]: " Hans Schillstrom
@ 2011-11-09 14:39 ` Pablo Neira Ayuso
2011-11-16 9:28 ` Hans Schillstrom
0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-09 14:39 UTC (permalink / raw)
To: Hans Schillstrom
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
On Tue, Nov 08, 2011 at 04:12:27PM +0100, Hans Schillstrom wrote:
> >BTW, do you have some numbers for this running with and without
> >conntrack? It would be interesting to have.
>
> I didn't save them, but I can make a new benchmark later on.
Thanks, I'm interested in them. It can be just xt_HMARK with and
without conntrack enabled. Also make sure that you use a stateful
rule-set if conntrack is enabled (so that the hash is computed only
once per connection, not for every packet). Otherwise, conntrack will
not provide any improvement.
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
2011-11-07 23:29 Re[2]: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw Hans Schillstrom
@ 2011-11-08 10:51 ` Pablo Neira Ayuso
2011-11-13 17:05 ` Pablo Neira Ayuso
0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2011-11-08 10:51 UTC (permalink / raw)
To: Hans Schillstrom
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
On Tue, Nov 08, 2011 at 12:29:53AM +0100, Hans Schillstrom wrote:
> >We prefer skb_header_pointer instead. If conntrack is enabled, we can
> >benefit from defragmentation.
>
> In our case conntrack will not be there
Yes, but if conntrack is there, we benefit from fragment reassembly if
you use skb_header_pointer.
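
A minimal userspace sketch of the idea behind skb_header_pointer(skb, offset, len, buffer): hand back a pointer into the linear data when the requested bytes are already there, otherwise copy them into a caller-supplied buffer. The struct pkt layout and the header_pointer() name are hypothetical stand-ins for illustration, not kernel API, and the model only handles a single extra fragment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-part packet: a linear head plus one extra fragment. */
struct pkt {
    const unsigned char *head;
    size_t head_len;
    const unsigned char *frag;
    size_t frag_len;
};

/*
 * Return a pointer to len bytes starting at offset.
 * If the range lies entirely in the linear head, point straight into it;
 * otherwise copy the bytes into buf and return buf.
 * Returns NULL when the packet is too short for the request.
 */
const void *header_pointer(const struct pkt *p, size_t offset, size_t len,
                           void *buf)
{
    size_t total = p->head_len + p->frag_len;
    size_t in_head;

    if (offset > total || len > total - offset)
        return NULL;

    /* Fast path: no copy needed. */
    if (offset + len <= p->head_len)
        return p->head + offset;

    /* Slow path: gather the bytes from head and fragment into buf. */
    in_head = offset < p->head_len ? p->head_len - offset : 0;
    if (in_head > 0)
        memcpy(buf, p->head + offset, in_head);
    memcpy((unsigned char *)buf + in_head,
           p->frag + (offset + in_head - p->head_len), len - in_head);
    return buf;
}
```

The caller never modifies the packet in this model, which is the main difference from a pull-style helper that rearranges the buffer so the header becomes linear.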
> >Please, replace all pskb_may_pull by skb_header_pointer in this code.
> >
> >We can assume that the IP header is linear (not fragmented).
>
> I ran in to this issue in IPv6 testing so I got a little bit "paranoid".
> Are you sure that the embedded IP and L4 header in the ICMP msg also is unfragmented.
> Is this true for both IPv6 & IPv4 ?
No, sorry. I was referring to the normal IP header in one packet.
> From what I remember when I was testing IPv6 icmp and dug into the original header (on a 2.6.32 kernel)
> pskb_may_pull was needed.
Yes, it is indeed needed.
> [snip]
>
> >> +/*
> >> + * Calc hash value, special care is taken on icmp and fragmented messages
> >> + * i.e. fragmented messages don't use ports.
> >> + */
> >> +static __u32 get_hash(struct sk_buff *skb, struct xt_hmark_info *info)
> >
> >This function seems too big to me, please, split it into smaller
> >chunks, like get_hash_ipv4, get_hash_ipv6 and get_hash_ports.
> >
>
> Good catch I'll do that,
>
> >> +{
> >> + int nhoff, hash = 0, poff, proto, frag = 0;
> >> + struct iphdr *ip;
> >> + u8 ip_proto;
> >> + u32 addr1, addr2, ihl;
> >> + u16 snatport = 0, dnatport = 0;
> >> + union {
> >> + u32 v32;
> >> + u16 v16[2];
> >> + } ports;
> >> +
> >> + nhoff = skb_network_offset(skb);
> >> + proto = skb->protocol;
> >> +
> >> + if (!proto && skb->sk) {
> >> + if (skb->sk->sk_family == AF_INET)
> >> + proto = __constant_htons(ETH_P_IP);
> >> + else if (skb->sk->sk_family == AF_INET6)
> >> + proto = __constant_htons(ETH_P_IPV6);
> >
> >You already have the layer3 protocol number in xt_action_param. No
> >need to use the socket information then.
>
> When splitting get_hash() above will be removed, xt_action_param ->family will be used for selection.
>
> [snip]
> >> +
> >> + if (!ct || !nf_ct_is_confirmed(ct))
> >
> >You seem to (ab)use nf_ct_is_confirmed to make sure you're not in the
> >original direction. Better use the direction that you get by means of
> >nf_ct_get.
> >
> I'm not sure I follow you here ?
OK, why are you using nf_ct_is_confirmed here? :-)
> >> + break;
> >> + otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> >> + /* On the "return flow", to get the original address
> >> + * i,e, replace the source address.
> >> + */
> >> + if (ct->status & IPS_DST_NAT &&
> >> + info->flags & XT_HMARK_USE_DNAT) {
> >> + rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> >> + addr1 = (__force u32) otuple->dst.u3.in.s_addr;
> >> + dnatport = otuple->dst.u.udp.port;
> >> + }
> >> + /* On the "return flow", to get the original address
> >> + * i,e, replace the destination address.
> >> + */
> >> + if (ct->status & IPS_SRC_NAT &&
> >> + info->flags & XT_HMARK_USE_SNAT) {
> >> + rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> >> + addr2 = (__force u32) otuple->src.u3.in.s_addr;
> >> + snatport = otuple->src.u.udp.port;
> >> + }
> >> + break;
> >> + }
>
> [snip]
>
> >> + case NEXTHDR_NONE:
> >> + nhoff += hdrlen;
> >> + goto hdr_rdy;
> >> + default:
> >> + goto done;
> >
> >This goto doesn't make too much sense to me, better return 0.
>
> hmmm
> kind of left overs, Actually all "goto done" can be replaced by return 0
No problem, it's just a cosmetic change ;-)
> [snip]
>
> >> +done:
> >> + return 0;
> >> +}
> >
> >I'll try to find more time to look into this. Specifically, I want to
> >review the IPv6 bits more carefully.
>
> The IPv6 header recursion is not obvious, and it's hard to test all cases :-)
>
> I really appreciate your review
Welcome, let's see if we can get this into 3.3 since we cannot make it
for 3.2.
BTW, do you have some numbers for this running with and without
conntrack? It would be interesting to have.
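
A rough userspace sketch of the split suggested above, showing only the IPv4 and ports helpers. It assumes a hypothetical struct flow_key in place of the sk_buff, and a placeholder mixer (mix32) that is not the hash used by the patch. Direction independence is obtained here by ordering addresses and ports before hashing; the final modulus-plus-offset step follows the description in the cover letter.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields the hash reads from a packet. */
struct flow_key {
    uint32_t saddr;   /* source address */
    uint32_t daddr;   /* destination address */
    uint16_t sport;   /* source port, 0 when ports are not used */
    uint16_t dport;   /* destination port, 0 when ports are not used */
    uint8_t  proto;   /* L4 protocol number */
};

/* Placeholder 32-bit mixer; an assumption, not the patch's hash. */
uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Ports part: order the pair so both directions give the same value. */
uint32_t get_hash_ports(uint16_t sport, uint16_t dport)
{
    uint16_t lo = sport < dport ? sport : dport;
    uint16_t hi = sport < dport ? dport : sport;

    return ((uint32_t)hi << 16) | lo;
}

/* IPv4 part: order the addresses, then fold in ports and protocol. */
uint32_t get_hash_ipv4(const struct flow_key *k)
{
    uint32_t lo = k->saddr < k->daddr ? k->saddr : k->daddr;
    uint32_t hi = k->saddr < k->daddr ? k->daddr : k->saddr;
    uint32_t h = mix32(lo);

    h = mix32(h ^ hi);
    h = mix32(h ^ get_hash_ports(k->sport, k->dport));
    return mix32(h ^ k->proto);
}

/* Map the hash into [offset, offset + limit); returns 0 if limit is 0. */
uint32_t hash_to_mark(uint32_t hash, uint32_t limit, uint32_t offset)
{
    if (limit == 0)
        return 0;
    return hash % limit + offset;
}
```

An IPv6 helper would follow the same shape with 128-bit addresses, and the caller would pick the helper from the family it is given rather than from socket state.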
end of thread, other threads:[~2011-11-16 10:50 UTC | newest]
Thread overview: 14+ messages
2011-10-03 17:46 [v2 PATCH 0/2] NETFILTER new target module, HMARK Hans Schillstrom
2011-10-03 17:46 ` [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw Hans Schillstrom
2011-11-07 0:52 ` Pablo Neira Ayuso
2011-11-07 3:36 ` Jan Engelhardt
2011-10-03 17:46 ` [v2 PATCH 2/2] NETFILTER userspace part for target HMARK Hans Schillstrom
2011-11-07 0:55 ` Pablo Neira Ayuso
2011-11-07 23:29 Re[2]: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw Hans Schillstrom
2011-11-08 10:51 ` Pablo Neira Ayuso
2011-11-13 17:05 ` Pablo Neira Ayuso
2011-11-14 9:19 ` Hans Schillstrom
2011-11-14 11:38 ` Jan Engelhardt
2011-11-15 10:01 ` Pablo Neira Ayuso
2011-11-08 15:12 Re[2]: " Hans Schillstrom
2011-11-09 14:39 ` Pablo Neira Ayuso
2011-11-16 9:28 ` Hans Schillstrom
2011-11-16 10:50 ` Pablo Neira Ayuso